Closed by ammgws 7 years ago
Happened again, even on the latest version. This is the second time, and both times it happened when a secondary account accessed the page and ran a command (coincidence?). The serial manager thread seems to be the culprit, as the temp logger stops logging.
gunicorn log:
[2017-03-14 17:50:58 +0900] [6195] [DEBUG] GET /
[2017-03-14 17:51:03 +0900] [6195] [DEBUG] GET /data/1
[2017-03-14 17:51:05 +0900] [6195] [DEBUG] GET /data/2
[2017-03-14 17:51:05 +0900] [6195] [DEBUG] GET /data/3
[2017-03-14 17:51:11 +0900] [6195] [DEBUG] GET /command
[2017-03-14 17:51:42 +0900] [6044] [CRITICAL] WORKER TIMEOUT (pid:6195)
[2017-03-14 17:51:42 +0900] [6195] [INFO] Worker exiting (pid: 6195)
[2017-03-14 17:51:44 +0900] [19647] [INFO] Booting worker with pid: 19647
[2017-03-14 17:51:44 +0900] [19647] [DEBUG] GET /command
[2017-03-14 17:52:15 +0900] [6044] [CRITICAL] WORKER TIMEOUT (pid:19647)
[2017-03-14 17:52:15 +0900] [19647] [INFO] Worker exiting (pid: 19647)
[2017-03-14 17:52:16 +0900] [19663] [INFO] Booting worker with pid: 19663
flask log:
2017-03-14 00:15:39.239 ammcon_frontend INFO Finished handling command request. (views.py:193)
2017-03-14 17:51:12.245 ammcon_frontend INFO Command "living1 night" received. Sending message: b'\xb4\x05' (views.py:141)
2017-03-14 17:51:12.288 ammcon_frontend INFO Connected to zeroMQ server. (views.py:149)
2017-03-14 17:51:45.121 ammcon_frontend DEBUG This is before first request (views.py:299)
2017-03-14 17:51:47.401 ammcon_frontend INFO Command "living2 night" received. Sending message: b'\xb5\x05' (views.py:141)
2017-03-14 17:51:47.440 ammcon_frontend INFO Connected to zeroMQ server. (views.py:149)
2017-03-14 17:51:47.678 ammcon_frontend DEBUG yarpliving2 night0 (views.py:159)
2017-03-14 17:51:47.683 ammcon_frontend DEBUG yarpliving2 night1 (views.py:159)
2017-03-14 17:51:47.688 ammcon_frontend DEBUG yarpliving2 night2 (views.py:159)
2017-03-14 19:23:01.202 ammcon_frontend DEBUG This is before first request (views.py:299)
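The flask log stops right after "Connected to zeroMQ server" and the worker is killed about 30 s later, which matches gunicorn's default worker timeout and would be consistent with a receive call that blocks forever. A minimal sketch of a bounded wait is below. `send_command`, the endpoint argument, the REQ socket type and the timeout value are assumptions for illustration and are not taken from views.py.

```python
import zmq


def send_command(endpoint: str, message: bytes, timeout_ms: int = 5000):
    """Send one message and wait at most timeout_ms for the reply.

    Returns the reply bytes, or None if no reply arrived in time.
    """
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)  # socket type is an assumption
    socket.setsockopt(zmq.LINGER, 0)  # do not block on close with unsent data
    socket.setsockopt(zmq.SNDTIMEO, timeout_ms)
    socket.setsockopt(zmq.RCVTIMEO, timeout_ms)
    try:
        socket.connect(endpoint)
        socket.send(message)
        return socket.recv()  # raises zmq.Again once the timeout expires
    except zmq.Again:
        return None
    finally:
        socket.close()
```

With a timeout in place, the view could return an error to the browser when the serial manager does not answer, rather than holding the worker until gunicorn kills it.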
serial manager:
2017-03-14 17:50:24,112 root DEBUG Received command in queue: b'\xd1' (serialmanager.py:56)
2017-03-14 17:50:24,120 root DEBUG yarptemp0 (templogger.py:49)
2017-03-14 17:50:24,124 root DEBUG yarptemp1 (templogger.py:49)
2017-03-14 17:50:24,134 root DEBUG Requesting temperature. (templogger.py:52)
2017-03-14 17:50:24,131 root INFO Command sent to microcontroller: ['0x3c', '0xd1', '0xa7', '0x3e'] (serialmanager.py:284)
2017-03-14 17:50:24,610 root DEBUG Raw response: ['0x3c', '0x6', '0xd1', '0xa7', '0x14', '0x32', '0x24', '0x0', '0x87', '0x3e'] (serialmanager.py:66)
2017-03-14 17:50:24,615 root DEBUG Destuffed : ['0x3c', '0x6', '0xd1', '0xa7', '0x14', '0x32', '0x24', '0x0', '0x87', '0x3e'] (serialmanager.py:74)
2017-03-14 17:50:24,623 root DEBUG Received : ['0x3c', '0x6', '0xd1', '0xa7', '0x14', '0x32', '0x24', '0x0', '0x87', '0x3e'] (templogger.py:54)
2017-03-14 17:51:12,310 root DEBUG Received command in queue: b'\xb4\x05' (serialmanager.py:56)
2017-03-14 17:51:12,321 root INFO Command sent to microcontroller: ['0x3c', '0xb4', '0x5', '0x7c', '0x3e'] (serialmanager.py:284)
2017-03-14 17:51:24,781 root DEBUG Requesting temperature. (templogger.py:52)
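The serial manager log shows the command being sent, but no "Raw response" line follows and the next temperature request is never picked up from the queue. That would be consistent with the thread waiting on a reply that never arrives. A minimal sketch of a frame read with a deadline is below. `read_frame` and `read_byte` are hypothetical names and are not taken from serialmanager.py. The start and end flags (0x3c, 0x3e) come from the logged frames, and the escape byte 0x7c is an assumption.

```python
import time

START, END, ESC = 0x3C, 0x3E, 0x7C  # ESC value is an assumption


def read_frame(read_byte, timeout_s=2.0):
    """Read one destuffed frame, or return None if the deadline passes.

    read_byte() must return an int (0-255), or None when no byte is
    available yet (ideally after blocking briefly, to avoid busy-waiting).
    """
    deadline = time.monotonic() + timeout_s
    frame = bytearray()
    escaped = False
    while time.monotonic() < deadline:
        b = read_byte()
        if b is None:
            continue  # nothing received yet, keep waiting until the deadline
        if not frame:
            if b == START:  # discard anything before the start flag
                frame.append(b)
            continue
        if escaped:  # byte after ESC is always data
            frame.append(b)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            frame.append(b)
            return bytes(frame)
        else:
            frame.append(b)
    return None  # caller can log the timeout and carry on
```

Returning None on a timeout lets the caller log the failure and move on to the next queued command, so one unanswered command does not stall the temp logger.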
Able to reproduce in production but not in development (aka virtual serial port):
Sending the living1 night command through the web GUI gives:
GET https://****.com:*****/command?command=living1+night 502 (Bad Gateway)
Commands that cause system to fail (all other commands respond normally):
1. "living1 night" command
living1 night (b'\xB4\x05')
Command sent to microcontroller: ['0x3c', '0xb4', '0x5', '0x7c', '0x3e'] (serialmanager.py:284)
The hardware does not respond to 0x3CB4057C3E, but it does to 0x3CB4057C7C3E. This is because the 0x7c byte is sent unescaped, so the hardware treats the end flag byte (0x3e) that follows as part of the payload and never sees the end of the frame.
Countermeasure:
Command sent to microcontroller: ['0x3c', '0xb4', '0x5', '0x7c', '0x7c', '0x3e'] (serialmanager.py:286)
Sent: 0x3CB4057C7C3E, Response: 0x3C06B405FA743E (valid command response - correct behavior)
Sent: 0x3CB4057C553E, Response: 0x3C15B40511FF3E (invalid CRC response - correct behavior)
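The countermeasure amounts to byte stuffing on the sending side. A minimal sketch is below, assuming 0x7c is the escape byte and that the start flag, end flag and escape byte are each escaped by prefixing them with 0x7c. Only the 0x7c case is confirmed by the frames above. `stuff` and `build_frame` are hypothetical names, and the CRC is passed in because its algorithm is not described here.

```python
START, END, ESC = 0x3C, 0x3E, 0x7C  # ESC and the set of stuffed bytes are assumptions


def stuff(body: bytes) -> bytes:
    """Prefix every flag or escape byte in body with ESC."""
    out = bytearray()
    for b in body:
        if b in (START, END, ESC):
            out.append(ESC)
        out.append(b)
    return bytes(out)


def build_frame(command: bytes, crc: int) -> bytes:
    """Wrap command + CRC in start/end flags, stuffing everything in between."""
    return bytes([START]) + stuff(command + bytes([crc])) + bytes([END])


print(build_frame(b"\xb4\x05", 0x7C).hex())  # 3cb4057c7c3e
print(build_frame(b"\xac\x00", 0xA3).hex())  # 3cac00a33e
```

Stuffing the CRC together with the command bytes matters here, since in the failing frame it was the CRC byte that happened to equal 0x7c.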
→ FIXED
While the problem above is fixed, need to also check the following at a later time:
2. Aircon 'AC' commands
living AC off (b'\xAC\x00')
Command sent to microcontroller: ['0x3c', '0xac', '0x0', '0xa3', '0x3e'] (serialmanager.py:284)
living AC auto (b'\xAC\x01')
Command sent to microcontroller: ['0x3c', '0xac', '0x1', '0x44', '0x3e'] (serialmanager.py:284)
Plugged in ammcon hardware to dev PC and noted that it does not respond to 0x3CAC00A33E, but does respond to 0x3CAC00553E with an invalid CRC response (changed CRC byte to a random value for testing).
The reason is that the aircon commands were removed from the hardware code, and the function wasn't updated to return an error/NAK or any other response in their place. It does respond to the command with the invalid CRC because that reaches a different code block.
/* Aircon commands */
case 0xAC:
    /* removed for now */
    break;
Countermeasure: removed the case statement from the ammcon hardware code:
Sent: 0x3CAC00A33E, Response: 0x3C15AC0098713E (unknown command response - correct behavior)
Sent: 0x3CAC00553E, Response: 0x3C15AC0011FF3E (invalid CRC response - correct behavior)
→ FIXED
Fixed by ammcon commit 4ec8a8b869e4dc7479b36b3159bf41c66dbb7b57
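In the responses quoted above, the byte after the start flag is 0x06 for the valid command and 0x15 for both the invalid CRC and the unknown command cases. These are the ASCII ACK and NAK codes. A minimal sketch of checking that byte on a destuffed response is below. Treating it as a status byte is an inference from these few frames, and `response_status` is a hypothetical name.

```python
START, END = 0x3C, 0x3E
ACK, NAK = 0x06, 0x15  # ASCII ACK/NAK; their role here is inferred, not confirmed


def response_status(frame: bytes) -> str:
    """Classify a destuffed response frame by its second byte."""
    if len(frame) < 3 or frame[0] != START or frame[-1] != END:
        return "malformed"
    if frame[1] == ACK:
        return "ack"
    if frame[1] == NAK:
        return "nak"
    return "unknown"


print(response_status(bytes.fromhex("3C06B405FA743E")))  # ack
print(response_status(bytes.fromhex("3C15AC0098713E")))  # nak
```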
response = socket.recv() line.