PubInv / krake

A wireless alarm device which makes loud noises and flashes lights to alert a human
GNU Affero General Public License v3.0

MQTT Connection Lost & Not Recover, Processing PMD and GPAD_API V0.17 #82

Closed ForrestErickson closed 1 week ago

ForrestErickson commented 1 week ago

Describe the bug

Expected behavior: the MQTT connection recovers after the PC wakes from sleep.

Buggy behavior: the Processing PMD lost its connection and, when restarted, still neither publishes to nor receives keep-alive subscriptions from the GPAD_API V0.17 devices. The four Homework2 GPAD_API V0.17 devices appear to be locked up (not running their program).

Detailed Description PMD

Was running the Processing PMD on Lee's PC overnight. The PC went to sleep. Restarted the PC. The Processing PMD console reported connection lost (as it has every other day before). Today, 20241117, restarting the program did not restore the connection.

The draw window starts up grey but then goes green when the MQTT connection to broker is made. In the draw windows the background is green because no packets are received from the broker. image

There are still no messages arriving on the topics to which the PMD is subscribed. image

Using Wireshark, Lee found outgoing MQTT messages to the broker for the five USA devices, on topics such as KRAKE_20240421_LEB5_ALM. image

However, there is no traffic back from the broker on the expected MQTT topics, such as KRAKE_20240421_LEB5_ACK.
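The topic names quoted above suggest a simple naming pattern: a device identifier followed by `_ALM` for messages sent to a device and `_ACK` for messages coming back. A minimal sketch of that pattern follows, handy when filtering a Wireshark capture or a broker log by device. The helper names are hypothetical and the pattern is inferred only from the topic names in this report, not from the GPAD_API source.

```python
# Sketch only: topic naming inferred from the topics quoted in this issue.
# Function names are hypothetical, not taken from the firmware or the PMD.

ALARM_SUFFIX = "_ALM"  # topic the PMD publishes alarms to
ACK_SUFFIX = "_ACK"    # topic the device answers on


def alarm_topic(device_id: str) -> str:
    """Topic on which an alarm is published for the given device."""
    return device_id + ALARM_SUFFIX


def ack_topic(device_id: str) -> str:
    """Topic on which the device is expected to answer."""
    return device_id + ACK_SUFFIX


def device_id_from_topic(topic: str):
    """Return the device identifier, or None if the suffix is not recognized."""
    for suffix in (ALARM_SUFFIX, ACK_SUFFIX):
        if topic.endswith(suffix):
            return topic[: -len(suffix)]
    return None


if __name__ == "__main__":
    print(ack_topic("KRAKE_20240421_LEB5"))
    print(device_id_from_topic("3C61053DF08C_ALM"))
```

Seeing `_ALM` traffic without any matching `_ACK` traffic, as in the capture described above, is then easy to confirm per device.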

Detailed Description GPAD_API on Homework2

There are four Homework2 assemblies running in Maryville, TN. The three Homework2 assemblies with an LCD are showing alarm level 5. The single device without an LCD has no alarm LEDs lit. None of the heart beat LEDs are blinking. None of the keep alive message LEDs are blinking.

Device USA1 has no alarm LEDs lit. Heart beat and keep alive LEDs are lit.
Device USA2 has all alarm LEDs lit. Heart beat and keep alive LEDs are lit.
Device USA3 has no alarm LEDs lit. Heart beat LED is lit and keep alive LED is off.
Device USA4 has no alarm LEDs lit. Heart beat and keep alive LEDs are lit. This device has no LCD.

Lee opened the serial port connection to USA3 and saw a single message for a reconnection attempt to the broker image

Capture Network Status on Lee's Network at morning of failure

Did an arp -a to see recently active IP addresses image

Ran a PING to the broadcast IP address to make the arp list bigger. image

Results of arp -a after the broadcast ping. image

ForrestErickson commented 1 week ago

Changed the COM port in the Arduino IDE to COM7, which is USA4. I found the same message about attempting an MQTT connection. image

Text of message:

Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2[unreadable bytes]led, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2[unreadable bytes]
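The "failed, rc=" wording matches the reconnect example that ships with the Arduino PubSubClient library. Assuming the firmware prints that library's `state()` value (an assumption, not confirmed anywhere in this thread), the code can be decoded with the table below. The `describe_rc` helper is hypothetical.

```python
# Sketch only: assumes rc is the value returned by PubSubClient's state().
# The names mirror the constants defined in that library's header.

PUBSUBCLIENT_STATES = {
    -4: "MQTT_CONNECTION_TIMEOUT",
    -3: "MQTT_CONNECTION_LOST",
    -2: "MQTT_CONNECT_FAILED",
    -1: "MQTT_DISCONNECTED",
    0: "MQTT_CONNECTED",
    1: "MQTT_CONNECT_BAD_PROTOCOL",
    2: "MQTT_CONNECT_BAD_CLIENT_ID",
    3: "MQTT_CONNECT_UNAVAILABLE",
    4: "MQTT_CONNECT_BAD_CREDENTIALS",
    5: "MQTT_CONNECT_UNAUTHORIZED",
}


def describe_rc(rc: int) -> str:
    """Translate an rc value into the PubSubClient constant name."""
    return PUBSUBCLIENT_STATES.get(rc, "unknown state")


if __name__ == "__main__":
    print(describe_rc(-2))
```

Under that assumption, rc=-2 means the underlying network connection could not be opened, which would point at the network path rather than at the broker rejecting the client.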

Reset USA4

MQTT connection still fails image

Note that the WiFi failed to connect, but the MQTT connection attempt was made anyway, which is doomed to fail. Device status: Heart beat LED stuck ON. The keep alive LED and all alarm LEDs are off.

ForrestErickson commented 1 week ago

At about 8:20, Lee power cycled the router, and now all four devices are connecting to the WiFi.

HOWEVER, what I have learned is that when the WiFi connection cannot be made, GPAD_API V0.17 gets hung up trying to make an impossible MQTT connection.
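One way to avoid the impossible MQTT attempt is to check the WiFi link first and skip the broker connection entirely while the link is down. The sketch below models only that ordering in Python. The function and callback names are hypothetical stand-ins for whatever the firmware uses, and this is not the GPAD_API code.

```python
# Sketch only: models "WiFi first, MQTT second" with injected callbacks.
# All names are hypothetical; the real firmware is Arduino C++.

def reconnect_step(wifi_is_connected, connect_wifi, connect_mqtt) -> str:
    """Run one reconnect pass and report which stage it reached."""
    if not wifi_is_connected():
        if not connect_wifi():
            # No link, so an MQTT attempt cannot succeed; do not try it.
            return "wifi_down"
    return "mqtt_connected" if connect_mqtt() else "mqtt_failed"


if __name__ == "__main__":
    mqtt_attempts = []

    def fake_mqtt() -> bool:
        mqtt_attempts.append(1)
        return True

    # Router is off: the pass stops before touching MQTT.
    print(reconnect_step(lambda: False, lambda: False, fake_mqtt))
    print(len(mqtt_attempts))
```

Returning a distinct status for the WiFi-down case also leaves the main loop free to keep blinking the heart beat LED instead of blocking.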

RobertLRead commented 1 week ago

I believe this is fixed in version 0.19. I instituted a fixed number of retries.
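A fixed retry count can be sketched as a bounded loop that gives up and returns control to the caller. This is an illustration of the idea only, not the V0.19 code: the function name is hypothetical, and the default of three is an assumption chosen to match the three attempts visible before "failed to reconnect!" in the V0.19 serial log further down this thread.

```python
# Sketch only: bounded retries instead of looping until success.
# The name and the default limit are assumptions for illustration.

def connect_with_retries(attempt, max_retries: int = 3) -> int:
    """Call attempt() up to max_retries times.

    Returns the 1-based number of the attempt that succeeded,
    or 0 if every attempt failed.
    """
    for n in range(1, max_retries + 1):
        if attempt():
            return n
    return 0


if __name__ == "__main__":
    # A broker that is never reachable: the loop ends after three tries.
    print(connect_with_retries(lambda: False))
```

Because the function always returns, the caller can service LEDs and other work between reconnect passes.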

ForrestErickson commented 6 days ago

CC: @RobertLRead @nk25719

Notes on Regression testing of V0.19

About 20241119 0810 EDT, Lee programmed four devices with V0.19.

From the PMD_PROCESSING Sketch

WiFi interruption.

At 8:15 Lee removed power from the VRX WiFi router.

The PMD_PROCESSING sketch shows the last online message at "20241119_081529 Msg_recd: 3C61053DF08C_ACK - online, RSSI:-61.00" image

The serial port on USA1 shows the retry connection attempts. image

The serial port shows that even though the WiFi STA (station) connection failed, the firmware still attempts an MQTT connection. This is a separate kind of error. Will make a new bug report.

08:20:00.271 -> Connecting to WiFi: VRX
08:20:00.271 -> E (669915) wifi:sta is connecting, return error
08:20:00.271 -> Failed to connect WiFi.
08:20:00.271 -> Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2failed to reconnect!
08:20:09.584 -> Publish RSSI: 0.00
08:20:09.584 -> Device connected at IPadderss: 0.0.0.0

WiFi Restore.

At 8:25 Lee restored power to the VRX WiFi router.

The PMD_PROCESSING sketch started receiving MQTT messages again at 082627.

Serial port capture from USA1 shows the first non-zero RSSI after the restore at about "08:26:09.584 -> Publish RSSI: -63.00". However, the sequence of messages below does not show the expected clear connection first to WiFi and then to the MQTT broker.

08:25:40.237 -> Connecting to WiFi: VRX
08:25:40.237 -> E (1009932) wifi:sta is connecting, return error
08:25:40.237 -> Failed to connect WiFi.
08:25:40.237 -> Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2failed to reconnect!
08:25:49.591 -> Publish RSSI: -61.00
08:25:49.591 -> Device connected at IPadderss: 0.0.0.0
08:25:59.560 -> Publish RSSI: 0.00
08:25:59.614 -> Device connected at IPadderss: 0.0.0.0
08:26:00.229 ->
08:26:00.229 -> Connecting to WiFi: VRX
08:26:00.229 -> Failed to connect WiFi.
08:26:00.275 -> Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2Attempting MQTT connection...failed, rc=-2failed to reconnect!
08:26:09.584 -> Publish RSSI: -63.00
08:26:09.584 -> Device connected at IPadderss: 0.0.0.0
08:26:19.560 -> Publish RSSI: -59.00
08:26:19.611 -> Device connected at IPadderss: 192.168.1.141
08:26:20.261 -> Attempting MQTT connection...success!
08:26:20.696 -> connected!
08:26:29.560 -> Publish RSSI: -63.00
08:26:29.611 -> Device connected at IPadderss: 192.168.1.141
08:26:39.564 -> Publish RSSI: -62.00
08:26:39.564 -> Device connected at IPadderss: 192.168.1.141
08:26:49.560 -> Publish RSSI: -63.00
08:26:49.992 -> Device connected at IPadderss: 192.168.1.141
08:26:53.544 -> Topic arrived [3C61053DF08C_ALM] Received MQTT Msg.
08:26:53.544 -> Command: a
08:26:53.544 -> 2

Note: how long the WiFi router takes, from the application of power, to finish booting and present a signal is not known at this time. The DUT(s) recovered their network connection in less than two minutes.
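The recovery time can be read straight off the serial capture. The sketch below parses the Arduino serial monitor timestamps from a few of the USA1 lines quoted above and measures the delay until the first successful MQTT connection. The helper names are hypothetical, and the power-restore instant is assumed to be exactly 08:25:00, since the report only says "At 8:25".

```python
# Sketch only: helper names are hypothetical; the log lines are copied
# from the USA1 capture in this comment.
from datetime import datetime

STAMP_FORMAT = "%H:%M:%S.%f"


def parse_stamp(line: str) -> datetime:
    """Return the leading HH:MM:SS.mmm timestamp of a serial monitor line."""
    return datetime.strptime(line.split(" ->", 1)[0], STAMP_FORMAT)


def first_stamp_containing(lines, needle):
    """Timestamp of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            return parse_stamp(line)
    return None


log = [
    "08:25:40.237 -> Failed to connect WiFi.",
    "08:26:00.229 -> Failed to connect WiFi.",
    "08:26:19.611 -> Device connected at IPadderss: 192.168.1.141",
    "08:26:20.261 -> Attempting MQTT connection...success!",
]

# Assumption: power was restored at exactly 08:25:00.
power_restored = datetime.strptime("08:25:00.000", STAMP_FORMAT)
mqtt_ok = first_stamp_containing(log, "Attempting MQTT connection...success!")
recovery_seconds = (mqtt_ok - power_restored).total_seconds()

if __name__ == "__main__":
    print(f"MQTT recovered {recovery_seconds:.3f} s after power was restored")
```

With these inputs the delay comes out at roughly 80 seconds, consistent with the "less than two minutes" observation. The result includes the unknown router boot time.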

Test Summary Results

WiFi and MQTT recover from a WiFi interruption. Recovery is as quick as could be expected, within 1-2 minutes.