OpenZWave / qt-openzwave

QT5 Wrapper for OpenZWave
GNU Lesser General Public License v3.0

Allinone supervisor fails to restart ozwdaemon after MQTT server connection is lost and restored #120

Closed kpine closed 4 years ago

kpine commented 4 years ago

Using the allinone image (build 150), I shut down the MQTT server. As expected, ozwdaemon also shuts down; however, its attempts to reconnect fail pretty badly. Several crashes are reported, and eventually ozwdaemon stops trying to connect. Bringing the MQTT server back at that point is too late.

Is there a retry limit? The watchdog log prints these errors continuously, starting at about the time ozwdaemon appears to stop.

RESULT 4
FAILREADY

The container remains running, and I'm forced to manually stop the ozwdaemon container and restart it. If there is a retry limit, I would expect the container to at least exit once the retries are exhausted.
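
The behaviour requested here (a bounded number of restarts, then a non-zero exit so the container itself stops) can be sketched as follows. This is a hypothetical illustration, not how the allinone watchdog is actually implemented; `supervise`, `max_restarts`, and `delay` are names invented for this example.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay=2.0, run=subprocess.call, sleep=time.sleep):
    """Run cmd, restarting it on failure at most max_restarts times.

    Returns the exit code the supervising process should exit with.
    run and sleep are injectable so the loop can be exercised without
    starting real processes. All names here are hypothetical.
    """
    restarts = 0
    while True:
        code = run(cmd)
        if code == 0:
            return 0  # clean shutdown: nothing to restart
        if restarts >= max_restarts:
            # Retries exhausted: give up and report failure, so that a
            # container whose main process is this loop exits as well.
            return code if code > 0 else 1
        restarts += 1
        sleep(delay * restarts)  # simple linear backoff between attempts
```

If a loop like this were the container's main process, its caller would pass the return value to `sys.exit`, and a Docker restart policy such as on-failure could then take over instead of the container lingering with a dead daemon.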

Logs attached.

Related to #20.

kpine commented 4 years ago

I got a set of logs with the standalone image, using a restart policy of on-failure. There are other crashes logged, but eventually, after I restarted the MQTT server, ozwd was able to connect. The last container restart was a little weird, though: it took nearly a minute from the shutdown messages until the container restarted. I'm not sure whether that was ozwd hanging, like I saw with the allinone image, or docker doing something funny.

2020-07-04T07:51:20.595430085Z [20200704 0:51:20.595 PDT] [ozw.daemon] [info]: QT Version:  5.12.5
2020-07-04T07:52:12.942099129Z Executing: /usr/local/bin/ozwdaemon -s /dev/ttyUSB0 --config-dir /opt/ozw/config --user-dir /opt/ozw/user --mqtt-server dev-mqtt --mqtt-port 1883 --stop-on-failure --mqtt-instance 2
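
For reference, the gap between the two Docker timestamps above can be computed with a short Python sketch. It assumes these two lines bracket the restart in question; `parse_docker_ts` is a helper name made up for this example.

```python
from datetime import datetime


def parse_docker_ts(ts):
    """Parse a Docker log timestamp such as 2020-07-04T07:51:20.595430085Z."""
    base, frac = ts.rstrip("Z").split(".")
    # datetime only stores microseconds, so keep the first six fractional digits
    return datetime.strptime(f"{base}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


before = parse_docker_ts("2020-07-04T07:51:20.595430085Z")
after = parse_docker_ts("2020-07-04T07:52:12.942099129Z")
gap = (after - before).total_seconds()
print(f"{gap:.1f} s")  # prints "52.3 s"
```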

ozwd_standalone_log.zip

kpine commented 4 years ago

OK, I tried again with the allinone image; I had forgotten about the docker logs. It looks like ozwd retries too many times and then stops retrying. After that, the supervisor watchdog itself fails. Once that happens, there's nothing to do but restart manually.

supervisor.log

Fishwaldo commented 4 years ago

Ok - I've "migrated" to the S6 supervisor framework now, as the watchdog was just too fragile. Let's see what's broken now :)