OpenZWave / qt-openzwave

QT5 Wrapper for OpenZWave
GNU Lesser General Public License v3.0
105 stars 30 forks source link

supervisord-watchdog (exit status 1; not expected) #114

Closed trebolcinco closed 4 years ago

trebolcinco commented 4 years ago

This morning I started receiving this WARN: "supervisord-watchdog (exit status 1; not expected)". It was preceded by: INFO " supervisord-watchdog: event was rejected". I can access the admin panel but when I try to Open the Remote Daemon it says: "Connection Refused". I tried rebooting and restarting docker, neither helped.

Fishwaldo commented 4 years ago

What version are you running?

trebolcinco commented 4 years ago

The latest docker all in one. It was working fine yesterday then this morning it stopped. Tried restart and reboot.

On Jun 22, 2020, at 4:06 PM, Justin Hammond notifications@github.com wrote:

What version are you running?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/OpenZWave/qt-openzwave/issues/114#issuecomment-647769200, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKYR65GLPOIOMXYAZ3IMTDRX7BUDANCNFSM4OETTZ2Q.

mike240se commented 4 years ago

I installed "latest" today and it was 3 days behind, I now am running 145 and this error/issue went away.

FreelancerJay commented 4 years ago

I'm having the same warning in my logs too. I've only just tried to set this up with docker-compose last night, and it's all I get. Here is my log from when I start the docker from the terminal.

Attaching to ozwd ozwd | 2020-06-28 09:36:50,223 INFO Set uid to user 0 succeeded ozwd | 2020-06-28 09:36:50,231 INFO supervisord started with pid 1 ozwd | 2020-06-28 09:36:51,243 INFO spawned: 'supervisord-watchdog' with pid 9 ozwd | 2020-06-28 09:36:51,254 INFO spawned: 'ozwdaemon' with pid 10 ozwd | 2020-06-28 09:36:51,265 INFO spawned: 'ozw-admin' with pid 11 ozwd | 2020-06-28 09:36:51,277 INFO spawned: 'websockify' with pid 12 ozwd | 2020-06-28 09:36:51,372 INFO exited: ozwdaemon (exit status 128; not expected) ozwd | 2020-06-28 09:36:52,282 INFO success: supervisord-watchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ozwd | 2020-06-28 09:36:52,285 INFO success: ozw-admin entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ozwd | 2020-06-28 09:36:52,285 INFO success: websockify entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ozwd | 2020-06-28 09:36:53,332 INFO spawned: 'ozwdaemon' with pid 13 ozwd | 2020-06-28 09:36:53,374 INFO exited: ozwdaemon (exit status 128; not expected) ozwd | 2020-06-28 09:36:55,386 INFO spawned: 'ozwdaemon' with pid 14 ozwd | 2020-06-28 09:36:55,431 INFO exited: ozwdaemon (exit status 128; not expected) ozwd | 2020-06-28 09:36:58,755 INFO spawned: 'ozwdaemon' with pid 15 ozwd | 2020-06-28 09:36:58,802 INFO exited: ozwdaemon (exit status 128; not expected) ozwd | 2020-06-28 09:36:59,805 INFO gave up: ozwdaemon entered FATAL state, too many start retries too quickly ozwd | 2020-06-28 09:37:00,823 WARN supervisord-watchdog: event was rejected ozwd | 2020-06-28 09:37:00,920 INFO exited: supervisord-watchdog (exit status 1; not expected) ozwd | 2020-06-28 09:37:01,929 INFO spawned: 'supervisord-watchdog' with pid 17 ozwd | 2020-06-28 09:37:03,427 INFO success: supervisord-watchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) ozwd | 2020-06-28 09:37:03,438 WARN supervisord-watchdog: event was rejected ozwd | 2020-06-28 09:37:03,537 INFO exited: supervisord-watchdog (exit status 1; not expected) ozwd | 2020-06-28 09:37:04,547 INFO spawned: 'supervisord-watchdog' with pid 19 ozwd | 2020-06-28 09:37:06,058 INFO success: supervisord-watchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

It seems to be starting everything, and after a couple seconds ozwdaemon crashes, is relaunched and crashes again, and repeats once more at which point docker gives up because of how quickly it was looping. supervisord-watchdog seems to fail to parse, or log, or whatever it is meant to do with that info, and crashes. It relaunches, and stays up long enough that the following crash doesn't lead to docker giving up on the relaunches, because those last 4 lines repeat every few seconds from then on until I quit it with control-c.

When I vnc in, I also can't access the remote panel like @trebolcinco, it also just gives me "Connection Refused". If I choose to "Start Local Client", I can control the Z-Wave Network find, and have successfully included a device this way, but of course nothing gets sent to my MQTT broker by this...

Running on a Raspberry Pi 3, fully up to date Raspbian (per apt-get update). HomeAssistant is on a second Raspberry Pi on the network, not on the same hardware. I pulled the docker image that was released a few hours ago too, using the allinone-latest tag, so I believe it is version 149.

@trebolcinco, when you launch the docker in the terminal, do you get the ozwdaemon process exiting as well before the supervisord-watchdog starts quiting on you too? If not, my issue is probably different and I'll open a separate issue

FreelancerJay commented 4 years ago

I've just updated the docker to the latest -allinone image, 154.

Is there anything I can do to get more information to help identify a cause for this?

bachya commented 4 years ago

I, too, am experiencing this stack trace with the allinone-amd64-153 and allinone-amd64-154 Docker images, as well. Incidentally, the allinone-build-153 and allinone-latest seem to start up fine.

kpine commented 4 years ago

If your architecture is amd64, then allinone-amd64-153 , allinone-build-153 and allinone-latest are the exact same image.

Providing all of log files would be more helpful.

bachya commented 4 years ago

@kpine I mentioned I'm getting the same stack trace as @FreelancerJay when viewing the logs of the container – where would I find application logs?

kpine commented 4 years ago

There is a logs directory, default location would be in the config directory.

kpine commented 4 years ago

Also, allinone builds > 150 are known to be broken (at least use of ozwadmin, what else?) because of changes in the QT5 version. A fix is in the works.

bachya commented 4 years ago

@kpine Here are my logs from a clean run of allinone-amd64-154:

ozwadmin.log websocket.log watchdog.log ozwdaemon.log

mike240se commented 4 years ago

Also, allinone builds > 150 are known to be broken (at least use of ozwadmin, what else?) because of changes in the QT5 version. A fix is in the works.

to anyone not aware: using the ozw-admin client on your desktop is a work around for the time being.

kpine commented 4 years ago

@bachya Your ozwdaemon.log is showing problems with MQTT.

[20200722 20:11:08.501 UTC] [ozw.mqtt.publisher] [debug]: MQTT State Change "Disconnected" 
[20200722 20:11:08.501 UTC] [ozw.mqtt.publisher] [warning]: Exiting on Failure 
[20200722 20:11:08.501 UTC] [ozw.mqtt.publisher] [warning]: MQTT Client Disconnnected 
[20200722 20:11:08.501 UTC] [ozw.mqtt.publisher] [warning]: MQTT Client Error "Transport Invalid" 

If ozwd disconnects from the broker it will restart, which seems to be happening here. Not sure if it will help or not, but you could try increasing the log levels, via a container environment variable:

QT_LOGGING_RULES="*.debug=true;qt.remoteobjects.io.debug=false;qt.remoteobjects.debug=false;qt.remoteobjects.models.debug=false;ozw.mqtt.qt2js.debug=false"
kpine commented 4 years ago

Here's what I see, in comparison:

[20200722 13:54:32.712 PDT] [qt.mqtt.connection] [debug]: bool QMqttConnection::sendControlConnect()
[20200722 13:54:32.712 PDT] [qt.mqtt.connection.verbose] [debug]: bool QMqttConnection::writePacketToTransport(const QMqttControlPacket&)  DataSize: 78
[20200722 13:54:32.712 PDT] [qt.mqtt.connection.verbose] [debug]: void QMqttConnection::transportReadReady()
[20200722 13:54:32.712 PDT] [qt.mqtt.connection.verbose] [debug]: Received CONNACK
[20200722 13:54:32.712 PDT] [qt.mqtt.connection.verbose] [debug]: Finalize CONNACK
[20200722 13:54:32.712 PDT] [ozw.mqtt.publisher] [debug]: MQTT State Change "Connected"
[20200722 13:54:32.712 PDT] [ozw.mqtt.publisher] [info]: MQTT Client Connected
bachya commented 4 years ago

@kpine What MQTT broker do you use? I use VerneMQ with several clients and it works great. That MQTT Client Error "Transport Invalid" is interesting... Will investigate.

EDIT: user error. 🤦 I wasn't using the name of my MQTT container as the hostname. Everything working now; I still think I'm encountering the QT5 issue over VNC (disconnects suddenly), so I'll wait for new versions to test again.

kpine commented 4 years ago

I'm using mosquitto.

EDIT: user error. 🤦 I wasn't using the name of my MQTT container as the hostname. Everything working now.

😆 Most of us have done that at some point... Some better logging of connection errors might be useful.

Fishwaldo commented 4 years ago

ozwdaemon is now logging as much info as we can get about connection failures, but its not a lot on the MQTT side unfortunately. I've added some additional logging when ozwdaemon exits about its failure and a message to "read the log files" about why it failed that hopefully will help some what.