MQTT client timeout - Githubissues

eriklindqvist commented 4 years ago

I am running docker images ozwdaemon:latest and eclipse-mosquitto:latest as of yesterday on a Raspberry Pi together with Home Assistant 0.113.1. I have about 90 devices and over 500 entities.

As long as I don't really do anything, it's running pretty stable. However, using OZWAdmin-0.1.74, if I push the z-wave network a bit too far, such as healing and/or refreshing too many nodes at the same time, basically clogging the network with messages, the ozwdaemon mqtt client doesn't seem to be able to keep up. All entities in Home Assistant become unavailable, and I see the following in the mosquitto logs:

Client qt-openzwave-1 has exceeded timeout, disconnecting.

So I restart the ozwdaemon docker instance, and I see the following in the Mosquitto logs:

  New connection from 172.18.0.2 on port 1883.
  Socket error on client <unknown>, disconnecting.
  New connection from 172.18.0.2 on port 1883.
  New client connected from 172.18.0.2 as qt-openzwave-1 (p2, c1, k60).

Then it works for just about under two minutes before it gets disconnected again:

  Client qt-openzwave-1 has exceeded timeout, disconnecting.

From what I understand (please, correct me if I'm wrong) that "k60"-part in the Mosquitto logs means "keepalive = 60", i.e. the MQTT client tells the broker when connecting that it will stay in touch with a ping message at least once every minute, and if that doesn't happen, the client will be disconnected.
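
To make the keepalive arithmetic concrete: the MQTT 3.1.1 specification has the broker disconnect a client when no control packet arrives within one and a half times the keepalive the client announced. The sketch below only computes that deadline; the function name `brokerDeadlineSeconds` is made up for illustration and is not part of Mosquitto or qt-openzwave.

```cpp
#include <cassert>
#include <cstdint>

// Seconds of silence after which an MQTT 3.1.1 broker drops the client:
// one and a half times the keepalive sent in CONNECT (rounded down here).
// A keepalive of 0 turns the mechanism off; this sketch returns 0 for it.
inline std::uint32_t brokerDeadlineSeconds(std::uint16_t keepAliveSeconds) {
    const std::uint32_t k = keepAliveSeconds;
    return k + k / 2u;
}
```

With the k60 from the log above, `brokerDeadlineSeconds(60)` is 90, so a client that stays silent for a minute and a half is dropped.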

I increased logging in mosquitto (by setting "log_type all" in mosquitto.conf) and also started ozwdaemon with -e QT_LOGGING_RULES="*.debug=false;ozw.mqtt.publisher.debug=true"

and I can see in the mosquitto logs

  ...
  Received PUBLISH from qt-openzwave-1 (d0, q0, r1, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Sending PUBLISH to auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0 (d0, q0, r0, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Sending PUBLISH to qt-openzwave-1 (d0, q0, r0, m0, 'OpenZWave/1/node/1/instance/1/commandclass/32/value/562949970722835/', ... (478 bytes))
  Received PINGREQ from auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0
  Sending PINGRESP to auto-3E2A0E60-FB05-A814-00F1-5DE1DEFD51A0
  Client qt-openzwave-1 has exceeded timeout, disconnecting.

while ozwdaemon continues to print out hundreds of rows such as

  ...
  [ozw.mqtt.publisher] [debug]: Publishing Event valueAdded: 562952802893846
  ...

for several minutes until it realizes that the connection is gone:

  [ozw.mqtt.publisher] [debug]: Publishing Event valueRefreshed: 562950595969074
  [ozw.mqtt.publisher] [debug]: Publishing Event valueRefreshed: 72057594680475696
  [ozw.mqtt.publisher] [debug]: MQTT State Change "Disconnected" 
  [ozw.mqtt.publisher] [warning]: Exiting on Failure
  [ozw.mqtt.publisher] [warning]: MQTT Client Disconnnected
  [ozw.mqtt.publisher] [warning]: MQTT Client Error "Transport Invalid"

The only way I can get it to stay up is to remove/rename the ozwcache_0xf7b52c8f.xml file and restart, but it doesn't feel like a good solution.

Any ideas on what's going on?

Fishwaldo commented 4 years ago

[ozw.mqtt.publisher] [debug]: Publishing Event valueAdded: 562952802893846

I take it that across the hundreds of these messages, the number keeps changing?

If so, yes, I have an idea. We are not yielding to allow the actual network processing to happen. If you can confirm the above, the fix should be simple.

eriklindqvist commented 4 years ago

Yes, exactly. The numbers are changing for every row. Also, it's not only valueAdded, it can be all sorts of different z-wave stuff, such as nodeGroupChanged, valueRefreshed etc.

Olen commented 4 years ago

Same issue here. It also happens sometimes on startup of the qt-ozw container. I tried to adjust the keepalive_timeout in mosquitto.conf, but mosquitto did not like that and would not start.

It seems like when the mqtt-disconnect happens, the ozwdaemon is stuck using 100% CPU. I thought it would exit (and restart the container?)

sirfooey commented 4 years ago

I am also experiencing this issue, occurring when the ozw container is starting up. Resulting in a client timeout and disconnection.

With the previous zw1.4, the ozwlog showed that I have a very chatty network, so I'm assuming this is causing too many messages and ozw is delayed in sending a keepalive to mqtt.

sirfooey commented 4 years ago

As with OP, the only way for the ozwd to remain connected to mqtt (not getting timeout) is to trash the ozwcache file.

jlengq commented 4 years ago

Any progress on this one? I have a large network as well and can't get past the initialization step without this timeout and eventually shutdown of ozwdaemon.

1602753420: Client qt-openzwave-1 has exceeded timeout, disconnecting.

@sirfooey I can't find any ozwcache file, where is it located?

sirfooey commented 4 years ago

@jlengq Check your OZW container volume location, it should be right there.

jlengq commented 4 years ago

Thanks, I found it! Unfortunately, it does not solve my problem.

I have 100+ nodes and the interview process launched at startup seems to choke the MQTT network somehow, resulting in the timeout. Deleting the ozwcache only seems to start the whole process over again?

sirfooey commented 4 years ago

@jlengq yes, trashing the file starts the whole discovery process again (no loss of any node data); but sometimes that's what is needed to get it fully operational. Some people have reported better stability with Build 150 (docker pull openzwave/ozwdaemon:allinone-build-150), might be worth trying that as well.

abmantis commented 4 years ago

I've now moved from qt-openzwave to zwave2mqtt. I noticed that during some operations while the controller is waiting for replies and they take a long time to come (and timeout, usually), z2m also shows as "disconnected", but after a while it reconnects. This is probably what is making z2m more reliable: it reconnects to mqtt.
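
Automatic reconnects like this are commonly paired with a capped exponential backoff, so a busy broker is not hit with rapid retries. The sketch below shows only the delay schedule; it makes no claim about how zwave2mqtt actually times its retries, and `reconnectDelayMs` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Delay before reconnect attempt number `attempt` (0-based): baseMs doubled
// once per earlier attempt and never larger than capMs.
inline std::uint64_t reconnectDelayMs(unsigned attempt,
                                      std::uint64_t baseMs,
                                      std::uint64_t capMs) {
    assert(baseMs > 0 && capMs >= baseMs);
    std::uint64_t delay = baseMs;
    for (unsigned i = 0; i < attempt; ++i) {
        if (delay > capMs / 2) {
            return capMs;  // doubling again would pass the cap
        }
        delay *= 2;
    }
    return delay;
}
```

For example, with a 1000 ms base and a 60000 ms cap, the first attempts wait 1000, 2000, 4000 and 8000 ms, and from the seventh attempt on the delay stays at 60000 ms.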

kpine commented 4 years ago

Has anyone tried increasing the MQTT client timeout in ozwd to see if that works around the problem? Maybe increasing the timeout would allow ozwd to finish whatever it's doing before disconnecting (assuming it's not in an infinite loop).

I think it should be as simple as adding a call to this->m_client->setKeepAlive(360); in the code below. That would set the timeout to 6 minutes (360 seconds) instead of the default 1 minute. Adjust as necessary, or preferably set it via an environment variable.

https://github.com/OpenZWave/qt-openzwave/blob/89cc0d86c983101aacd89c780bae18bb3dffe9b4/qt-ozwdaemon/mqttpublisher.cpp#L54-L63

Not sure if there's any downside to increasing the time besides not reacting as quickly for real timeouts.
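
If the value does come from an environment variable, it is worth validating before it reaches the client, since MQTT carries the keepalive as a 16-bit number of seconds. The following is a minimal standard-library sketch, not code from qt-openzwave; the helper names and the variable name `MQTT_KEEP_ALIVE` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Parse a keepalive in seconds from text such as an environment variable.
// Returns `fallback` when the text is missing, is not a plain decimal
// number, is zero, or does not fit MQTT's 16-bit keepalive field.
inline std::uint16_t parseKeepAlive(const char* text, std::uint16_t fallback) {
    if (text == nullptr || *text < '0' || *text > '9') {
        return fallback;  // unset, empty, signed, or non-numeric
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long value = std::strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value == 0 || value > 65535UL) {
        return fallback;
    }
    return static_cast<std::uint16_t>(value);
}

// Hypothetical wrapper: read MQTT_KEEP_ALIVE, defaulting to 60 seconds.
inline std::uint16_t keepAliveFromEnv() {
    return parseKeepAlive(std::getenv("MQTT_KEEP_ALIVE"), 60);
}
```

Falling back to the default on bad input keeps a typo in the container configuration from silently disabling or shortening the keepalive.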

sirfooey commented 4 years ago

So far with limited testing, I can confirm with @kpine's suggestion, ozw is able to start up with a pre-existing ozwcache file, whereas in the past, I would 100% of the time get timeout disconnects until I deleted the ozwcache file.

Olen commented 4 years ago

Just added a PR for that. I still think there are better fixes to be made, but at least it will make the daemon start up.

brett19 commented 3 years ago

I too have encountered this issue. I have rebuilt my entire HA setup around Docker so that I can test whether turning off logging and whatnot improves the situation (which it did, but it still fails to talk to MQTT sometimes).

I also took a look at how the ping/pong works and from what I can tell it's all built into qtmqtt using a timer, the only reason I can imagine that the timer wouldn't fire is if the event loop was blocked.
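
A toy model of that effect, assuming a single-threaded loop that finishes a whole batch of queued events before it checks whether the ping timer is due. This is a deliberately simplified sketch and not a model of Qt's real event dispatcher; `simulateLoop` and `LoopStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct LoopStats {
    long long pingsSent;
    long long worstGapMs;  // longest time between two consecutive pings
};

// Run `iterations` passes of the loop. Each pass spends
// eventsPerIteration * costPerEventMs handling events, and only afterwards
// sends a ping if at least pingIntervalMs has passed since the last one.
inline LoopStats simulateLoop(int iterations, int eventsPerIteration,
                              int costPerEventMs, int pingIntervalMs) {
    assert(iterations >= 0 && eventsPerIteration >= 0);
    assert(costPerEventMs >= 0 && pingIntervalMs > 0);
    long long now = 0;
    long long lastPing = 0;
    LoopStats stats{0, 0};
    for (int i = 0; i < iterations; ++i) {
        now += static_cast<long long>(eventsPerIteration) * costPerEventMs;
        if (now - lastPing >= pingIntervalMs) {
            stats.worstGapMs = std::max(stats.worstGapMs, now - lastPing);
            lastPing = now;
            ++stats.pingsSent;
        }
    }
    return stats;
}
```

With a 60000 ms ping interval, `simulateLoop(1200, 10, 10, 60000)` (100 ms of work per pass) pings every 60000 ms, while `simulateLoop(3, 5000, 20, 60000)` (100 s of work per pass) leaves 100000 ms between pings, longer than the 90 s a broker tolerates for a 60 s keepalive.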

m3ki commented 3 years ago

I am having the same issue, 100+ nodes ozw daemon won't stay up.

renlor16 commented 3 years ago

I have just added additional devices to my zwave network. I am now having the same issue. ozw daemon goes offline during startup.

brett19 commented 3 years ago

So I spent an ungodly amount of time figuring out how to get a local debug build of ozwdaemon running on my MBP. I can't exactly reproduce the issue on my MBP (presumably because it's too fast to trigger the issue), but what I do see is that the MQTT timer for pings is being invoked as expected (I also set up a 1s timer which triggers precisely on time). I also confirmed that blocking the event loop for a period of time definitely causes the MQTT timer not to be invoked.

After digging into the code a bit, it looks like OZWNotification schedules its events to be processed by the main thread using Qt::QueuedConnection. I have a suspicion that what's going on is that Open-Zwave is generating events so quickly that the Pi cannot keep up on the main thread. This causes the queue of queued signals to become saturated with queued events from OZW, which leads to the timer not being able to get in to fire. Something else that makes this seem likely is that I did some profiling and a huge amount of time is spent serializing events for MQTT, which is likely what is making each event take so long to be processed on the main thread.

Assuming this is an accurate assessment, I see a couple of paths forward.

1. Use Qt::BlockingQueuedConnection to put some back-pressure onto OZW such that it doesn't saturate us with events during startup. This does require that OZW itself has internal back-pressuring mechanisms that can ensure that the thread that's communicating with the dongle doesn't saturate any internal OZW queues (this could cause the dongle to time out).
2. Refactor the MQTT handling to automatically reconnect after a disconnect. This is certainly something that should occur anyway, but I think it's sort of hiding the underlying issue. Saturating the event loop with stuff can cause all sorts of havoc.
3. Move processing off the main thread. This is definitely the best option out of all of them. Having the event-processing for QTOZW and the network-handling for MQTT on their own threads would enable them to flow events between each other without interfering with ordering or maintenance work that they need to be performing.
4. Accept something like Pull Request #185, which adjusts the MQTT timeout so that it is no longer triggered while the main thread is being saturated during startup (although it should probably be something like 10m). There isn't a whole lot of downside to doing this since taking longer to discover a 'lost client' is mostly irrelevant. Similar to 2 above, this is sort of hiding the issue, though it's quick and easy.
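
The back-pressure idea in the first option can be illustrated with a fixed-capacity queue that turns the producer away once it is full, so the backlog cannot grow without bound. This is a single-threaded sketch of the principle only: it is not thread-safe, it is not how Qt::BlockingQueuedConnection is implemented, and `BoundedQueue` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Fixed-capacity FIFO. tryPush() refuses new items while the queue is full,
// which forces a fast producer to wait for the consumer to catch up.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false (and stores nothing) when the queue is already full.
    bool tryPush(T item) {
        if (items_.size() >= capacity_) {
            return false;
        }
        items_.push_back(std::move(item));
        return true;
    }

    // Moves the oldest item into `out`; returns false when the queue is empty.
    bool tryPop(T& out) {
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

A blocking queued connection behaves roughly like a capacity of one: the sender cannot hand over the next event until the receiver has finished the current one.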

What do you think @Fishwaldo ?

Olen commented 3 years ago

I totally agree that #185 is just a workaround, and for me, your option 3 seems like the best one. Unfortunately I don't have enough experience with C++ and Qt to help.

Regarding option 2, there has been some discussion in another issue (could not find it here and now), and a problem is apparently that it is hard to keep track of the states (and what messages in either direction that might be lost) if you just do a MQTT-reconnect without restarting OZW at the same time.

I really hope fishwaldo is well, and will be back from his involuntary break soon...

m3ki commented 3 years ago

So I spent an ungodly amount of time figuring out how to get a local debug build of ozwdaemon running on my MBP. I can't exactly reproduce the issue on my MBP

This is where I am at right now :)

docker buildx build --platform linux/arm -f Docker/Dockerfile -t qt-ozw-allinone-timeout .
WARN[0000] invalid non-bool value for BUILDX_NO_DEFAULT_LOAD:
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 854.4s (16/44)

quickly that the Pi cannot keep up on the main thread. This causes the queue of queued signals to become saturated with queued events from OZW, which leads to the timer not being able to get in to fire.

So the classic CS problem of thread starvation?

Man I am rusty with C++, but having non-working lights is making me want to pick it back up.

psgcooldog commented 3 years ago

Until there's an actual fix for this issue, is there any way to take CPU resources from the core-openzwave docker container to artificially slow it down so that it doesn't overwhelm mqtt? Perhaps Portainer has something useful. I'll check this idea out tomorrow, because otherwise I'm just dead in the water.

m3ki commented 3 years ago

What I did on my end as a workaround is to install mqtt locally on my Pi and then bridge it to my main mqtt server. It seems to fix the issue, at least for me. I also tried recompiling ozwd to extend the timeout, but that didn't quite solve the issue completely.

try this docker compose

version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
     - "1883:1883"
     - "9001:9001"
    restart: always
  ozwd:
    image: openzwave/ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-xxx"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "pi.local.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      USB_PATH: "/dev/serial/by-id/usb-xxx"
      OZW_INSTANCE: "1"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always

Here is mqtt config

persistence true
persistence_location /mosquitto/data/

log_dest file /mosquitto/log/mosquitto.log

password_file /mosquitto/config/passwd
allow_anonymous false

# External MQTT Broker
connection zpie01
address hassio.local.net
# The "1" below is the id of one of many of my instances; update as needed
topic OpenZWave/1/# both
remote_username [redacted]
remote_password [redacted]

brett19 commented 3 years ago

Depending on the MQTT server you have, the performance of your device and the size of the network, you could improve things with those kinds of changes. With my network of 58 nodes, even with a local MQTT and Pi4, it couldn't complete quickly enough.

I'll try to remember to push my docker image with Olen's workaround (which is perfectly reasonable to use in "production") tomorrow morning.

Cheers, Brett


psgcooldog commented 3 years ago

It looks like adding " --cpu-shares 512 " might be a good start. Would any of you folks here know how I can get that into the command line that HA is using to start up the addon?

m3ki commented 3 years ago

Depending on the MQTT server you have, the performance of your device and the size of the network, you could improve things with those kinds of changes. With my network of 58 nodes, even with a local MQTT and Pi4, it couldn't complete quickly enough.

For me, I have 4 Pis with 50-100 nodes or so and nothing would work. Even on the test ozwd instance where there were no nodes (what gives!?), OZWD would just disconnect, with or without the keepalive modification in the code, if the mqtt server was external to hassio (eclipse/mosquitto on Docker on a beefy VM).

I have a feeling now that there is something going on with the network connection between the Pi and the mqtt server. Things worked better when the Pi connected to the internal mqtt addon on the hass server.

If you need to make sure startup completes, you can wipe out the ozwd cache on your Pi; that might help. Just don't reset your zwave stick and all nodes will come back.

For me my network has been rock solid as of this morning with a setup I mention in my previous comment. PI (running ozwd and mqtt container) --->bridged to hassio mqtt.

m3ki commented 3 years ago

By network connection, I meant there is something going on with how ozwd handles traffic and/or congestion. Even with the keepalive increased, the network wouldn't be stable. At least in my case.

psgcooldog commented 3 years ago

I'm running a supervised install on Ubuntu 20.4 on an RPi4, with the Openzwave addon and the MQTT addon. I've been working to get everything back to normal after switching from the old all-in-one Zwave integration, and I thought I had finally got things right when this issue cropped up.

m3ki commented 3 years ago

I'm running a supervised install on Ubuntu 20.4 on an RPi4, with the Openzwave addon and the MQTT addon. I've been working to get everything back to normal after switching from the old all-in-one Zwave integration, and I thought I had finally got things right when this issue cropped up.

A friend did this build yesterday when we were trying to troubleshoot this issue. You can try his Docker image, which has an increased timeout: https://hub.docker.com/r/firstof9/qt-ozwdaemon

Keep in mind it kinda worked for me, but I would still experience intermittent disconnects every hour or so, or when the network got busy.

I am now back to the original Docker image though, with mqtt running on the same host and bridged to the mqtt on hassio, which runs on a separate VM. So far so good.

You can also wipe out the ozwd cache file of your ozwdaemon and see if you can get your network back up.

renlor16 commented 3 years ago

Adding several more devices to my network caused this problem for me last weekend. I'm around 60 devices now with more to add. I'm running using VirtualBox on an Intel NUC so it doesn't appear easy to make any temporary changes to work around this issue. Last night I switched to Zwave2MQTT and have all my devices connected this morning. I'll have to spend some time renaming devices/entities but at least this will get me going again.

psgcooldog commented 3 years ago

I really need a fix for this. The system fails too frequently as I am trying to add some door sensors and deadbolts. Deleting the cache file will let it restart, but many nodes lose their names, and some disappear. And it takes forever, anyway.

I tried Zwave2MQTT last week, and I switched when I ran into some issue (can't remember what it was now, lol). I may switch back and try it again. I think one issue was that it was clear from what I read that it was an orphaned project, and that the OpenZwave add-on with the OpenZwave integration was the road forward.

Is there any way to get the developers' attention quickly, or should I assume that this state of affairs will persist for a while?

brett19 commented 3 years ago

Hey @psgcooldog, I am not a developer of qt-openzwave, but I am a C++ developer. I spent a bunch of time looking into this issue and I am reasonably confident that the fixed timeout should be sufficient for most networks. I have deployed that fix to my own network and have not seen any issues since. If you're still running into issues, even after building and deploying the pull request's fix, it's likely there is another issue at play. Cheers, Brett

m3ki commented 3 years ago

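Hey @brett19, would this solution be better than simply setting an arbitrary timeout? This way anyone can adjust the timeout and tweak it as needed.

    // qgetenv() returns a QByteArray; setKeepAlive() expects seconds as quint16.
    const QByteArray mqtt_keep_alive = qgetenv("MQTT_KEEP_ALIVE");
    bool ok = false;
    const uint seconds = mqtt_keep_alive.toUInt(&ok);
    // Ignore unset, non-numeric, or out-of-range values.
    if (ok && seconds > 0 && seconds <= 65535) {
        this->m_client->setKeepAlive(static_cast<quint16>(seconds));
    }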

I am having a hard time setting up a crosscompilation buildchain. How did you get it to work?

brett19 commented 3 years ago

Sorry for the delay, forgot to push this the other day. Here are some armhf (32-bit ARM) images including fix-185. https://hub.docker.com/r/brett19/ozwdaemon

psgcooldog commented 3 years ago

I decided to punt, and converted everything over to ZWave2MQTT. It was quite the time-consuming process, but this particular problem is no longer an issue for me.

karl-gustav commented 3 years ago

@psgcooldog I also jumped ship for z2m, but ozw is far superior when it comes to handling scene events from switches. It actually comes into HA as a scene event and not a regular state change. And z2m sends 4 state changes per button press, so you need to go deep into the event to figure out if it really is a new scene event.

tl;dr: would prefer to use ozw but had to switch to z2m because ozw can't handle more than ~20-25 devices before it breaks down...

renlor16 commented 3 years ago

@karl-gustav I was just trying to figure scenes out with z2m. I have Inovelli red dimmers that allow multi-tapping to create scenes. Had it working fine with ozw, but no luck with z2m. I'll probably give up for now and make the switch back to ozw when this bug is fixed. At least all my regular light automations are working again.

brett19 commented 3 years ago

Hey Everyone, can you confirm that you still have issues with the image I posted above containing fix 185.

If you can upload logs, that would help track down your specific issue beyond what we've already discovered.

Cheers, Brett

Olen commented 3 years ago

tl;dr: would prefer to use ozw but had to switch to z2m because ozw can't handle more than ~20-25 devices before it breaks down...

FWIW, I run ozw with 61 devices, and it has been running solid for 7 weeks. But I have only added a few new devices during that time, not removed any, and not done any network refreshes or other tricks. Restarting the container, on the other hand, is usually causing trouble. But as soon as it starts up, it seems pretty stable.

renlor16 commented 3 years ago

@brett19 I'm not that familiar with Docker. Is it possible to use Portainer to replace my current ozwdaemon with the one you created?

brett19 commented 3 years ago

@brett19 I'm not that familiar with Docker. Is it possible to use Portainer to replace my current ozwdaemon with the one you created?

You would need to shut down the container and spin up a new one with the same configuration but different image (as far as I know). I personally use docker-compose to make it easier to do that.

renlor16 commented 3 years ago

Maybe I’ll have some time over the holidays to figure that out. For now z2m is running well. The only thing I can’t figure out is scenes from my dimmers. Is @Fishwaldo the only person that can release a new version of the addon? Given that the HA roadmap seemed to be heading down the ozw path, it seems pretty risky if there is only one person that can release bug fixes/workarounds.

m3ki commented 3 years ago

A friend compiled this fix and added my change that reads an MQTT_KEEP_ALIVE environment variable, so the timeout can be changed as needed. The Docker image is here: https://hub.docker.com/r/firstof9/qt-ozwdaemon

Before this fix my setup would still restart if I did "Refresh node"

MQTT_KEEP_ALIVE: "360"

My config is here (keep in mind I am using a local mqtt on the Pi that bridges to a main mqtt server, to make sure ozw doesn't restart if the HASS instance is restarted):

version: '3'
services:
  mqtt:
    image: eclipse-mosquitto
    container_name: "mqtt-bridge"
    volumes:
      - ./mqtt:/mosquitto
      - ./mqtt/data:/mosquitto/data
      - ./mqtt/log:/mosquitto/log
    ports:
     - "1883:1883"
     - "9001:9001"
    restart: always
  ozwd:
    #image: openzwave/ozwdaemon:allinone-latest
    image: firstof9/qt-ozwdaemon:latest
    container_name: "ozwd"
    depends_on:
      - "mqtt"
    security_opt:
      - seccomp:unconfined
    devices:
      - "/dev/serial/by-id/usb-0658_0200-if00"
    volumes:
      - ./ozw:/opt/ozw/config
    ports:
      - "1983:1983"
      - "5901:5901"
      - "7800:7800"
    environment:
      MQTT_SERVER: "localhost.mydomain.net"
      MQTT_USERNAME: "[redacted]"
      MQTT_PASSWORD: "[redacted]"
      MQTT_KEEP_ALIVE: "360"    # <-- add keep alive like so
      USB_PATH: "/dev/serial/by-id/usb-0658_0200-if00"
      OZW_INSTANCE: "3"
      OZW_NETWORK_KEY: "[redacted]"
    restart: always

genome-prime commented 3 years ago

Hi, I have the same issues. Are there any solutions in sight? Or at least a temporary workaround for people like me, who run the official image on a raspberry pi?

I've tried switching to zwave2mqtt but I couldn't figure out how to add the devices and entities to HomeAssistant. Auto discover didn't work either. And I really don't want to have to do everything manually.

Is it correct that everyone who has a sufficiently large network using the OpenZWave Plugin is experiencing this issue?

Oh, and also I'm quite new to all of this. So I'm JUST learning how to access certain files in the docker containers, getting OS SSH access etc. This is how I've managed to delete the ozwcache file at least, so I don't have to install the whole thing from scratch. But every time I "reset", start up the OpenZWave Plugin and let my network run for a bit, it seems to randomly miss some entities.

Is there a way to restore the ozwcache from an old file without running into the timeout issue, maybe?

I'm desperate at this point.. so I'm thankful for any help :)

renlor16 commented 3 years ago

@genome-prime I was unable to figure out how to get the workaround installed on my setup. I ended up switching to z2m and was able to get auto discover to work. It did require that I re-enter all the entity names, which was a bit time consuming. So far it has been very stable. I have about 80 zwave devices. I had trouble getting the central scene figured out with z2m. I'm now looking at Node-Red with an MQTT node to grab the central scene info from there.

genome-prime commented 3 years ago

I'm giving up in frustration... I gave z2m another shot. This time my devices were auto detected and showed up in Home Assistant.

Unfortunately

some devices are missing random entities and even after "Refresh node info" not all entities are listed (this was occurring much less with OZW)
some devices (all of my thermostats) are always missing entities (like "mode") that were definitely showing up with OZW. The strange thing is I can see those entities in MQTT Explorer and in the config sections, but not anywhere else.
some devices are entirely the wrong type
some entities always report "unknown" like my Fibaro Motion Sensors (this was also definitely working with OZW)
...

I know there are customizations but I couldn't figure out where the config files are being stored and how to apply them and honestly I don't wanna have to customize anything, since OZW was detecting everything fine on its own (except my Fibaro Button which I can live without for now)

I know this might not be the right or best place for it but I need to get this off my chest: It would be much more fun if there were official, step-by-step, up-to-date Tutorials out there for setting up the different ways of a core feature (Z-Wave) of HomeAssistant.

I guess I should at least say some positive things too:

zwave2mqtt:

The no restart thing is nice
The UI is nice and not so buggy compared to OZW-Admin

OpenZWave:

It actually works! (for a limited time...)
Better compatibility out of the box

blhoward2 commented 3 years ago

Hey Everyone, can you confirm that you still have issues with the image I posted above containing fix 185.

If you can upload logs, that would help track down your specific issue beyond what we've already discovered.

Cheers, Brett

Your docker image seemed to fix the issue for me!

TheArcturian commented 3 years ago

I have a new and fresh installation of Home Assistant on Raspberry Pi per 25th December 2020 and have the same problem: "Client qt-openzwave-1 has exceeded timeout, disconnecting."

Should this bug even show up on a totally clean install?

TheArcturian commented 3 years ago

Seems like it is the Aeotec Z-Wave Gen5 stick that doesn't work on Raspberry Pi4. At least 3 (of 4) hardware versions of the stick: https://community.home-assistant.io/t/sticky-aeotec-z-stick-gen5-raspberry-pi4/218405/25

m3ki commented 3 years ago

Seems like it is the Aeotec Z-Wave Gen5 stick that doesn't work on Raspberry Pi4. At least 3 (of 4) hardware versions of the stick: https://community.home-assistant.io/t/sticky-aeotec-z-stick-gen5-raspberry-pi4/218405/25

Did you try this solution?

TheArcturian commented 3 years ago

No, but I will get a cheap unpowered USB 2.0 hub. That will do the trick. Also it makes the stick further away from the Raspberry which is supposed to reduce interference.

sstratoti commented 3 years ago

@m3ki - thank you. That docker image seems to have done the trick for getting my network back online. It was super frustrating - I could see through the admin gui that everything was up and running, could see it posting into MQTT, but for some reason HA kept saying that everything was "unavailable"...

OpenZWave / qt-openzwave

MQTT client timeout #140
