Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Devices "randomly" go offline #14422

Closed amilanov75 closed 1 year ago

amilanov75 commented 1 year ago

What happened?

After 6-8 weeks of stability, all my Philips Hue White Ambiance GU10s (on one network) reported offline and became completely unresponsive from the Z2M UI. An Aqara FP1 that I added to the network recently, after the previous incident about 6 weeks ago, appears to remain online.

The Philips bulbs run different firmware versions, e.g. 1.93.11 (the latest) and 1.65.11_hB798F2B (older), yet all bulbs show as offline (release notes: https://www.philips-hue.com/en-us/support/release-notes/lamps).

EDIT: I did some more investigating into the last incident before this one. In that incident, the Z2M service was still running but all my devices were reporting "offline". This was on a different network and the devices were Philips and Ikea. I wanted to point this out, as the above post on its own makes it seem like a Philips-only issue, which it is not. It appears to be device agnostic.

EDIT2: On the advice of a friend, I looked into enabling herdsman logging. The problem is that I would have to stop and start the service to enable it, after which the issue would have gone away... FYI, I have left the network in its current "broken" state for the time being, in case there are any further requests that I can fulfil.

EDIT3: Adding more info as I progress. I am on channel 25 for both my Zigbee networks. When the issue occurs, all devices (other than the 1x Aqara FP1) report offline and are unresponsive. So we are talking about 25+ devices on one network and 7 devices on the other network.

EDIT4: A late-night investigation made me recall that the only remedy was to unplug the Z2M stick and plug it back in again. I tried restarting the Zigbee service as well as rebooting Debian, but neither resolved the issue. This has occurred on 2x separate sticks, across 2x networks... shame I forgot about EDIT2 above before unplugging the stick!
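
For anyone trying to tell this kind of failure (nearly every device offline at once) apart from individual bulbs dropping out, here is a minimal Python sketch. It only classifies a snapshot of device states. The function name `classify_outage` and the 0.8 threshold are my own assumptions for illustration, not anything defined by Z2M.

```python
from typing import Dict


def classify_outage(states: Dict[str, bool], threshold: float = 0.8) -> str:
    """Classify a snapshot of device states (True = online).

    `threshold` is a hypothetical cut-off: if at least this fraction of
    devices is offline, the problem is more likely the coordinator or
    stick than the individual devices.
    """
    if not states:
        return "unknown"
    offline = sum(1 for online in states.values() if not online)
    if offline == 0:
        return "healthy"
    if offline / len(states) >= threshold:
        return "network-wide"
    return "isolated"
```

With 25 bulbs offline and only the FP1 online, this returns "network-wide", which points at the stick rather than at any single bulb.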

What did you expect to happen?

Devices stay "online" forever :)

How to reproduce it (minimal and precise)

I can't reproduce it. It seems to happen randomly after a period of time, around 6-8 weeks.
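
Since it can't be reproduced on demand, one option is to record exactly when each device flips to offline. Below is a rough Python sketch of a parser for Z2M availability messages, assuming the default base topic `zigbee2mqtt` and that the availability feature is enabled. It accepts both the plain `online`/`offline` payload and the JSON `{"state": "..."}` form. The function name `parse_availability` is hypothetical, and the MQTT client that would feed it messages is left out.

```python
import json
from typing import Optional, Tuple

BASE_TOPIC = "zigbee2mqtt"  # assumption: the default base topic is in use


def parse_availability(topic: str, payload: str) -> Optional[Tuple[str, bool]]:
    """Return (friendly_name, is_online), or None for unrelated messages."""
    prefix = BASE_TOPIC + "/"
    suffix = "/availability"
    if not (topic.startswith(prefix) and topic.endswith(suffix)):
        return None
    # Friendly names may contain slashes, so slice instead of splitting.
    name = topic[len(prefix):-len(suffix)]
    if not name:
        return None
    text = payload.strip()
    if text.startswith("{"):
        # JSON form, e.g. {"state": "online"}
        try:
            state = json.loads(text).get("state")
        except ValueError:
            return None
    else:
        # Plain-text form: "online" or "offline"
        state = text
    if state not in ("online", "offline"):
        return None
    return name, state == "online"
```

Logging the result with a timestamp for every message would show whether all devices go offline in the same minute or drift away one by one.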

Zigbee2MQTT version

1.25.2 commit: a252914e

Adapter firmware version

20220302

Adapter

cc2652P2 USB stick zigbee2mqtt ZigBee Zigstar v4

Debug log

Log.txt.txt

sygys commented 1 year ago

I also have Philips Hue lights that go offline from time to time. For me it's the normal White Ambiance bulbs. The strange thing is that once these bulbs go offline, they are very hard to reset or reconnect to the network. They seem to have crashed so hard that even after a hard reset with a dimmer switch and re-pairing in Z2M, they keep giving errors. The only fix I have found is to force-remove the light from Z2M, restart Z2M, hard-reset the light, and then re-pair it in Z2M. It's quite annoying to have to do this every few weeks. There doesn't seem to be any reason why these lights fall off the network; they just seem to crash.

amilanov75 commented 1 year ago

Interesting, my issue is slightly different. If I just unplug the Z2M stick and plug it back in, it works again. That said, every time I have tried this, I had already restarted the service and rebooted the server, so maybe it requires more than just unplugging and re-plugging.

The issue also happened with Ikea bulbs, so I tend to think it's a broader problem.

Are you on the latest Philips firmware (1.93.11)?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

amilanov75 commented 1 year ago

Can anyone else comment on this?

schovanec commented 1 year ago

I have been having similar issues with my Hue motion sensors. They randomly become unresponsive. When I go to investigate, they usually show a little warning triangle. Hovering over the triangle says "interview failed", and on the bindings tab the coordinator bindings are missing. Clicking reconfigure fails, and deleting and re-joining them does not seem to re-create the missing bindings either. The only thing that seems to work is to restart Z2M, then delete and re-pair. The reconfigure button also sometimes works immediately after restarting, but it is inconsistent.

RubenKelevra commented 1 year ago

Does this also happen on 1.29.2? Some database issues were fixed which could lead to Z2M becoming unresponsive/laggy.

sygys commented 1 year ago

I am seeing some strange behaviour with one of my groups that I cannot seem to fix, and it affects the stability of all the other 120+ devices in my network. Lights in this group randomly become unresponsive. Previously I had 5 Innr GU10 spots that dropped off the network, or rather became unresponsive, at random, so I assumed the Innr spots were at fault. I then replaced them with Hue GU10 spots, but they have the same problem. The funny thing is that when controlled as a group all lights keep responding, but individually they don't. So I think this must be a problem with Z2M. When one of the spots in the group stops responding, the whole network starts to lag: devices post their states very late or not at all.

I also have a feeling that Z2M doesn't reroute the network when things change, at least not until devices are rebooted. I have around 70 Hue lights in our home at the moment and nothing responds reliably. Lights randomly lag. We have 12 motion sensors now, and they either don't post their states or it takes around 15 seconds for the lights to turn on. There are serious problems with Z2M on large networks; it just doesn't work. I would like to invite @Koenkk to visit our home sometime, as he lives in the same city as I do, to see what happens. Maybe we can troubleshoot this mess.

amilanov75 commented 1 year ago

I am on 1.28.2 of Z2M.

From memory, I tried 20220928, 20220219 and a third version of the firmware driver for the Zigbee stick. In the end I bought a new Zigbee stick which uses another version of the driver and, so far, so good.

In another month or two I will move my other network to the new stick as well if the issue does not come back.

If the issue relates to the Z2M version then, given I am on 1.28.2, I suspect I will eventually see it again (on both my Zigbee sticks, if it was not fixed until 1.29.2). If that happens, I will update to 1.29.2 and test again for a few months, and also cry a little at the time/money I spent debugging and moving to another Zigbee stick :(

Koenkk commented 1 year ago

A firmware which has been confirmed to improve stability for larger networks will be released soon: https://github.com/Koenkk/Z-Stack-firmware/tree/6.10.01.01/coordinator/Z-Stack_3.x.0/bin
