Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.76k stars, 1.64k forks

Hue connectivity issues #4694

Closed: Koenkk closed this issue 2 years ago

Koenkk commented 3 years ago

It seems that some users still experience connectivity issues with Hue devices (especially end devices like the Hue dimmer remote and the motion sensors). While devices completely leaving the network seems to occur much less often (#2693), the famous red LED still occurs (also on my Hue outdoor motion sensor).

The red LED blinks if the device did not receive an acknowledgment (a default response, in Zigbee terminology) from the coordinator after sending a command. I believe this happens when the Hue end device switches parent while the coordinator still has the old route to the end device. Eventually the coordinator recovers, but this takes a few seconds, which is too late for these Hue devices, so the red LED blinks.

The latest dev branch contains a possible fix: it tries to detect when a Hue device switches parent and immediately rediscovers the new route afterwards. (https://www.zigbee2mqtt.io/how_tos/how-to-switch-to-dev-branch.html)
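The detection idea can be sketched as follows. This is a minimal illustration, not the code in the dev branch: it assumes that the parent through which an end device's frame arrived is observable as a network address (the thread does not say how the dev branch obtains this), and the `on_change` callback, which stands in for triggering route rediscovery, is hypothetical.

```python
from typing import Callable, Dict


class ParentChangeDetector:
    """Remembers the last known parent of each end device and fires a
    callback when it changes. Hypothetical helper, for illustration only."""

    def __init__(self, on_change: Callable[[str, int, int], None]) -> None:
        # IEEE address of the end device -> network address of its last parent
        self._parents: Dict[str, int] = {}
        # Placeholder for "rediscover the route to this device now"
        self._on_change = on_change

    def observe(self, ieee: str, parent_nwk: int) -> bool:
        """Record the parent seen for this frame; return True on a switch."""
        previous = self._parents.get(ieee)
        self._parents[ieee] = parent_nwk
        if previous is not None and previous != parent_nwk:
            self._on_change(ieee, previous, parent_nwk)
            return True
        return False
```

Keeping only the last parent per device means rediscovery is triggered on an actual switch, not on every message the device sends.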

If people still experience the issue with the latest dev branch, we need herdsman debug logging combined with a sniff taken at the moment the red LED blinks.

timota commented 3 years ago

The issue still persists with the latest dev branch and the latest firmware. SW: 1.16.1-dev, commit: ca33e22, FW: 20201026, coordinator: CC1352P-2

timota commented 3 years ago

Some info regarding the sensor: when the sensor disappeared from the network, 5 minutes later I just enabled joining (Permit Join) and it came back. Hope this can be helpful. Unfortunately, no debug log.

timota commented 3 years ago

Hi @Koenkk, I finally managed to capture a debug log for the sensor (link below). It contains 2000 lines before the leave message and 500 after; hope this is enough for debugging. The sensor is 0x001788010647face.

Debug

Please let me know if you need more info.

Thanks in advance.

Koenkk commented 3 years ago

@timota thanks for the good logs. I think the problem here is that the motion sensor switches parent and sends a message to the coordinator; the coordinator fails to acknowledge it because it still thinks the sensor is behind its old parent, and the recovery takes too long, so the motion sensor starts rejoining/leaving. Looking at the logs, we cannot fix this in zigbee2mqtt, since we learn too late that the acknowledgement failed and we have no indication that the motion sensor switched parent. That means it has to be fixed in the coordinator firmware.

I know this is hard, but would you be able to create a sniff from the working situation until it stops working? https://www.zigbee2mqtt.io/how_tos/how_to_sniff_zigbee_traffic.html

timota commented 3 years ago

@Koenkk Sorry for the late response. Debug logging and the sniffer have been started. As soon as the sensor fails I will post my findings.

Thanks

timota commented 3 years ago

@Koenkk, I finally managed to get a dump (hope I sniffed the traffic correctly). Some background info:

The issue happened at: 2020-12-02T17:51:36.233Z zigbee-herdsman:controller:log Device leave '0x001788010647face'

The debug log is huge (201M), so I can't put it on Pastebin; sharing a Google Drive link instead: sniff

If you need zigbee2mqtt logs for this sniff, let me know.

Hope this helps. Thanks.

Koenkk commented 3 years ago

The problem seems to be that the coordinator sends a leave request to the motion sensor with rejoin set to false:

[Screenshot: leave request with rejoin set to false]

Not sure why it happens, but can you check if it is fixed with the following firmware: znp_CC1352P_2_LAUNCHXL_20201203_skip_leave.hex.zip
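The screenshot is not preserved, so it is not certain whether the captured frame was a NWK-layer Leave command or a ZDO Mgmt_Leave_req. Assuming it was the NWK-layer Leave command, the sketch below decodes that command's one-byte options field, in which bit 5 is Rejoin, bit 6 is Request and bit 7 is Remove Children (bits 0-4 are reserved). `decode_leave_options` is a hypothetical helper name.

```python
# Bit masks for the options byte of a Zigbee NWK-layer Leave command.
# Bits 0-4 are reserved.
LEAVE_REJOIN = 0x20           # bit 5: the leaving device should rejoin afterwards
LEAVE_REQUEST = 0x40          # bit 6: this frame asks the recipient to leave
LEAVE_REMOVE_CHILDREN = 0x80  # bit 7: the recipient should also remove its children


def decode_leave_options(options: int) -> dict:
    """Split the NWK Leave options byte into its named flags."""
    if not 0 <= options <= 0xFF:
        raise ValueError("options must fit in a single byte")
    return {
        "rejoin": bool(options & LEAVE_REJOIN),
        "request": bool(options & LEAVE_REQUEST),
        "remove_children": bool(options & LEAVE_REMOVE_CHILDREN),
    }
```

For example, an options byte of `0x40` decodes to a leave request with rejoin set to false, the situation described above, whereas `0x60` would request a leave followed by a rejoin.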

timota commented 3 years ago

Firmware installed. Container started with full debug logging.

Will let you know results. Hopefully it will work as expected, fingers crossed.

netweaver1970 commented 3 years ago

Fingers crossed; hoping for good results on your side so the firmware can be fully released. Thanks to both of you for the time spent investigating and fixing!

netweaver1970 commented 3 years ago

@timota No news is good news? Nothing odd happening, and no more drop-outs of the Hue motion sensor?

0rsa commented 3 years ago

Unreliable here

warn 2020-12-20 19:37:47: Device '_IDHIDDEN' left the network
info 2020-12-20 19:37:48: Device 'Sensor living room' joined
info 2020-12-20 19:37:48: Starting interview of 'Sensor living room'
info 2020-12-20 19:37:48: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":{"friendly_name":"Sensor living room"},"type":"device_connected"}'
info 2020-12-20 19:37:48: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"interview_started","meta":{"friendly_name":"Sensor living room"},"type":"pairing"}'
error 2020-12-20 19:40:40: Failed to interview 'Sensor living room', device has not successfully been paired

I use an Electrolama zig-a-zig-ah! with the official CC26X2R1_20201026.hex firmware and Zigbee2mqtt 1.16.2.

Can you provide a test firmware for CC26X2R1?

Thank you

I switched to 1.16.2-dev, as described in the first post, to see if this is better.

timota commented 3 years ago

@netweaver1970 @Koenkk Morning guys,

Sorry for the late response, I was busy... During my testing with the new firmware I didn't see any disruptions. All sensors, including Hue, are working as expected. Before considering the firmware stable I would like to do one final test: move the sensor outdoors, where it is supposed to be, and connect it via a router. I'm going to do that today and will let you know, hopefully by the end of this week; previously the Hue sensor usually failed within 2-3 days.

Regards.

ellnic commented 3 years ago

I have the LAUNCHXL-CC26X2R1 and am experiencing this issue. Dimmers I've never had a problem with, but motion sensors are the bane of my life right now.

It wasn't really a problem before the last 2 months or so, but I have added a lot more routers and this seems to tie in with what is being described here. Before they were most likely directly connected to the coordinator, now they are not.

It'll be good to see if this fixes the issue 😃

ChessSpider commented 3 years ago

I now have the CC2530_CC2591_20190523 Zigbee 3.0 firmware, and I sometimes experience the same problem. Unsure whether it also has the same cause. Anyway, new firmware is always nice.

0rsa commented 3 years ago

I can see this fix in the latest release: #1925 #2274 "Xiaomi Motion Sensor RTCGQ11LM not reporting occupancy true". I have exactly the same behaviour with Hue sensors: after a couple of days, one sensor (a random one) sends occupancy: false even when the event is triggered by motion. I give up; I have purchased Xiaomi sensors that work with a gateway and am waiting for them, so I can remove zigbee2mqtt from my system.

podi62 commented 3 years ago

Any news on this? @Koenkk, if the znp_CC1352P_2_LAUNCHXL_20201203_skip_leave.hex.zip patch works, are you going to port it to the main CC firmware and to the source routing version? Thx

Koenkk commented 3 years ago

@timota is it working correctly now?

@podi62 yes but it needs more testing (first CC2652, after that CC2531)

foxylion commented 3 years ago

I have a CC2652RB and the same symptoms with my hue motion sensors.

So I could test the CC2652RB firmware. Is there a prebuilt one? And a tutorial on how to flash the new firmware?

Koenkk commented 3 years ago

Issue doesn't seem to be fixed, got a red light today on my hue motion outdoor sensor. 99% sure the issue has to be fixed in the firmware. Asked TI for support: https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/968896

TheJulianJES commented 3 years ago

The same issue also happens on ZHA (Home Assistant) sometimes, so it doesn't look like a bug in either ZHA or Z2M.

Koenkk commented 3 years ago

TI has provided a potential fix, link to new firmware: https://github.com/Koenkk/Z-Stack-firmware/tree/develop/coordinator/Z-Stack_3.x.0

ellnic commented 3 years ago

I will give this a go and report back ASAP. My drop offs were every week or so. We'll see how this goes! :-) thanks

TheJulianJES commented 3 years ago

Upgraded yesterday and today I experienced a red (long) blink on my Hue outdoor motion sensor (message still seems to have arrived though). Motion sensor is inside (for testing) and does one hop through a "ZB3" Hue light. (Using ZHA) Does the potential fix work for others? (Thanks, Koenkk btw!)

sjorge commented 3 years ago

Ah good I'm not the only one, I started having it again since switching to the zzh-lite.

TheJulianJES commented 3 years ago

You're also running the 20210107 version from the develop branch, I guess?

sjorge commented 3 years ago

Correct

ellnic commented 3 years ago

Same here. Still experiencing the same.

TheJulianJES commented 3 years ago

Although it may not be related to the firmware update, it seems to be a bit better. I had less red blinking on my (testing) outdoor motion sensor. Can anyone report if it (mostly) "fixed" itself after some time with the latest firmware perhaps?

Edit: Never mind, it just happened again after many days of working properly.

ellnic commented 3 years ago

@TheJulianJES from my end, I thought it was about the same. But last night I lost connection entirely to an outdoor sensor I've never had problems with before. Could be coincidence but it might be slightly worse.

Koenkk commented 3 years ago

If this happens, please try to capture a sniff plus herdsman debug logging (otherwise I cannot debug this further).

To enable herdsman debug logging, see https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging
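A herdsman debug log can get very large, so it helps to pull out the moments a device left before looking for the matching packets in the sniff. The sketch below assumes the line format quoted earlier in this thread (`<ISO timestamp> zigbee-herdsman:controller:log Device leave '<IEEE address>'`); it searches within each line rather than anchoring at the start, in case the log carries extra prefixes. `find_leaves` is a hypothetical helper, not part of zigbee2mqtt.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches lines such as:
# 2020-12-02T17:51:36.233Z zigbee-herdsman:controller:log Device leave '0x001788010647face'
LEAVE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+"
    r"zigbee-herdsman:controller:log\s+Device leave '(?P<ieee>0x[0-9a-fA-F]{16})'"
)


def find_leaves(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (UTC timestamp, IEEE address) for every 'Device leave' line."""
    events = []
    for line in lines:
        match = LEAVE_RE.search(line)
        if match:
            # %f accepts the 3-digit millisecond part and stores microseconds.
            ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts, match["ieee"]))
    return events
```

Since a file object iterates over lines, it can be passed directly, e.g. `find_leaves(open("herdsman-debug.log"))` with a hypothetical file name; the resulting timestamps can then be compared against the capture times in Wireshark.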

ellnic commented 3 years ago

I've ordered a CC2531. I will try to get a sniff when it's here.

foxylion commented 3 years ago

I did the same, but will take a while until it is delivered... ;-)

ellnic commented 3 years ago

CC2531 is here! Sniff started. Is there some way to filter out the helpful information, or should I just submit the entire sniff once the issue occurs?

Koenkk commented 3 years ago

@ellnic just share the whole sniff.

I'm starting to think this issue also depends on the network setup. Since I removed all the CC2530 routers from my network (replaced a ptvo CC2530 router with a CC2652R, and a Gledopto LED controller with a Tuya one), I haven't seen this issue again.

sjorge commented 3 years ago

Interesting. I think mine is connected through an Ubisys S1, which uses a CC253x IIRC. (I already found a few bugs in their firmware and reported them, so more lurking in there wouldn't surprise me.)

ellnic commented 3 years ago

Ok will do. Btw, I noticed this:

3352 161.626585 0x7ffd 0x0000 ZigBee HA 58 Unknown Command: 0x00, Seq: 80

when pressing a button on a Hue dimmer. I've never had a problem with my Hue dimmers, but that doesn't look like it should happen...

Koenkk commented 3 years ago

@ellnic that's expected, the hue dimmer sends manufacturer specific commands (not defined in the ZCL spec so wireshark will show them as unknown)
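To see why Wireshark shows such frames as unknown, here is a minimal sketch of parsing a ZCL frame header. Bit 2 of the frame control byte marks the frame as manufacturer-specific, in which case a 2-byte little-endian manufacturer code follows and the command ID belongs to that manufacturer rather than to the ZCL spec. The same byte carries the Disable Default Response bit, which relates to the acknowledgment discussed at the top of this issue. `ZclHeader` and `parse_zcl_header` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZclHeader:
    cluster_specific: bool            # frame type bits 0-1 == 0b01
    manufacturer_code: Optional[int]  # present only if bit 2 is set
    server_to_client: bool            # direction bit 3
    disable_default_response: bool    # bit 4
    seq: int                          # transaction sequence number
    command_id: int


def parse_zcl_header(frame: bytes) -> ZclHeader:
    """Decode the ZCL header at the start of an APS payload."""
    if len(frame) < 3:
        raise ValueError("frame too short for a ZCL header")
    fc = frame[0]
    idx = 1
    manufacturer_code = None
    if fc & 0x04:  # manufacturer-specific: 2-byte little-endian code follows
        if len(frame) < 5:
            raise ValueError("frame too short for a manufacturer-specific header")
        manufacturer_code = int.from_bytes(frame[1:3], "little")
        idx = 3
    return ZclHeader(
        cluster_specific=(fc & 0x03) == 0x01,
        manufacturer_code=manufacturer_code,
        server_to_client=bool(fc & 0x08),
        disable_default_response=bool(fc & 0x10),
        seq=frame[idx],
        command_id=frame[idx + 1],
    )
```

For example, `parse_zcl_header(bytes([0x05, 0x0B, 0x10, 0x50, 0x00]))` yields a cluster-specific, manufacturer-specific header with manufacturer code 0x100B (the code used by Philips/Signify), sequence number 80 and command ID 0x00. These bytes are made up to match the row above; they are not taken from the capture.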

ellnic commented 3 years ago

I'm running into an issue where Wireshark stops capturing after about 5 minutes. I have repeated this several times, tried a different USB port etc. Is there an obvious setting I have missed that is causing this or should I just jump to a different machine?

Edit: This seems to be solved by not running as root. All devices have been re-paired and the sniff seems to be running well; it has been for about 10 minutes now. I have no idea how long it will take for a device to fall off; it's anywhere from a few days to a couple of weeks.

Edit 2: Actually, it hasn't solved it. I just checked and it has stopped capturing. If I click start I get the error "End of file on pipe magic during open." and have to quit Wireshark and open it again. I'm kind of stuck here.

Koenkk commented 3 years ago

Is modemmanager installed? https://www.zigbee2mqtt.io/information/FAQ.html#modemmanager-is-installed

ellnic commented 3 years ago

It was, but I have removed it and the problem persists. I will fire up a minimal Debian box tomorrow and try again.

ellnic commented 3 years ago

Phew.. well it turns out that (after a lot of head scratching, service disabling and at a last resort different machines/distros) the problem is this:

whsniff -c ZIGBEE_CHANNEL_NUMBER | wireshark -k -i -

For some reason, Debian 10 and Ubuntu 18.04 do not like this pipe setup and ultimately fail after 300-40K packets; I'm not sure why, but the pipe breaks.

The error "End of file on pipe magic during open", shown when attempting to resume Wireshark by clicking the fin icon, led me here and in turn to the whsniff project page here, which suggests using instead:

wireshark -k -i <( path/to/whsniff -c channel_number )

So far so good, we're at 45K packets. If this hasn't failed by this evening, I'll PR some changes to the docs.

ellnic commented 3 years ago

@Koenkk still stable and at 550K packets, so have opened PR https://github.com/Koenkk/zigbee2mqtt.io/pull/611

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

ellnic commented 3 years ago

I'm still capturing. Typical, one month and no drop-offs yet! 🤨

I think now might be a good time to stop this capture and start a new one. I have seen some red LEDs though; do you still want this capture, @Koenkk?

Koenkk commented 3 years ago

@ellnic let's see if we can catch a drop

ellnic commented 3 years ago

@Koenkk Ok, no problem. I've saved the [massive] cap file up until now and started a fresh one. As soon as I have something I'll post back.

audiobah commented 3 years ago

@Koenkk Some information about my experience with the Hue outdoor sensor. My coordinator is a CC2538. I can only connect the sensor when I am near the coordinator. When I then go outside to install it on another building, where there are a lot of routers nearby, the sensor (I guess) does not see the coordinator's signal and does not try to connect to any of the routers. Then I see that the sensor has left the network.

cmorlok commented 3 years ago

I have the very same issue with the Hue outdoor sensor. After pairing, it either remains in the network but stops reporting motion and temperature, or it leaves the network very soon. I can force it to leave, for example by starting an OTA update.

SW: 1.18.1 FW: JH_2538_2592_ZNP_USB_20201010 Coordinator: cc2538

frank3523 commented 3 years ago

Same issue here. I switched from a CC2531 to a CC2652RB from slae, but no luck. Zigbee2mqtt detects the sensor but there is no motion detection: motion, temperature and illuminance are all unknown.

cmorlok commented 3 years ago

@frank3523 Which FW are you using on cc2652?