Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Ikea tradfri bulbs/spots drop from network (1.6.0 / 20190608) #2032

Closed sandervandegeijn closed 4 years ago

sandervandegeijn commented 4 years ago

Bug Report

What happened

After a while I can't turn on the Ikea Tradfri lamps (two GU10 spots and four E27 bulbs). They seem to be offline from zigbee2mqtt's perspective. The CC2531 stick is close to all the bulbs and on a USB extension cable; Xiaomi sensors that are much farther away reach the stick and work fine.

https://pastebin.com/cKRKMrUj

Power cycling the lamps does not help:

After cycling power:

https://pastebin.com/darPm1HP

Watching the logs for a while, I see some errors that reference device IDs matching the Ikea bulbs:

https://pastebin.com/U2nLZLc6

What did you expect to happen

Lamps keep working

How to reproduce it (minimal and precise)

Wait a few hours

Debug Info

zigbee2mqtt version: 1.6.0. Also tried the latest-dev docker image from 24-09-2019: same behaviour ( https://pastebin.com/mqx5AdQ4 )

CC253X firmware version: 20190608

oxelamp commented 4 years ago

Can you try coordinator version 20190619? Worked wonders for my mainly tradfri setup.

sandervandegeijn commented 4 years ago

I flashed: https://github.com/Koenkk/Z-Stack-firmware/blob/master/coordinator/Z-Stack_Home_1.2/bin/source_routing/CC2531_SOURCE_ROUTING_20190619.zip

Networkmap looks instantly better. Will monitor it. Thanks for the tip!

sandervandegeijn commented 4 years ago

One night later: same problems. Network is gone, Xiaomi stuff still works.

https://pastebin.com/znTBbxzC

Koenkk commented 4 years ago

Xiaomi stuff still works.

Can you still control your non TRADFRI devices?

sandervandegeijn commented 4 years ago

The Xiaomi devices are only sensors (door contact sensors, buttons, motion detection, temp sensors). Those seem to work. The source routing firmware did force some of them to go via a Tradfri bulb initially.

Networkmap now: https://pastebin.com/Zd5HLvsr

It does see the Tradfri bulbs as online but can't control them and the mesh network is gone. Yesterday evening before I went to sleep it was looking much better.

Koenkk commented 4 years ago

Do you see anything in the zigbee2mqtt logging when power cycling the bulbs?

sandervandegeijn commented 4 years ago

https://pastebin.com/96whPKk6

Plugging it in after the last

Sep 28 11:05:51 neo-server zigbee2mqtt[412]: #033[33m zigbee2mqtt:warn#033[39m 9/28/2019, 11:05:51 AM Failed network lqi scan for device: '0x086bd7fffe320785' with error: 'Error: Timed out after 10000 ms'

Those IDs in the lqi scan lines correspond to the Ikea Tradfri bulbs. Did it a second time with two GU10 spots that were lost in limbo:

https://pastebin.com/1XPJfhYH

Same result. Xiaomi stuff still rock solid so clearly the radio is working OK.
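To check which devices the failed scans belong to, the warning lines can be tallied per IEEE address. A minimal sketch, assuming the log wording matches the warning quoted above (other zigbee2mqtt versions may phrase it differently); `failed_lqi_devices` is a hypothetical helper name, not part of zigbee2mqtt:

```python
import re
from collections import Counter

# Pattern taken from the warning line quoted in this thread. Assumption:
# the wording is the same across the log lines being scanned.
LQI_FAIL = re.compile(
    r"Failed network lqi scan for device: '(0x[0-9a-fA-F]{16})'"
)


def failed_lqi_devices(lines):
    """Count failed LQI scans per IEEE address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LQI_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of a syslog file (for example `failed_lqi_devices(open("/var/log/syslog"))`) gives a per-device count that can be compared against the list of Tradfri addresses.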

Koenkk commented 4 years ago

It looks like they are rejoining; somehow they are being kicked off the network. Do you have any other Zigbee hubs running (e.g. a TRADFRI gateway)?

To dive deeper into this issue, could you sniff the network, start at a point where you can still turn the bulbs on/off until the point where you can't (https://www.zigbee2mqtt.io/how_tos/how_to_sniff_zigbee_traffic.html).

sandervandegeijn commented 4 years ago

Hmm, I do have an Aqara gateway still on, but these networks should be able to coexist, right? My neighbors could also have Philips Hue or something else. Will try to sniff later this week.

sandervandegeijn commented 4 years ago

Strangely enough unplugging the Aqara gateway did help (at least for now, will monitor the network). But I do not understand why, multiple networks should be able to co-exist right?

sandervandegeijn commented 4 years ago

Aqara gateway is not in use, but the lamps dropped off the Zigbee network again. I'll try to do some sniffing.

anzecesar commented 4 years ago

Hi, I'm having the same issue. At first I had the default firmware, and some lights farther away from the controller were having networking issues, similar to what is described here.

I flashed the source_routing firmware. It seemed better initially, but over time some lights become unresponsive. I have a lot of Tradfri GU10 spots with some E27s. All Tradfri, ~30 lights total. The misbehaving lights are always the ones farther away from the controller.

I have no other zigbee hubs online.

I have xiaomi switches and 4 temperature sensors. Other than pairing difficulties I have no issues with the switches.

I'm also seeing this often:

zigbee2mqtt:info 10/1/2019, 7:23:12 PM Zigbee publish to device '0x086bd7fffe06b9c8', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - 1
  zigbee2mqtt:error 10/1/2019, 7:23:18 PM Zigbee publish to device '0x086bd7fffe06b9c8', genOnOff - off - {} - {"manufSpec":0,"disDefaultRsp":0} - null failed with error Error: AF data request fails, status code: 205. No network route. Please confirm that the device has (re)joined the network.

Sending the signal multiple times eventually works.
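The "send it again until it works" workaround can be sketched as a retry with exponential backoff. This only papers over the symptom and does not fix the missing route. `send_with_retry` and the `send` callable are hypothetical names for illustration; zigbee2mqtt does not expose such a function, and how a failed publish is detected on the client side is left as an assumption:

```python
import time


def send_with_retry(send, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    `send` is any zero-argument callable that raises on failure
    (hypothetical: e.g. a wrapper that publishes a command and
    raises if no state update arrives in time).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts - 1:
                # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.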

I'll try to get a hold of another usb stick to capture some traffic

sandervandegeijn commented 4 years ago

Distance does not seem to affect it here. It is definitely better with this firmware but still not entirely stable. Will try to capture traffic this weekend.

sandervandegeijn commented 4 years ago

zigbee2mqtt.txt

Lamps dropped off the network again today. I didn't have the sniffing stick ready, but I did spot a crash of zigbee2mqtt. See the attached logfile at time 18:50: it crashes and restarts itself. I don't know if this is related 1:1 to my Tradfri problem, but it is not expected behaviour.

@Koenkk Should I create a separate issue for this?

Koenkk commented 4 years ago

@neographikal It seems that the stick crashes because of all the network scans.

Could you disable the automatic network scan? (maybe this also solves this problem?)

sandervandegeijn commented 4 years ago

will try, thanks. Where can I find this option?

https://www.zigbee2mqtt.io/configuration/configuration.html

It doesn't seem to be mentioned there.

Koenkk commented 4 years ago

It's not done by zigbee2mqtt itself but by some external application. Do you, for example, have the zigbee2mqtt Node-RED addon or the Home Assistant zigbee2mqtt network map addon installed?

sandervandegeijn commented 4 years ago

Okay, thanks. I have zigbee2mqttAssistant running

root@neo-server:/home/neo# docker stop zigbee2mqttassistant

Let's see what that does :)

bochelork commented 4 years ago

Can you sniff your network? I just noticed that network address changes are not handled well. This is what I did to get my Osram device back to my network.

Koenkk commented 4 years ago

Will try to implement a fix for this tomorrow (https://github.com/Koenkk/zigbee2mqtt/issues/2017#issuecomment-538142703)

manup commented 4 years ago

The problem might be in the light firmware itself. We have various reports for IKEA GU10 and E27 dimmable and white spectrum bulbs dropping off. They become only responsive again after a power-cycle. This happens even with the IKEA gateway itself https://github.com/dresden-elektronik/deconz-rest-plugin/issues/1261#issuecomment-536604489

Firmware versions at least up to 1.2.214 seem to be affected. No reports yet for newer firmware 2.3.007 (white spectrum) lights and new hardware revisions.

https://github.com/dresden-elektronik/deconz-rest-plugin/issues/1261

Koenkk commented 4 years ago

@manup https://github.com/dresden-elektronik/deconz-rest-plugin/issues/1261#issuecomment-523470291 could it be that the lights change the networkAddress without announcing the new one? Otherwise it would be strange that group commands still work? A re-power would trigger a device announce resulting in the gateway receiving the new network address.

manup commented 4 years ago

It varies, a changed nwk address is handled in our controller on all incoming commands. We also send out NWK address request broadcasts for unresponsive devices which would detect the change.

In my setup it happened only twice that an E27 dimmable light got stuck, after it had been running for months without issues. I fired up the sniffer to see if the light was doing at least something, but there was total silence, not even the regular NWK link status commands every 15 seconds. So it seems that in this case the firmware was stuck or had crashed altogether. I think I even tried Touchlink, but it didn't respond.

I haven't found a way to reproduce the error yet, perhaps bringing it in a large network with multiple hops and many broadcasts could trigger the fault.
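The idea discussed above, picking up a changed network address from any incoming frame rather than only from a device announce, can be sketched as a small lookup table keyed by the fixed IEEE address. This is an illustration under assumptions, not the deCONZ or zigbee2mqtt implementation; `AddressTable`, `observe`, and `lookup` are hypothetical names:

```python
class AddressTable:
    """Maps a device's fixed IEEE address to its current 16-bit network address."""

    def __init__(self):
        self._nwk_by_ieee = {}

    def observe(self, ieee, nwk):
        """Record the network address seen on an incoming frame.

        Returns True when an already known device shows up with a
        different network address than the one stored before.
        """
        previous = self._nwk_by_ieee.get(ieee)
        self._nwk_by_ieee[ieee] = nwk
        return previous is not None and previous != nwk

    def lookup(self, ieee):
        """Return the last seen network address, or None if unknown."""
        return self._nwk_by_ieee.get(ieee)
```

A table like this only helps when the device still transmits. In the stuck case described above, where the sniffer showed total silence, there is no frame to learn a new address from.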

Koenkk commented 4 years ago

Just released a fix for https://github.com/Koenkk/zigbee2mqtt/issues/2017 in the dev branch, let me know if this also fixes this issue.

sandervandegeijn commented 4 years ago

Updated to the latest docker version, will monitor for problems. Response times when issuing multiple commands are much faster now.

Disabling zigbee2mqtt-assistant didn't help btw.

JBS5 commented 4 years ago

@manup dresden-elektronik/deconz-rest-plugin#1261 (comment) could it be that the lights change the networkAddress without announcing the new one? Otherwise it would be strange that group commands still work? A re-power would trigger a device announce resulting in the gateway receiving the new network address.

About the working group command in deCONZ: every time, sooner or later, a group command also failed and I had to power-cycle the light.

sandervandegeijn commented 4 years ago

Unfortunately last night's fix did not resolve the Tradfri lamps dropping off the network. Will try to sniff the traffic later this weekend.

mecrip commented 4 years ago

I've got the same issue starting from 1.6.

I downgraded, but it looks like something has crashed in the stick, because the issue is still happening. Using the latest source routing firmware.

mecrip commented 4 years ago

In my case it also brings down the whole network after some time, so other devices (like Xiaomi) become unreachable until a system reboot.

sandervandegeijn commented 4 years ago

Didn't get the sniffing working yesterday. Will try again.

sandervandegeijn commented 4 years ago

I need some help with setting up the sniffer. Flashed the stick with the zboss firmware, downloaded wireshark, started the zboss gui, pointed it to Wireshark, added the keys andddd no traffic in Wireshark.

Zboss does do something, the LED on the stick goes from flashing to solid green when starting Wireshark but that's about it. No traffic shows up. Tried to capture to pcap file in zboss gui, but that stays empty as well.

Running on Win10 X64

--

Right and now it starts capturing..

bochelork commented 4 years ago

Sounds correct to me. That's how I sniff as well.

sandervandegeijn commented 4 years ago

It's collecting data now. Hopefully the lamps will drop off the network this night.

sandervandegeijn commented 4 years ago

Of course everything worked this morning. Still, pcap might contain relevant info. Attached captured traffic. Everything OKAY.zip

mecrip commented 4 years ago

My issue disappeared when I swapped the CC2531 for a new one. Maybe it had crashed or something, but even after reflashing it did not work properly. The newer adapter has worked flawlessly for a week.

sandervandegeijn commented 4 years ago

It was stable for 4 days here, but when I got home lamps weren't working again. Makes capturing the problem difficult but will try anyhow...

sandervandegeijn commented 4 years ago

Finally have a capture from a night where the lamps stopped working! When I woke up, only one lamp was still working. Zigbee2mqtt was still running though.

Lamps gone from network.zip

sandervandegeijn commented 4 years ago

There is definitely something going on with the Tradfri stuff. I bought an Ikea hub to test and to see which firmware versions my lamps have: latest for all of them. Pairing with the hub is no fuss at all, works like a charm. Because I swapped the CC2531 stick for another one, I had to re-pair everything.

Xiaomi Aqara stuff: pairs instantly, no matter how far away from the coordinator they are. Hold the button for 4 secs and presto, joins the network, no fuss. The Ikea stuff doesn't want to comply:

Oct 27 14:01:26 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:01:26: Starting interview of 'Tradfri lamp bay window 2'
Oct 27 14:01:26 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:01:26: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_started","meta":{"friendly_name":"Tradfri lamp bay window 2"}}'
Oct 27 14:01:46 neo-server zigbee2mqtt[1302]: #033[31mzigbee2mqtt:error#033[39m 2019-10-27T13:01:46: Failed to interview 'Tradfri lamp bay window 2', device has not successfully been paired
Oct 27 14:01:46 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:01:46: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_failed","meta":{"friendly_name":"Tradfri lamp bay window 2"}}'

Not just one of 'em, but most of them. It is really difficult to get them on the network, while all the simple battery-powered Xiaomi stuff works fine. Eventually it works, but it's not going smoothly. I'm now at:
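The pairing events in the log above are published as JSON on the `zigbee2mqtt/bridge/log` topic, so failed interviews can be counted per device. A minimal sketch, assuming the payload shape matches the lines quoted above; `interview_failures` is a hypothetical helper, and collecting the payloads (for example with an MQTT client subscribed to that topic) is left out:

```python
import json
from collections import Counter


def interview_failures(payloads):
    """Count 'interview_failed' pairing messages per friendly name."""
    failures = Counter()
    for raw in payloads:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(event, dict):
            continue  # skip JSON values that are not objects
        if event.get("type") == "pairing" and event.get("message") == "interview_failed":
            # Assumption: the name lives under meta.friendly_name, as in the log above.
            name = event.get("meta", {}).get("friendly_name", "<unknown>")
            failures[name] += 1
    return failures
```

A count like this makes it easier to see whether the failures cluster on a few bulbs or are spread across all of them.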

Version of Zigbee2Mqtt: 1.6.0 (latest dev of today). Coordinator version: 20190619 (source).

What can I do to provide debug info to resolve this? Or I could send over / bring by some lamps :) In my previous post I provided a wireshark capture as well.

Edit: attached Wireshark capture pcap file from trying to join.

cannot join Ikea Lamp.zip

Edit: second try: Second try.zip

root@neo-server:/opt/zigbee2mqtt# tail /var/log/syslog -f | grep zigbee
Oct 27 14:25:26 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:25:26: Starting interview of 'Tradfri lamp behind couch 2'
Oct 27 14:25:26 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:25:26: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_started","meta":{"friendly_name":"Tradfri lamp behind couch 2"}}'
Oct 27 14:25:46 neo-server zigbee2mqtt[1302]: #033[31mzigbee2mqtt:error#033[39m 2019-10-27T13:25:46: Failed to interview 'Tradfri lamp behind couch 2', device has not successfully been paired
Oct 27 14:25:46 neo-server zigbee2mqtt[1302]: #033[32mzigbee2mqtt:info #033[39m 2019-10-27T13:25:46: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"pairing","message":"interview_failed","meta":{"friendly_name":"Tradfri lamp behind couch 2"}}'

Koenkk commented 4 years ago

As you have the IKEA hub now, it would be interesting to see how stable it is when having the bulbs paired to the IKEA hub.

sandervandegeijn commented 4 years ago

Will try that as well, but can you make anything of the traces I made? I can't pair the bulbs with zigbee2mqtt at all, while the Xiaomi stuff works flawlessly.

Koenkk commented 4 years ago

Is the Ikea hub still turned on? You should power down the complete ikea hub network (hub and bulbs)

LukeHandle commented 4 years ago

@neographikal I had pairing issues with the Tradfri GU10 bulb. It stayed in the flashing state and was not being picked up by z2m (took 5+ attempts; unsure if holding it very near helped or not). The LED controller and E27 were painless though.

Initially (before changing the z2m security key - not sure if related) I only had the GU10 connected and it sometimes came on by itself. Hasn't happened in the last week since setting up again.

Can't confirm firmware versions, but I can dig up the models if that helps.

sandervandegeijn commented 4 years ago

Hmm, the GU10s aren't giving me trouble. Powered everything down and started to pair everything one by one. Paired the first three E27s successfully, but now I've got two that won't pair, even though they worked correctly before.

It is all over the place, no pattern to be recognized, but surely not stable..

-- Ok, I'm done for today. Now four lamps try to pair and no matter how many times I try: "interview failed". No clue what's going on; these lamps worked fine except for dropping off the network now and then.

sandervandegeijn commented 4 years ago

Ok, now it's getting even weirder; as a last resort I flashed the Z-stack 3.0.x firmware for the CC2531. All the light bulbs pair instantly, no problems at all......

No clue how stable the network is going to be but the difference with the other version is remarkable.....

Koenkk commented 4 years ago

@neographikal I've run my production network for a few months on CC2531 Z-Stack 3 and consider it stable.

sandervandegeijn commented 4 years ago

Thanks, I'll monitor the stability. Using the current daily build of zigbee2mqtt as well. First results are encouraging! Maybe update the docs so that people will start using zstack 3.0?

Koenkk commented 4 years ago

@neographikal let's first see if it keeps working now.

sandervandegeijn commented 4 years ago

The answer is no. In the morning, everything was still fine. Got home from work, half of the lamps were not working anymore. Logs do not provide a lot of useful info:

Oct 28 18:35:08 neo-server zigbee2mqtt[1302]: #033[31mzigbee2mqtt:error#033[39m 2019-10-28T17:35:08: Publish 'set' 'state' to 'Tradfri lamp bay window 2' failed: 'Error: Data request failed with error: 'No network route' (205)'

Lamps just drop off the network again. What can I do to help debug this?

Koenkk commented 4 years ago

Please try https://github.com/Koenkk/zigbee2mqtt/issues/2032#issuecomment-546702323. I first want to make sure this isn't a hardware issue of the bulbs.

sandervandegeijn commented 4 years ago

Already did that, no other hubs present, disconnected all the Ikea bulbs and paired them 1 by 1. No difference.