dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.9k stars, 505 forks

IKEA lights occasionally lost connection #1261

Closed: JBS5 closed this issue 3 years ago

JBS5 commented 5 years ago

Occasionally a light (mostly a Tradfri GU10) becomes unavailable in the Phoscon app and cannot be switched off/on via Phoscon (or HASS). I'm now using deCONZ 2.05.55 and firmware 262F0500 on the ConBee, but I had the same problem with older versions of deCONZ and the ConBee firmware.

(Not always this light)

  1. Any clue why?
  2. Is it possible to restore the connection other than by disconnecting/reconnecting power?
peer69 commented 5 years ago

Same here with 2.05.58. One Tradfri GU10 seems to be unresponsive at the moment:


The same happened to me with a Hue light strip a few days before, so I don't suppose it's an IKEA-specific issue. The devices have to be power-cycled and are back to normal after that. Still annoying in some cases, mostly for my FLOALT panels, which are directly powered and don't have a wall switch to power-cycle them.

ebaauw commented 5 years ago
  1. Any clue why?

The REST API plugin marks a light as unreachable when it doesn't receive a response a couple of times when polling the light for its state. The cause of not receiving a response is, in order of likelihood:

a. The light's power has been cut (e.g. by a 20th-century wall switch);
b. The Zigbee network has a hiccup (e.g. due to radio interference or routing issues in the mesh). In this case, the light still reacts to group commands;
c. The light's firmware has crashed.

  2. Is it possible to restore the connection other than by disconnecting/reconnecting power?

In a) and c): no. In b): yes.

The REST API plugin marks the light as reachable again when it receives a message from it. Powering up the light causes it to send a Device Announcement message. In b), the light typically comes back spontaneously when the next poll succeeds. You can also select the node in the deCONZ GUI and press 0.
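The reachability logic described above can be pictured as a small state machine. This is a minimal sketch under stated assumptions, not the plugin's actual code: the class and method names are hypothetical, and the threshold of 3 missed polls only stands in for the unspecified "couple of times".

```python
from dataclasses import dataclass

# Assumption: the thread only says "a couple of times", so 3 is illustrative.
MISSED_POLLS_BEFORE_UNREACHABLE = 3


@dataclass
class LightReachability:
    """Hypothetical model of how a light's 'reachable' flag could be tracked."""
    reachable: bool = True
    missed_polls: int = 0

    def on_poll_timeout(self) -> None:
        # A state poll went unanswered; enough misses without any message
        # in between mark the light unreachable.
        self.missed_polls += 1
        if self.missed_polls >= MISSED_POLLS_BEFORE_UNREACHABLE:
            self.reachable = False

    def on_message_received(self) -> None:
        # Any message from the light (poll response, report, or the Device
        # Announcement sent after power-up) marks it reachable again.
        self.missed_polls = 0
        self.reachable = True


light = LightReachability()
for _ in range(MISSED_POLLS_BEFORE_UNREACHABLE):
    light.on_poll_timeout()
print(light.reachable)  # False
light.on_message_received()  # e.g. Device Announcement after a power cycle
print(light.reachable)  # True
```

In this model a power cycle "fixes" the light only because the Device Announcement counts as a received message; nothing else about the light's state changes.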

thomas70 commented 5 years ago

Same problem also in version 2.05.59.

rich710se commented 5 years ago

I also have this problem, even after upgrading to 2.05.59. Today one of my three outdoor lights was "gone". All three of them are Tradfri bulbs.

JBS5 commented 5 years ago

@ebaauw Thanks for your explanation.

a. The light's power has been cut (e.g. by a 20th century wall switch);

No wall switches available for these lights, so I am not able to accidentally disconnect power.

b. The Zigbee network has a hiccup (e.g. due to radio interference or routing issues in the mesh). In this case, the light still reacts to group commands;

The lights don't react to group commands when the connection is lost (the lights are assigned to a Hue Dimmer in the Phoscon app and don't respond to the Hue Dimmer when the connection is lost).

c. The light's firmware has crashed.

Firmware 1.2.214 is installed on all my IKEA GU10 spots. I have 20+ of them, and a random light goes offline roughly once every 2-3 weeks.

ebaauw commented 5 years ago

You might try the latest deCONZ version with https://github.com/dresden-elektronik/deconz-rest-plugin/commit/48d2c39a267b5c6d025577eed7530be27932aa2c.

tubalainen commented 5 years ago

I had the same experience twice in the last few months with two different E14 IKEA bulbs (IKEA firmware 1.2.214). Power cycling worked both times for me.

manup commented 5 years ago

c. The light's firmware has crashed.

When the lights don't even react to group commands, it looks like a firmware crash.

2.05.59 has adopted the IKEA gateway's parameters for configuring light state reporting, mainly in the hope of not triggering any bugs by using a configuration which IKEA itself doesn't test. Side note: with this change, no timers are used for reporting on the device anymore.

The new configuration will be applied once a light is power-cycled.
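To illustrate what "no timers are used for reporting" can mean at the protocol level: in a ZCL Configure Reporting command, each attribute gets a minimum and a maximum reporting interval, and a maximum interval of 0x0000 disables periodic (timer-driven) reports while keeping change-based reports. The sketch below encodes one such record for the On/Off attribute. The interval values are assumptions for illustration; the exact parameters deCONZ 2.05.59 sends are not given in this thread.

```python
import struct

# ZCL identifiers for the On/Off attribute of the On/Off cluster (0x0006).
ATTR_ON_OFF = 0x0000
ZCL_TYPE_BOOLEAN = 0x10
DIRECTION_REPORTED = 0x00  # the device sends reports for this attribute


def reporting_record(attr_id: int, data_type: int,
                     min_interval_s: int, max_interval_s: int) -> bytes:
    """Encode one attribute reporting configuration record for a discrete
    data type (discrete types carry no reportable-change field)."""
    # '<' = little-endian, no padding: direction, attribute ID, type, min, max.
    return struct.pack("<BHBHH", DIRECTION_REPORTED, attr_id, data_type,
                       min_interval_s, max_interval_s)


# Assumption: max interval 0x0000 = no periodic reports, only on change.
record = reporting_record(ATTR_ON_OFF, ZCL_TYPE_BOOLEAN,
                          min_interval_s=1, max_interval_s=0x0000)
print(record.hex())  # 0000001001000000
```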

We still send some maintenance requests like group membership and neighbor table queries to the lights, and might restrict these further if stability doesn't improve with 2.05.59.

Keep in mind there is also the possibility of a bug in the light firmware which is not related to any requests the gateway sends.

tubalainen commented 5 years ago

Keep in mind there is also the possibility of a bug in the light firmware which is not related to any requests the gateway sends.

+1 on "not necessarily related to deCONZ but rather to the Zigbee network or the manufacturer's firmware", in particular how each manufacturer's firmware interprets the Zigbee standard.

peer69 commented 5 years ago

I just updated to 2.05.59, and after restarting deCONZ the same light is unreachable again. Pressing 0 in the GUI doesn't bring it back. All other lights work. In my case this might as well be an issue with the light itself.

JBS5 commented 5 years ago

@peer69 @thomas70 Did you power-cycle the light? As @manup mentioned in https://github.com/dresden-elektronik/deconz-rest-plugin/issues/1261#issuecomment-463948127, this is needed:

The new configuration will be applied once a light is power-cycled.

peer69 commented 5 years ago

Good hint, I haven't done that. For now I had to go back to .58 for another reason (high CPU load making the gateway unresponsive); I will try this again later today with .59 and power-cycle all IKEA lights.

jurriaan commented 5 years ago

I'm also having this issue; it happened twice for the same GU10 light. Currently running 2.05.59, and I did power-cycle the lights after the update.

jurriaan commented 5 years ago

Forgot to add that it sometimes seems like it's the same bulb that keeps failing. A while ago I had issues with another bulb, and it was always that one that stopped reacting.

thomas70 commented 5 years ago

After a power cycle my IKEA bulbs came back, but my IKEA FLOALT panel WS is still offline.

lrnflk commented 5 years ago

I'm experiencing this same issue and have been for quite a while, and I'd say that .59 actually made things worse for me. I have 80 nodes, of which 32 are Trådfri lights and switches, 5 are Hue lights, and the rest are various Xiaomi battery-powered devices like temperature, motion and smoke detectors. Every single type of device has been unresponsive at least once, so in my case it's not just the Trådfri lights, but at the moment I'm only having issues with the Trådfri and Hue lights.

The thing is that I ran all the lights through a Hue bridge and the Xiaomi sensors through the Xiaomi gateway earlier and then they were all rock solid, so I don't think it's the device firmware that's the culprit in my case unless it's caused by the change in circumstances.

I have six Trådfri GU10 lights in one location that worked perfectly before, but after the upgrade to .59 and several power cycles they are now almost completely unresponsive, and I will probably have to reset them. What's strange is that this unresponsiveness also seems to "move" between different lights depending on which lights have power. If I cut the power to some of the unresponsive lights, it may take a while, and then suddenly some other light doesn't want to work properly. Perhaps there's some offset somewhere that's breaking things?

manup commented 5 years ago

The thing is that I ran all the lights through a Hue bridge and the Xiaomi sensors through the Xiaomi gateway earlier and then they were all rock solid, so I don't think it's the device firmware that's the culprit in my case unless it's caused by the change in circumstances.

Interesting, did you also have all 32 Ikea lights on the Hue network? I'm asking because Hue bridge uses polling only and doesn't configure attribute reporting.

Did you also have router devices like the Hue or Ikea lights on the Xiaomi network?

I have six Trådfri GU10 lights in one location that worked perfectly before, but after the upgrade to .59 and several power cycles they are now almost completely unresponsive, and I will probably have to reset them. What's strange is that this unresponsiveness also seems to "move" between different lights depending on which lights have power. If I cut the power to some of the unresponsive lights, it may take a while, and then suddenly some other light doesn't want to work properly. Perhaps there's some offset somewhere that's breaking things?

Hmm, that's pretty bad. I really wonder how this happens; 2.05.59 should be far more "familiar" to IKEA lights than prior versions. The configuration now happens the same way the IKEA gateway does it.

When a light becomes unresponsive, can you please select the node in deCONZ and press 0? If it becomes responsive (yellow) again, the light doesn't need to be power-cycled. Note: the light becoming a zombie in this case will be fixed soon; this can currently happen in certain network constellations.

By the way the usual questions:

lrnflk commented 5 years ago

It took a while longer than expected but now everything actually appears to be working fine again. At least for now from what I can tell.

I rebooted the server and also power-cycled every single mains-powered light in the network to make sure they fetched the latest configuration. Despite this, it took a couple of hours before the issue went away, so I was a bit premature in assuming the issue remained just because it didn't work right away.

Interesting, did you also have all 32 Ikea lights on the Hue network? I'm asking because Hue bridge uses polling only and doesn't configure attribute reporting.

Yes, sort of. I had 31 Ikea lights on the Hue network as well as the Hue lights. The 32nd Ikea device is the switch outlet which I hadn't bought back then.

Did you also have router devices like the Hue or Ikea lights on the Xiaomi network?

No, just battery powered sensors

When a light becomes unresponsive, can you please select the node in deCONZ and press 0? If it becomes responsive (yellow) again, the light doesn't need to be power-cycled. Note: the light becoming a zombie in this case will be fixed soon; this can currently happen in certain network constellations.

I did try this multiple times earlier with no effect. And as for the hardware and setup, I'm using a ConBee with USB extension cable and 262F0500. Since everything seems to be working fine for me now this info may not be of any use at the moment but I'll try not to jump to any conclusions and let the network run for a few days to make sure that the issue doesn't return.

olemr commented 5 years ago

I have been running .59 since last week-end and I still lose random Ikea lights. (16 E27 bulbs on house facade.) Bulb FW is the same as others still on the Ikea Gateway. Using ConBee with 262F0500 FW. Last week-end I also bought a HUE bridge and was just about to start moving the lights over when I noticed the 'Under the hood' release note for .59. Decided to hold off, but will re-consider this upcoming week-end. Deconz will still be my best Xiaomi/mi Cube controller of choice. Haven't missed a gesture yet.

jurriaan commented 5 years ago

I have kind of the same situation here: 16 IKEA lights, 2 IKEA control outlets, a Heiman plug, an innr plug and some Xiaomi sensors (cube/door sensors/motion sensor). I never had problems with the non-IKEA devices. However, I currently have almost daily issues where an IKEA light drops out of the network.

I use a Conbee with a USB extension cable on firmware 0x26300500 and deCONZ .59

lrnflk commented 5 years ago

My lights have been working fine for a while now but a couple of days ago my Trådfri E14 bulb suddenly became unresponsive. One power cycle later it came back to life.

Today it was time for one of the GU10's to drop out. It's physically very close to the previously mentioned E14 so I'm not sure if it's a coincidence or not. The GU10 may very well have been routed via the E14 even though all my lights are within ConBee range.

Selecting the nodes and pressing 0 in deCONZ does not do anything. I have also tried restarting the deCONZ container, and while it reroutes the network on startup, it does not establish any route to that specific bulb. What would be the best approach to proceed with the troubleshooting?

JBS5 commented 5 years ago

12 days later, another GU10 became unreachable and won't reconnect without a power cycle.

Happy to share whatever info is needed to get into this issue.

rich710se commented 5 years ago

Same here. Yesterday, after 6 days of flawless connection, I lost one of my Tradfri bulbs. Power on/off and a reset didn't help. It's still yellow in deCONZ, but I can't connect to or control it.


peer69 commented 5 years ago

Same here. After some days without any issues, two of my GU10 Tradfri lamps stopped responding today. I was able to bring one of them back to life by pressing 0 in the GUI, but I had to power-cycle the other one. Fortunately this only seems to happen to GU10 devices at the moment; my FLOALT panels have had no issues yet (in my setup they can only be power-cycled using the circuit breaker).

lrnflk commented 5 years ago

The issue has continued for me as well. I have now experienced 3-4 more GU10 bulbs losing connection as well as one of my Hue E27's and a Xiaomi door sensor (magnet). Some lights have started working again after a power cycle, others have not. Pressing 0 does nothing.

It's also noteworthy that the Xiaomi sensor started working again after I power cycled an adjacent and unresponsive GU10 so I suppose that the sensor was routing through that light, but shouldn't it automatically reroute if there are any connection issues?

Webserve commented 5 years ago

Same issue here. Yesterday I updated to the latest version (.59), and now a couple of IKEA lights are unresponsive.

manup commented 5 years ago

Hi, can you give more insight into the total network, like the network size and other mains-powered devices in it?

I've rearranged my home network a few days ago, now including:

deCONZ 2.05.59; ConBee firmware 0x26300500 (but 0x262f0500 is fine too).

I have 4x FLS-PP lp but these are powered off now for testing, since they act as very strong signal repeaters.

With sensors and switches the total network size is 55. All lights are always powered and till now show zero outages.

lrnflk commented 5 years ago

Here are some more detailed specifications of my network if it can be of any help. I’m still running 2.05.59 with 262F0500 and an extension cord to the ConBee. As mentioned above, after first updating to 2.05.59 and power cycling every mains powered device and waiting for a couple of hours the network was flawless for almost a week, so it seems to take a while until the issues start to appear. Unfortunately the issue reappeared and a full power cycle of all mains powered devices as well as a deCONZ reboot does not resolve the issue anymore. It also seems that the issue is wandering from device to device because sometimes a light may be unresponsive for a while and then it fixes itself.

Earlier today I had an issue where the Trådfri E14 was unresponsive as well as one Hue E27. After a power cycle of the E27 the E14 came back to life as well without me even touching it. The same goes for the unresponsive GU10's that seem to be trading places now and then, so there are at least two unresponsive GU10's every day but it's not always the same lights so some start working while others break and vice versa.

My network currently consists of the following 80 devices, including the ConBee; the mains-powered devices are powered 24/7.

Mains powered

| Quantity | Type | Firmware |
|---|---|---|
| 30 | Trådfri GU10 dimmable | 1.2.214 |
| 4 | Trådfri GU10 white spectrum | 1.2.217 |
| 1 | Trådfri E14 opal dimmable | 1.2.217 |
| 1 | Trådfri control outlet | 1.4.020 |
| 3 | Hue E27 White and Color A19 | 1.29.0_r21169 |
| 2 | Hue E14 White ambiance LTW012 | 1.29.0_r21169 |

Battery powered

| Quantity | Type | Firmware |
|---|---|---|
| 1 | Trådfri on/off switch | 1.4.018 |
| 10 | Xiaomi Aqara multisensor (square temp/hum/pres) | 20161129 |
| 3 | Xiaomi Aqara motion sensor (motion/lux) | 20170627 |
| 4 | Xiaomi Aqara water sensor | 20170721 |
| 1 | Hue motion sensor | 6.1.0.18912 |
| 11 | Xiaomi Aqara contact sensor | 20161128 |
| 8 | Xiaomi/Honeywell smoke sensor | N/A |

jurriaan commented 5 years ago

Last week deCONZ seemed to run mostly fine, but yesterday I had another IKEA bulb (white spectrum) losing connection to deCONZ. Even turning it off and on again didn't help. I had to restart deCONZ to get it working again.

I've got a network with mostly IKEA bulbs, a Heiman outlet and quite some Xiaomi sensors.

JBS5 commented 5 years ago

Some specifications of my zigbee network:

Conbee firmware 262F0500 with extension cable on a NUC. deCONZ 2.05.55 in Docker, so the first thing I have to do is upgrade to 2.05.59 I guess.

Powered (24/7)

| Quantity | Type | Firmware |
|---|---|---|
| 4x | Tradfri E27 white | 1.1.1.0-5.7.2.0 |
| 2x | Tradfri E27 white | 1.2.214 |
| 21x | Tradfri GU10 dimmable | 1.2.214 |
| 3x | Osram Smart+ socket | 1.04.12 |

Battery powered

| Quantity | Type | Firmware |
|---|---|---|
| 3x | Hue Dimmer Switch | 5.45.1.17846 |
| 1x | Aqara smart switch | 20180525 |
| 1x | Aqara smart switch | 20161128 |
| 1x | Aqara double wireless switch | 20170411 |
| 1x | TRADFRI remote | 1.2.214 |
| 6x | Aqara multisensor | 20161129 |
| 10x | Aqara contact sensor | 20161128 |
| 5x | Aqara motion sensor | 20170627 |
| 1x | Aqara leak sensor | 20170721 |
| 1x | Aqara vibration sensor | 20180130 |

peer69 commented 5 years ago

Any updates on this for the current version? I have been running .60 for 3 days and no light has lost connection yet.

jurriaan commented 5 years ago

Unfortunately I already had a lost connection with a regular Tradfri E27 white bulb on .60 and the newest firmware.

peer69 commented 5 years ago

That's bad news... if I understand correctly the polling intervals have been changed in .60 to be less aggressive. Aggressive polling causing a light to hang up made perfect sense to me, too bad this doesn't seem to be the solution to our problem.

lrnflk commented 5 years ago

Yesterday I turned off the power for all mains powered devices and updated to 2.05.60 and 26320500 before turning them on again, just to play it safe. The lights then all worked fine for about 24 hrs but just a few minutes ago I noticed that one of my GU10's had stopped responding. Luckily enough it came back to life again some minutes later without any manual interaction from my end so perhaps the network was just clogged.

manup commented 5 years ago

@JBS5 I would recommend to update the 4x Tradfri E27 white at 1.1.1.0-5.7.2.0 to a recent firmware version. If I recall correctly this is still the very first version.

@jurriaan which version has your Tradfri E27 white bulb?

That's bad news... if I understand correctly the polling intervals have been changed in .60 to be less aggressive.

Yep, basically very similar to the IKEA gateway. Now the only remaining difference is the periodic query of neighbor tables (which is used to display the mesh network lines).

This can be turned off by clicking on the CRE icon in deCONZ and uncheck "Routers and Coordinator". Might be worth a test.
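The neighbor table query corresponds to the ZDP Mgmt_Lqi_req command, which asks a router for its neighbor table starting at a given index. A router with many neighbors answers in chunks, so one full read can take several requests per router. The rough sketch below shows that paging; the function names are hypothetical and the figure of 3 entries per response is an assumption (the real number depends on the responding device and frame size). It only illustrates why unchecking the periodic scan could remove a steady stream of unicast traffic to every router.

```python
import struct

MGMT_LQI_REQ = 0x0031  # ZDP cluster ID of Mgmt_Lqi_req (neighbor table request)


def mgmt_lqi_req(seq: int, start_index: int) -> bytes:
    """ZDP payload: transaction sequence number followed by StartIndex."""
    return struct.pack("<BB", seq & 0xFF, start_index)


def start_indices(neighbor_entries: int, entries_per_response: int) -> list:
    """StartIndex values needed to read a whole neighbor table in chunks."""
    return list(range(0, neighbor_entries, entries_per_response))


# Hypothetical router: 10 neighbors, 3 entries returned per Mgmt_Lqi_rsp.
indices = start_indices(10, 3)
requests = [mgmt_lqi_req(seq, idx) for seq, idx in enumerate(indices)]
print(indices)        # [0, 3, 6, 9]
print(len(requests))  # 4
```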

On Reddit there was a post mentioning that the IKEA gateway should in theory support up to 100 devices, but that this isn't tested very well. It would be interesting to know what network size the IKEA team usually tests.

https://www.reddit.com/r/tradfri/comments/96yiq4/google_home_losing_lamps_and_rooms/e4x1scz/?context=1

You could probably have 100 devices connected to your Gateway. This is not tested properly by us, which is why we don't guarantee it. But the technical limit is 100, and I've seen people who have 100 devices with none or only minor issues.

New versions for the Gateway will support the same amount of devices (Officially 50). You could add another Gateway to your system if you want to double that.

JBS5 commented 5 years ago

@JBS5 I would recommend to update the 4x Tradfri E27 white at 1.1.1.0-5.7.2.0 to a recent firmware version. If I recall correctly this is still the very first version.

Those E27 bulbs with the old firmware didn't fail during the past 6 months while others did...

manup commented 5 years ago

Very interesting, how about the E27 at version 1.2.214?

JBS5 commented 5 years ago

Very interesting, how about the E27 at version 1.2.214?

They lost connection only once in the past months.

It's been a few weeks since the last GU10 lost connection, and I'm still using deCONZ 2.05.55 and firmware 262F0500 on the ConBee.

lb1974 commented 5 years ago

I also have this problem. Only IKEA nodes (43, mainly lights). I have no knowledge about Zigbee, but since I haven't seen it mentioned: my network seems more stable with OTAU disabled. The other day I also changed the network preferences to less secure. I can't remember which one, but since then I have not lost any lights.

peer69 commented 5 years ago

After some days without any issues, several GU10s have become unresponsive. Another issue might be unrelated, but an OSRAM light lost connection as well. Even though it was still shown in the GUI and seemed to be meshed, I couldn't control it anymore. I had to delete the light and re-add it; it was assigned a different light number but regained its former name shortly after being added. No idea what's happening here, but this is quite a bit more maintenance than I would like to see for my setup.

manup commented 5 years ago

@peer69 did you also try just to power-cycle the light? Normally a factory reset shouldn't be needed. You're on 2.05.60? Can you also provide some more details about your network, how many lights and mains powered devices?

peer69 commented 5 years ago

@manup I power-cycled the light, several times. It was controllable for about 10 seconds after a power cycle but then became unresponsive again (red lights flashing in the GUI). Meanwhile the issue came back even after the factory reset and also affected another OSRAM light. For now I have gotten rid of the only two OSRAM lights in my network and replaced them with Hue bulbs. I can offer some testing with the OSRAM lights, but I would need some time for that.

I am running 2.05.60. Currently there are 57 nodes connected to the network of which 27 devices are mains powered. I use 13 IKEA GU10, 1 IKEA FLOALT Panel, 2 IKEA E14, 3 OSRAM Smart+ Plugs, 3 E14 hue bulbs, 5 E27 hue bulbs, 1 hue lightstrip.

I also had to power-cycle some IKEA GU10s that turned unresponsive in the past days. After the power cycle everything is back to normal, and I don't see a pattern. I have lamps with several GU10s, and no more than one GU10 turned unresponsive at the same time, even though they are always controlled as a group.

lrnflk commented 5 years ago

At the moment I'm back to square one. Now I regularly have 2-3 lights that don't respond, but it jumps between different lights so sometimes they work and sometimes they don't depending on which other lights that are online. The issue seems to be cascading because when I physically turn off some lights by cutting the power to them, it shifts the issue to other lights.

I've also lost a couple of Xiaomi temperature sensors as well as door sensors and an IKEA on/off switch but they don't pop back in so they probably need to be re-paired in order to start working again.

Things worked fine immediately after a total power cycle as I noted earlier but a few days later the lights started getting unresponsive again and it's been like this for a couple of weeks now. Back when it happened the first time a guest accidentally unplugged the E14 for several hours so I'm not sure if it's completely unrelated or if unexpected disturbances of the zigbee mesh caused the routing to go crazy. Given that I'm apparently not the only one with these issues I think that it may just have been a coincidence.

I really like the concept of having all my zigbee devices in one single mesh but I'm almost at the point where I boot up the old Hue and Xiaomi gateways and put the ConBee in a drawer, which I really don't want to do for several other reasons. Does anyone have any tips for further troubleshooting that could help me identify exactly what's going on and how to resolve it?

manup commented 5 years ago

One of my IKEA GU10 spots is now unresponsive too.

In the sniffer I see it's still somewhat "alive" and sends NWK Link Status messages, but it obviously thinks it is alone in the network (Link Status Count: 0).
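For anyone checking their own sniffer captures: a NWK Link Status command carries an options byte whose lower five bits give the number of neighbor entries (bit 5 marks the first frame, bit 6 the last), followed by 3 bytes per neighbor: the 16-bit network address and a status byte with the incoming cost in bits 0-2 and the outgoing cost in bits 4-6. A count of 0, as above, means the router advertises no neighbors at all. The minimal decoder below follows that layout from the Zigbee specification; the frames are hand-built examples, not data captured in this issue.

```python
import struct
from typing import List, NamedTuple

NWK_CMD_LINK_STATUS = 0x08


class LinkEntry(NamedTuple):
    nwk_addr: int
    incoming_cost: int
    outgoing_cost: int


def parse_link_status(payload: bytes) -> List[LinkEntry]:
    """Decode a NWK Link Status command payload (starting at the command ID)."""
    if len(payload) < 2 or payload[0] != NWK_CMD_LINK_STATUS:
        raise ValueError("not a NWK Link Status command")
    count = payload[1] & 0x1F  # options bits 0-4: number of entries
    if len(payload) < 2 + 3 * count:
        raise ValueError("truncated Link Status payload")
    entries = []
    for i in range(count):
        # Each entry: little-endian 16-bit neighbor address + 1 status byte.
        addr, status = struct.unpack_from("<HB", payload, 2 + 3 * i)
        entries.append(LinkEntry(addr, status & 0x07, (status >> 4) & 0x07))
    return entries


# 0x60 = first frame + last frame flags, entry count 0 ("alone in the network").
lonely = bytes([0x08, 0x60])
print(parse_link_status(lonely))  # []
# One neighbor (0x0000, the coordinator) with incoming/outgoing cost 1.
normal = bytes([0x08, 0x61, 0x00, 0x00, 0x11])
print(parse_link_status(normal))  # [LinkEntry(nwk_addr=0, incoming_cost=1, outgoing_cost=1)]
```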


It doesn't respond to unicast commands but still sends the periodic ZCL attribute report of the modelid.

I've been sniffing for ~2 hours. I'm not sure when it became unresponsive or whether this is related, but the ZCL sequence number of the report is low:


I'll do some more tests and won't powercycle it for a while.

My Tradfri power socket also acted really weird today and ran in a reboot loop; I never had that with this one before.

manup commented 5 years ago

Just noted a second IKEA GU10 is also a walking led, same symptoms as the other one.

Neither device responds to unicast, groupcast, or ZDP NWK address requests (pressing 0). They send empty NWK Link Status commands.
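The NWK address request mentioned above is a small ZDP frame (NWK_addr_req) that asks the device owning a given 64-bit IEEE address to report its 16-bit network address, so a reply shows the light is alive above the MAC layer. The sketch below builds that payload following the ZDP layout; the IEEE address is made up, the helper name is hypothetical, and how deCONZ actually addresses and sends the request when 0 is pressed is not covered here.

```python
import struct

NWK_ADDR_REQ = 0x0000  # ZDP cluster ID of NWK_addr_req


def nwk_addr_req(seq: int, ieee_addr: int, extended: bool = False,
                 start_index: int = 0) -> bytes:
    """ZDP NWK_addr_req payload: sequence number, 64-bit IEEE address
    (little-endian), RequestType (0x00 single, 0x01 extended), StartIndex."""
    request_type = 0x01 if extended else 0x00
    return struct.pack("<BQBB", seq & 0xFF, ieee_addr, request_type, start_index)


# Hypothetical IEEE address, for illustration only.
frame = nwk_addr_req(seq=0x2A, ieee_addr=0x000B57FFFE123456)
print(frame.hex())  # 2a563412feff570b000000
```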

The second GU10 also did send ZCL OTA Query Next Image requests.


It seems that the response doesn't come through.

Only a wild guess, but I figure the light's in-buffers are blocked and that's why nothing is received and processed. The out-buffers are still working, hence the firmware is able to send reports and OTA queries.

It would be good if they implemented something like a simple health check so that the firmware can reboot after a while if the MAC layer is working (commands are received) but the NWK and APS layers stay silent.
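The health check suggested above could look roughly like the sketch below. It is a hypothetical model in Python, not IKEA firmware code: the class name, the two timestamps, and the 600 s timeout are all assumptions chosen for illustration. It requests a reboot only when the MAC layer is still receiving frames while nothing has reached NWK/APS for the whole timeout.

```python
from dataclasses import dataclass


@dataclass
class StackWatchdog:
    """Hypothetical firmware health check: request a reboot when MAC frames keep
    arriving but nothing has reached the NWK/APS layers for `timeout_s` seconds."""
    timeout_s: float
    last_mac_rx: float = 0.0    # time of the last frame accepted by the MAC layer
    last_upper_rx: float = 0.0  # time of the last frame processed by NWK or APS

    def on_mac_frame(self, now: float) -> None:
        self.last_mac_rx = now

    def on_nwk_or_aps_frame(self, now: float) -> None:
        self.last_upper_rx = now

    def should_reboot(self, now: float) -> bool:
        mac_alive = now - self.last_mac_rx < self.timeout_s
        upper_silent = now - self.last_upper_rx >= self.timeout_s
        return mac_alive and upper_silent


wd = StackWatchdog(timeout_s=600)
wd.on_nwk_or_aps_frame(now=0)
for t in range(0, 900, 60):  # the radio keeps hearing traffic every minute...
    wd.on_mac_frame(now=t)   # ...but nothing gets past the MAC layer
print(wd.should_reboot(now=900))  # True
```

Requiring recent MAC activity is a design choice: it keeps the check from rebooting a light that simply hears no traffic at all, and targets the "radio alive, upper layers stuck" state described above.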

MattL0 commented 5 years ago

I also have this problem, even after upgrading to 2.05.59. Today one of my three outdoor lights was "gone". All three of them are Tradfri bulbs.

Off topic: how do you find your light in all this? lol. I have a similar setup and I was like... nooo time for this.

MattL0 commented 5 years ago

By similar I mean about 50 devices.

manup commented 5 years ago

Around 50 is kind of OK. One of our test networks has 180+ devices with deCONZ on a Raspberry Pi 1, that's fun :)

We have some plans to add better filter/sorting to deCONZ to simplify finding devices, currently it's really cumbersome at a certain device count.

manup commented 5 years ago

The lights are still stuck.

One interesting observation: I powered off the parent of a Philips Hue motion sensor (a Hue Lux) so that it needs to search a new parent. The sensor now tries to rejoin through one of the stuck IKEA GU10 lights.

The light does respond with a Leave (with Rejoin) command. So it did process the rejoin request!


Sadly, the Hue motion sensor is stubborn and keeps trying to rejoin through the stuck GU10 light forever instead of looking for a better parent.

However the interesting part here is that the stuck IKEA light does respond to the rejoin request, perhaps it also processes NWK Leave requests, that could be a base of a workaround to get the light into a working state again.
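The workaround hinted at above would mean sending the stuck light a NWK Leave request with the Rejoin bit set. The sketch below covers only the two-byte NWK command payload (command ID 0x04 plus an options byte with Rejoin in bit 5, Request in bit 6 and Remove Children in bit 7, per the Zigbee specification's layout); the NWK header, security and addressing are omitted, the helper names are hypothetical, and whether a stuck IKEA light would honour such a request is exactly the open question raised in the comment.

```python
NWK_CMD_LEAVE = 0x04
OPT_REJOIN = 1 << 5           # device should rejoin after leaving
OPT_REQUEST = 1 << 6          # this is a leave request, not a leave indication
OPT_REMOVE_CHILDREN = 1 << 7  # the device's children should leave as well


def leave_command(rejoin: bool, request: bool, remove_children: bool = False) -> bytes:
    """Build the NWK Leave command payload: command ID + options byte."""
    options = 0
    if rejoin:
        options |= OPT_REJOIN
    if request:
        options |= OPT_REQUEST
    if remove_children:
        options |= OPT_REMOVE_CHILDREN
    return bytes([NWK_CMD_LEAVE, options])


def describe_leave(payload: bytes) -> dict:
    """Decode the option bits of a NWK Leave command payload."""
    if len(payload) != 2 or payload[0] != NWK_CMD_LEAVE:
        raise ValueError("not a NWK Leave command")
    options = payload[1]
    return {
        "rejoin": bool(options & OPT_REJOIN),
        "request": bool(options & OPT_REQUEST),
        "remove_children": bool(options & OPT_REMOVE_CHILDREN),
    }


# Candidate workaround frame: ask the stuck light to leave and rejoin.
cmd = leave_command(rejoin=True, request=True)
print(cmd.hex())  # 0460
print(describe_leave(cmd))  # {'rejoin': True, 'request': True, 'remove_children': False}
```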

manup commented 5 years ago

Correction: the Hue motion sensor isn't too stubborn; after a few minutes the sensor selected another working parent (good).