dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

deConz loses contact with my lights #826

Closed helgemor closed 5 years ago

helgemor commented 5 years ago

Hi

I have a ConBee Zigbee USB controller with up-to-date firmware, and several Philips Hue lights. I'm using Home Assistant 0.79 on a Debian Intel NUC. I have installed the deCONZ add-on and the Phoscon App in HA. After I added one light it seemed to work fine, but some time later (e.g. the next day) it had disappeared! I have tried several times, with the same result. One funny thing is that today, even though I can’t see the light in Phoscon, it responded to the “All Off” switch.

Thanks, Helge

olemr commented 5 years ago

See #765 & #801. Ikea issues, but could be related. I have received an extra ConBee and will see if I can sniff out something interesting. I have my Ikea lights distributed 50/50 on ConBee and Ikea Gateway so I have something to compare with.

helgemor commented 5 years ago

I got a tip saying it could be a conflict with WiFi channels. I have eliminated that by setting the 2.4GHz channels in my house to 6 and 9. The zigbee channel is 25, outside any interference zone. The problem hasn't gone away.

olemr commented 5 years ago

This is not my problem either. WiFi on ch13, ConBee on 25. I have received my sniffer ConBee and have set up Wireshark. Need a trigger for when a light becomes unreachable. Will do that next. Hopefully the logs will provide some insight.

ebaauw commented 5 years ago

Need a trigger for when a light becomes unreachable.

There's no such thing. Note that state.reachable doesn't reflect whether the light is reachable, only whether deCONZ could reach it the last time it tried to.

olemr commented 5 years ago

But there is a 'reachable' field in the REST API that goes from true to false when the lights are dropped. I'm already polling all the /lights items every 2s and this trigger will provide approximate time and the 'uniqueid' of the light that becomes unreachable, i.e. "reachable": false.

I can see the 'uniqueid' in WireShark: [image]

deconz sometimes sees the light again, but when the light is truly lost it does not.
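A minimal sketch of the polling trigger described above, assuming the standard shape of the `/lights` resource (a JSON object keyed by light id, each entry carrying `uniqueid`, `name` and `state.reachable`). The gateway address and API key in `BASE_URL` are placeholders, and the function names are hypothetical. Note that the timestamp it prints is the time the reported `reachable` value changed, not the time the light actually stopped responding.

```python
import json
import time
import urllib.request

# Hypothetical gateway address and API key; replace with your own.
BASE_URL = "http://127.0.0.1:80/api/YOUR_API_KEY"


def fetch_lights(base_url=BASE_URL):
    """Return the /lights resource as a dict keyed by light id."""
    with urllib.request.urlopen(f"{base_url}/lights", timeout=5) as resp:
        return json.load(resp)


def dropped_lights(previous, current):
    """List (uniqueid, name) for lights whose state.reachable went true -> false."""
    dropped = []
    for light_id, light in current.items():
        was = previous.get(light_id, {}).get("state", {}).get("reachable")
        now = light.get("state", {}).get("reachable")
        if was is True and now is False:
            dropped.append((light.get("uniqueid"), light.get("name")))
    return dropped


def watch(interval=2.0):
    """Poll forever; print a timestamp whenever a light is reported unreachable."""
    previous = fetch_lights()
    while True:
        time.sleep(interval)
        current = fetch_lights()
        for uniqueid, name in dropped_lights(previous, current):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), uniqueid, name)
        previous = current
```

Calling `watch()` starts the loop; the printed timestamps give a rough window to search in the sniffer capture.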

ebaauw commented 5 years ago

I'm already polling all the /lights items every 2s and this trigger will provide approximate time and the 'uniqueid' of the light that becomes unreachable

No it won't. The REST API call only returns the state deCONZ has cached - it doesn't actually communicate with the lights. deCONZ polls the lights independently from the REST API calls - you can see this in Wireshark.

EDIT Try powering off a light: depending on the number of nodes in your network, it might take a few seconds to over a minute, before the REST API reports state.reachable as false.

I can see the 'uniqueid' in WireShark:

Yes, the uniqueid is the ZigBee mac address followed by the endpoint (and by the cluster for /sensors resources).
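As a rough illustration of that layout, the sketch below splits a `uniqueid` into its parts so the MAC address can be matched against a capture. It assumes the parts are separated by hyphens and that the endpoint and cluster are written in hexadecimal; the sample values in the usage note are made up, and `parse_uniqueid` is a hypothetical helper, not part of the plugin.

```python
def parse_uniqueid(uniqueid):
    """Split a uniqueid into (mac, endpoint, cluster); cluster is None for lights."""
    parts = uniqueid.split("-")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected uniqueid format: {uniqueid!r}")
    mac = parts[0]
    # Assumption: endpoint and cluster are hex-encoded.
    endpoint = int(parts[1], 16)
    cluster = int(parts[2], 16) if len(parts) == 3 else None
    return mac, endpoint, cluster
```

For example, `parse_uniqueid("00:17:88:01:02:03:04:05-0b")` yields the MAC address, endpoint 11 and no cluster, while a /sensors-style value such as `"00:15:8d:00:01:02:03:04-01-0006"` also yields cluster 6.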

There's no way around this through the REST API. The only way to force communication with the lights is through the deCONZ GUI.

olemr commented 5 years ago

There's no way around this through the REST API. The only way to force communication with the lights is through the deCONZ GUI.

I see that, but my thought was to get at least a time frame of minutes for looking in the WireShark logs instead of going in completely blind.

EDIT: This is my first 'sniffing' and I started out sniffing both on my Ikea Gateway network and the ConBee network, and just looking at the live trace they look quite different. ConBee has loads more ACKs, but that could be due to my 3 Mi Cubes and the HUE-1 remote?

Paul-dH commented 5 years ago

I have the exact same issue, I added a note to: https://github.com/dresden-elektronik/deconz-rest-plugin/issues/522#issuecomment-427685918

Please let me know if I can help to find the issue :)

olemr commented 5 years ago

I have my 'trigger' almost working, but I have been too busy IRL this weekend. Will continue my sniffing through the week. I see 3 cases of "reachable": false

1 - it is only intermittent. It becomes "reachable": true after a while. Unsure if it responds to light commands in this state.
2 - "reachable": false is in steady state. Light commands does not work, but group commands do.
3 - "reachable": false is in steady state. Light does not respond to any commands. It could be stuck in off or on state. Has to be power cycled to get online again.

I'm still running ~10 Ikea devices (bulbs and panels) on the Ikea Gateway. Granted, I do not monitor them as closely, but it has been over 6 months since any of the devices on that network froze. That could also be due to a SW update in either the Ikea Gateway or any of the lights. I have checked all FW versions on deconz, and they all match the ones on the Ikea Gateway.

Now, why bother with Conbee/deconz you might ask? Here are my reasons:

1 - Since I'm using the Mi Cubes on deconz, I thought it nice to have all devices on one net.
2 - The Ikea Gateway did have issues in the early days, but after changing the USB power and the Eth patch cable that came with it, it has been quite stable lately.
3 - Interfacing through the Ikea Gateway on openHAB, group support is a bit flaky. It works, but if you turn on or off a group of 10 GU10 lights, it can take up to 5s before they all settle. Deconz does it simultaneously.
4 - I like the fact that deconz interfaces directly to the Ikea 5-button remote, and the fact that it can associate on/off and dimming without deconz running. Crucial for WAF if the system is down. The only thing lacking in this department is the association of color temperature control.
5 - Since the ConBee is already in use, removing a Gateway is one less point of failure.

@ebaauw I see that some Zigbee Gateways state a maximum net size of 20 devices. I have 37 now. Could that cause any problems? Do you know what the max net size for Conbee/deconz is?

ebaauw commented 5 years ago

I see 3 cases of "reachable": false

  1. Typically a hiccup in the network, where deCONZ misses some responses when polling the light. Usually, the lights respond, but deCONZ (RaspBee/ConBee) send queues might fill up waiting for ACKs from the light, causing requests to be delayed or cancelled.
  2. Typically a routing issue, where the RaspBee/ConBee no longer knows how to reach the light. See all the meshing topics. You can mitigate this by using group and scene commands from your rules.
  3. Typically a bug in the light firmware. The IKEA colour bulb hangs on setting some colours, notably when Y is zero.

Then there’s the obvious case when the light is cut from power, typically using a traditional 20th century wall switch.
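For case 2, the mitigation of using group commands can be sketched as follows. This only builds the two REST requests so they can be compared: a per-light command goes to `/lights/<id>/state`, a group command to `/groups/<id>/action`. The gateway address, API key and ids are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical gateway address and API key; replace with your own.
BASE_URL = "http://127.0.0.1:80/api/YOUR_API_KEY"


def _put(url, body):
    """Build (but do not send) a JSON PUT request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def light_request(light_id, on, base_url=BASE_URL):
    """Command addressed to a single light."""
    return _put(f"{base_url}/lights/{light_id}/state", {"on": on})


def group_request(group_id, on, base_url=BASE_URL):
    """Command addressed to a group the light is a member of."""
    return _put(f"{base_url}/groups/{group_id}/action", {"on": on})
```

Passing the result to `urllib.request.urlopen()` sends the command. In rules or automations, swapping `light_request` for `group_request` on a group containing the affected light is the substitution suggested above.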

I see that some Zigbee Gateways state a maximum net size of 20 devices. I have 37 now.

I’m on 91 currently. deCONZ supports up to 200 nodes. The larger the network, the longer it takes for each light to be polled by the gateway. This is especially a problem for ZLL lights (Philips) that don’t support attribute reporting. That’s why the Hue bridge only supports 50 lights (although technically it could handle 63).

Paul-dH commented 5 years ago

Hi @olemr and @ebaauw,

I did a couple of tests against your list of options and added my findings :)

1 - it is only intermittent. It becomes "reachable": true after a while. Unsure if it responds to light commands in this state. --> Only the lights are unavailable; all switches and dimmers are online. I let it run for 2 days/nights and nothing changed. I also moved the whole setup to another place, but this didn't help either.

2 - "reachable": false is in steady state. Light commands does not work, but group commands do. --> I'm not able to test this: I tried to add a new group, but I can't add unavailable lights to the group. I have to say this is probably expected behaviour, since adding an unavailable light seems strange...

3 - "reachable": false is in steady state. Light does not respond to any commands. It could be stuck in off or on state. Has to be power cycled to get online again. --> I've power cycled every lamp; sadly, none of them came online within a time window of 4-5 hours.

The weird thing is that when I connect the ConBee to my Windows laptop and start Deconz there, all my lights and settings are back and everything works. It looks like an issue with the version of Deconz used in the Home Assistant Hassio Docker...

The version on my laptop is 2.05.20 and the version in the container is 2.05.39. The release notes of the latest version do show a lot of changes related to ConBee connection issues.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.