dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Bug: All my Xiaomi Aqara and Mijia devices are unreachable! #912

Closed Nobbbi closed 5 years ago

Nobbbi commented 5 years ago

Hi

The problems started with some of the round Xiaomi buttons after updating to v.43. Now, with v.44, all Xiaomi devices are gone.

In the Phoscon app, all known Xiaomi sensors are unreachable. With VNC I can see that only 3 of 8 devices have a connection.

They all worked with v.42.

manup commented 5 years ago

How long has .44 been running? After a deCONZ restart it can take up to one hour until Xiaomi sensors are shown as reachable again (hourly sensor report).
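To see which sensors deCONZ currently reports as unreachable without clicking through the Phoscon app, the REST API can be queried directly. This is a minimal sketch that assumes the gateway address and API key below (both placeholders) and that each sensor object returned by `GET /api/<apikey>/sensors` carries a `config.reachable` flag.

```python
import json
import urllib.request

# Placeholders: replace with your gateway's address and a valid API key.
GATEWAY = "http://127.0.0.1:80"
API_KEY = "YOUR_API_KEY"


def fetch_sensors(gateway: str = GATEWAY, api_key: str = API_KEY) -> dict:
    """GET /api/<apikey>/sensors and return the decoded JSON object (keyed by sensor id)."""
    url = f"{gateway}/api/{api_key}/sensors"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unreachable_sensors(sensors: dict) -> list:
    """Return 'id: name' for every sensor whose config.reachable flag is False."""
    result = []
    for sensor_id, sensor in sensors.items():
        config = sensor.get("config", {})
        # Sensors without a 'reachable' field are not reported as unreachable.
        if config.get("reachable") is False:
            result.append(f"{sensor_id}: {sensor.get('name', '?')}")
    return result
```

Usage: `print(unreachable_sensors(fetch_sensors()))`. Run it once right after a restart and again an hour later to see which sensors recovered with their hourly report.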

MattL0 commented 5 years ago

Yes, before (let's say) .40, the sensors weren't behaving like this.

Now I need to take a walk around my house every time I reboot my server or deCONZ. If I don't, and just let things be, some sensors don't come back by themselves.

For example, I need to open and close my bathroom door twice before the sensor responds properly (without errors).

I didn't have this problem before. The sensors weren't marked as unreachable after a reboot.

I have a headless install, so I think it'll be hard to send you any report.

manup commented 5 years ago

There was indeed a change that disabled directly restoring the reachable state of sensors after a restart, in order to get rid of sensors that hadn't been reachable for a very long time.

The direct restore will come back soon, once I've figured out what the main issue is and how to prevent it. For Xiaomi sensors it should be safe to assume they are reachable if they were alive within the last two hours.
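The rule described for Xiaomi sensors (after a restart, treat a sensor as reachable if it was last seen within two hours, i.e. roughly two hourly reports) can be sketched as below. This is a hypothetical helper, not the plugin's actual code; `last_seen` stands for whatever timestamp the gateway persisted before the restart.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the comment above; the constant name is hypothetical.
XIAOMI_MAX_SILENCE = timedelta(hours=2)


def restore_reachable(last_seen: Optional[datetime], now: datetime,
                      max_silence: timedelta = XIAOMI_MAX_SILENCE) -> bool:
    """Return True if a sensor may be marked reachable immediately after a restart."""
    if last_seen is None:
        # Never heard from this sensor: wait for a real report instead of guessing.
        return False
    return now - last_seen <= max_silence
```

Sensors silent for longer than the threshold stay unreachable until their next report, which still gets rid of devices that have been gone for a long time.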

heitec11 commented 5 years ago

@manup: I have been telling you for more than a week that there is a big bug with Xiaomi sensors. Now I hope you get enough issues from other users too...

Nobbbi commented 5 years ago

No change after several hours!

lbschenkel commented 5 years ago

FYI @manup: I have experienced similar behaviour with my IKEA motion sensors, but I'm not sure whether it's technically the same problem. Sometimes when restarting deCONZ they get 'stuck' in some initial state, sometimes reporting motion when I know for a fact there has been none. They won't get 'unstuck' until I walk in front of the sensor: on some occasions I have deliberately left them alone for >24h and they remained stuck in the 'motion' state. I've only seen this happen on deCONZ restarts, never during 'normal' operation, so I'm skeptical that the sensors are sending spurious signals, since I would have noticed it.

In case you believe that's unrelated and would prefer that I create a separate issue, please let me know and I'll do so.

ghost commented 5 years ago

Same issue here with most of my Xiaomi sensors. Some report back within an hour, some never come back. It surfaced for me when upgrading to .42 and still exists in .44.

MattL0 commented 5 years ago

I can't speak for other sensors, because when we walk around our home we activate them...

But this time the door sensors came back by themselves (without any door being opened or closed).

htop says my PC has been up for an hour and a half (1h30).

MattL0 commented 5 years ago

[screenshot]

Nobbbi commented 5 years ago

For me it doesn't help to activate the devices; they are still unreachable!

christmasjumper commented 5 years ago

Hi,

I'm having issues with Xiaomi sensors as well since updating to .44

They seem to work intermittently, and I can't attribute this to any particular scenario or change. Last night 2 Xiaomi door sensors didn't work; after going out for the evening and coming back, 1 was working again.

I have two RPis. On one, all devices connect directly (only sensors are attached, no Ikea bulbs), and on this hub the sensors seem to be stable and working constantly. The RPi where I have Ikea bulbs and sensors attached seems to be the one with the issue.

I'm happy to attach logs / more information to help troubleshoot, just need to know what you need.

Is it possible to downgrade to a version before .43? I thought I'd read somewhere that there was no backward compatibility...

manup commented 5 years ago

This is currently under investigation, with ongoing documentation of what's happening in the wiki:

https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/End-device-Polling

I'll test various cases with the battery-powered Xiaomi devices I have. The interesting part is when Xiaomi devices are not connected directly to the gateway and their parent routers change.

From the motion sensor sniffer logs I see that they are able to recover on their own, but it may take a while. I hope to find out the boundaries and maybe find some ways to improve the recovery.

manup commented 5 years ago

I have two RPis. On one, all devices connect directly (only sensors are attached, no Ikea bulbs), and on this hub the sensors seem to be stable and working constantly. The RPi where I have Ikea bulbs and sensors attached seems to be the one with the issue.

Ikea lights could indeed be a problem here: if I recall correctly, they throw children out of their internal tables after 8 minutes. That's fine for Ikea end-devices, since they poll their parent every 5 minutes, but Xiaomi end-devices sleep for up to one hour.
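The timing mismatch can be shown with a deliberately simplified model that uses only the numbers quoted above: a child survives in the router's table only if it checks in before the router ages it out. Real child aging is more involved than one comparison, and the class and constant names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    child_timeout_s: int  # how long the router keeps a silent child in its table


@dataclass
class EndDevice:
    name: str
    poll_interval_s: int  # how often the device wakes up and contacts its parent


def child_survives(router: Router, device: EndDevice) -> bool:
    """True if the device checks in before the router drops it from its child table."""
    return device.poll_interval_s < router.child_timeout_s


# Values as recalled in the comment above (8 min, 5 min, up to 1 h).
IKEA_LIGHT = Router("Ikea light", child_timeout_s=8 * 60)
IKEA_SENSOR = EndDevice("Ikea end-device", poll_interval_s=5 * 60)
XIAOMI_SENSOR = EndDevice("Xiaomi end-device", poll_interval_s=60 * 60)
```

With these values the Ikea end-device stays in the table, while a Xiaomi sensor that sleeps for up to an hour is dropped long before it wakes up again.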

I'll do some more tests to verify this, and also to check if Xiaomi devices do have a chance to recover on their own in this scenario.

The lights might still ACK sensor reports to the sensor at the lowest level but not forward the messages to the gateway; in that case recovery will never happen. Sniffer logs will tell...

lbschenkel commented 5 years ago

@manup: I know little of the particularities of the Zigbee protocol. Is there a way for a coordinator to influence how an end-device picks its parent router?

manup commented 5 years ago

The most reliable way is to power off any router which shouldn't be picked as parent (excluding the gateway).

Currently I would suggest preferring always-powered lights/smart plugs, since once Xiaomi devices are connected to a parent they try really hard to stick to it before attempting recovery, which might also fail (Ikea?).

What routers are good?

lbschenkel commented 5 years ago

Right, but I was asking more out of curiosity. Is it completely up to the device which parent it picks or does the Zigbee spec make any allowance for the coordinator to influence this?

manup commented 5 years ago

In theory, during the setup process the coordinator could also choose which routers are allowed to let other devices join (become parents).

In practice:

lbschenkel commented 5 years ago

Also once a device has joined the network it could decide to pick another parent.

I suppose this happens completely independently, without a request to, or interaction with, the coordinator/gateway?

I presume there's little you can do to solve these interoperability issues at the deCONZ side apart from documenting different devices' behaviour so users can try to mitigate this themselves?

manup commented 5 years ago

I suppose this happens completely independently, without a request to, or interaction with, the coordinator/gateway?

The coordinator could interfere but currently it just automatically allows it.

I presume there's little you can do to solve these interoperability issues at the deCONZ side apart from documenting different devices' behaviour so users can try to mitigate this themselves?

Today I found an issue where Ikea lights don't update their internal nwkUpdateId (which is raised after channel changes); this could prevent Ikea lights from being selected as parents. It will be fixed via a workaround on the gateway side.
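Why a stale nwkUpdateId matters can be sketched under an assumption the thread doesn't spell out: a (re)joining device ignores routers whose beacon advertises an nwkUpdateId older than its own, so a light that never bumped its counter after a channel change drops out of the candidate list. The wrap-around comparison of the 8-bit counter, the class, and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Beacon:
    router: str
    nwk_update_id: int  # 8-bit counter, raised after e.g. a channel change


def not_older(candidate: int, reference: int) -> bool:
    """Compare two 8-bit counters while allowing for wrap-around (255 -> 0)."""
    return (candidate - reference) % 256 < 128


def eligible_parents(beacons: List[Beacon], own_update_id: int) -> List[str]:
    """Routers whose advertised nwkUpdateId is not behind the device's own value."""
    return [b.router for b in beacons if not_older(b.nwk_update_id, own_update_id)]
```

For example, after a channel change raised the network's counter to 3, a light still advertising 2 is skipped: `eligible_parents([Beacon("Hue bulb", 3), Beacon("Ikea light", 2)], 3)` keeps only the Hue bulb.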

However, testing the Xiaomi motion sensor shows that it can't reliably recover from parent loss. My motion sensor is stubbornly trying to send the presence signal through its parent, a Philips hue bulb that I powered off three hours ago. Here the gateway can't do anything.

https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/End-device-Polling#xiaomi-motion-sensor-rtcgq11lm

I think that, to improve the whole situation, the setup process must be restricted to force Xiaomi sensors to select a parent that is meant to be powered all the time.

Maybe this can be done automatically, since the gateway can observe which routers are always powered and which ones are bad candidates. Instead of broadcasting permit join, only good candidates would be allowed to become parents during setup.
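The selection step of that idea could look like the sketch below: the gateway tracks how often each router was unreachable and keeps only routers that have been watched long enough and were almost always up. It could then open the network only on those routers, for example by unicasting a ZDO Mgmt_Permit_Joining_req to each of them instead of broadcasting it (not implemented here). The thresholds (99 % availability, 24 h of observation) and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouterStats:
    name: str
    observed_hours: float     # how long the gateway has been tracking this router
    unreachable_hours: float  # time within that window it was not reachable


def availability(r: RouterStats) -> float:
    """Fraction of the observed time the router was reachable."""
    if r.observed_hours <= 0:
        return 0.0
    return 1.0 - r.unreachable_hours / r.observed_hours


def parent_candidates(routers: List[RouterStats],
                      min_availability: float = 0.99,
                      min_observed_hours: float = 24.0) -> List[str]:
    """Names of routers that look always-powered enough to accept joining end-devices."""
    return [r.name for r in routers
            if r.observed_hours >= min_observed_hours
            and availability(r) >= min_availability]
```

A bulb behind a wall switch that is off for part of the day fails the availability check, and a router that only just joined is excluded until the gateway has enough history to judge it.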

lbschenkel commented 5 years ago

I suppose this happens completely independently, without a request to, or interaction with, the coordinator/gateway?

The coordinator could interfere but currently it just automatically allows it.

Just to be 100% clear: are you referring to when the device is put into setup/join mode and the network is manually opened, or when the device decides to elect a new parent when the previous one became unreachable? Maybe my question wasn't clear enough, but it was about the latter scenario. If the gateway can influence, does it mean that a device broadcasts its intent of picking a new parent and it's the gateway that says to potential routers: "yes, you can adopt this device"?

P.S.: Apologies if I'm asking too many questions, but it's due to technical curiosity and also because I'm an occasional contributor, so any clarifications help me in constructing a better mental model and helpfully offering higher quality contributions to this project.

heitec11 commented 5 years ago

After using the image and manually upgrading to version 2.5.39 (Raspi) and firmware 0x26240500 (RaspBee), I could pair my Xiaomi sensors. Now 4 of my 5 sensors are working.

manup commented 5 years ago

Just to be 100% clear: are you referring to when the device is put into setup/join mode and the network is manually opened, or when the device decides to elect a new parent when the previous one became unreachable? Maybe my question wasn't clear enough, but it was about the latter scenario. If the gateway can influence, does it mean that a device broadcasts its intent of picking a new parent and it's the gateway that says to potential routers: "yes, you can adopt this device"?

It works with unicast like this:

[image]

So the gateway could decide to not send the APS Tunnel Command and prevent the node from joining via a specific router.
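In the unicast flow described above, the parent router reports the joining device to the gateway (presumably via an APS Update-Device message), and the join only completes if the gateway answers with the APS Tunnel command. A gateway-side policy could therefore be a simple predicate like the sketch below. The message fields, names, and the per-device restriction list are hypothetical and not part of deCONZ.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class UpdateDeviceNotice:
    """Hypothetical subset of what the gateway learns when a router reports a join."""
    parent: str       # router that relayed the join
    device_ieee: str  # IEEE address of the joining device


def should_send_tunnel(notice: UpdateDeviceNotice,
                       allowed_parents: Set[str],
                       restricted_devices: Set[str]) -> bool:
    """True: send the APS Tunnel command; False: withhold it so this join fails."""
    if notice.device_ieee not in restricted_devices:
        return True  # no policy for this device: allow it, as the gateway does today
    return notice.parent in allowed_parents
```

Keeping the default permissive means only devices that are explicitly restricted, such as Xiaomi sensors, are steered away from unsuitable parents.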

P.S.: Apologies if I'm asking too many questions, but it's due to technical curiosity and also because I'm an occasional contributor, so any clarifications help me in constructing a better mental model and helpfully offering higher quality contributions to this project.

Curiosity is good :) ZigBee is kind of complex as a whole, but for specific questions I recommend having a look at the specification(s), which mostly provide an understandable description of the various functionalities.

manup commented 5 years ago

After using the image and manually upgrading to version 2.5.39 (Raspi) and firmware 0x26240500 (RaspBee), I could pair my Xiaomi sensors. Now 4 of my 5 sensors are working.

Is it a sensor only network?

heitec11 commented 5 years ago

Yes, it is a sensor only network (at the moment).

ghost commented 5 years ago

@manup, is this issue moving anywhere? Do you know where the problem is and when it will be fixed?

The connection to many of my Xiaomi sensors drops repeatedly. Some never came back; some I was able to re-pair and they reappeared, until I rebooted, and then they were gone again.

deCONZ in this state has become unusable for me. What needs to be done to fix that?

manup commented 5 years ago

@manup, is this issue moving anywhere? Do you know where the problem is and when it will be fixed?

There are several issues under investigation. For a while now, mixed-device setups have been growing like crazy. It's no longer mainly Philips hue networks; the wild bunch of Ikea, Osram, Innr and Xiaomi has joined the party, each vendor with various firmware versions and surprises. Some of them don't route nicely or have subtle bugs which might take other devices down (the Ikea/Xiaomi nwkUpdateId bug).

Changes in deCONZ and its firmware are made to improve the situation, but they can of course also bring other problems to light or contain bugs (.42 to .44). My test networks are running very stably, which doesn't help to find these bugs early, so often I need to understand bugs first in order to recreate them.

And sadly it's not as easy as fixing just one thing and everything works. Currently the hardest part is detecting differences between devices and implementations in order to find common ways of handling various tasks for different vendors.

Some of the findings are documented here:

https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/End-device-Polling

For Xiaomi end-devices it is very easy to bring them down:

.45 does contain the first fixes for Ikea/Xiaomi nwkUpdate related issues.

.46 will contain fixes related to invalid device data in the database and caches, which caused disappearing nodes/lights.

.47 will have a small but important firmware change to keep links/routes in larger networks with more than 22 router devices.

It's a rocky road, but yes, getting to a stable solution has the highest priority right now. Until this is achieved, the beta should only be updated when you know the risks, and it's recommended to make a backup before each update. deCONZ tracks changes to the network configuration in the zll.db file; future versions will use this data to recover automatically from some further network configuration issues, which currently need to be fixed by hand.
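A backup of zll.db before each update can be scripted. The minimal sketch below copies the database with SQLite's online backup API, so the copy stays consistent even if deCONZ has the file open. The default path is an assumption for Linux installs (commonly under `~/.local/share/dresden-elektronik/deCONZ/`), and the backup directory name is hypothetical; verify both on your system. This only covers zll.db, not any other gateway state.

```python
import sqlite3
from datetime import datetime
from pathlib import Path

# Assumed default location on Linux; adjust for your installation.
DEFAULT_DB = Path.home() / ".local/share/dresden-elektronik/deCONZ/zll.db"
DEFAULT_DEST = Path.home() / "deconz-backups"  # hypothetical backup directory


def backup_zll_db(src: Path = DEFAULT_DB, dest_dir: Path = DEFAULT_DEST) -> Path:
    """Copy zll.db to a timestamped file in dest_dir and return the new file's path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"zll-{stamp}.db"
    # Open the source read-only: a missing file raises instead of creating an empty db.
    src_conn = sqlite3.connect(f"file:{src}?mode=ro", uri=True)
    try:
        dst_conn = sqlite3.connect(dest)
        try:
            # SQLite's online backup copies a consistent snapshot page by page.
            src_conn.backup(dst_conn)
        finally:
            dst_conn.close()
    finally:
        src_conn.close()
    return dest
```

Usage: call `backup_zll_db()` right before installing a new beta; restoring means stopping deCONZ and copying the chosen backup back over zll.db.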

ghost commented 5 years ago

Thanks @manup for the details, much appreciated.

How can one stay updated on your progress and the roadmap ahead without opening or asking in issues ?

christmasjumper commented 5 years ago

Just an update from me: 2.05.45 seems to have resolved my sensor connectivity issues. I've been running it for a week now and have had very few, next to no, dropouts.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.