dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.89k stars 498 forks source link

no permit join after 20 min #5229

Closed thecem closed 3 years ago

thecem commented 3 years ago

Describe the bug

After a reboot (power cycle) the last FW of conbee ii 26720700 lets only a short period of time devices to join. FW 26670700 is the last working, with this one permit join is all the time possible.

Steps to reproduce the behavior

Install latest FW 26720700 wait 1h, afterwards try to add a sensor (Xiaomi I tried door sensor and old ir sensor) -> not possible Power cycle your device (regardless you have a inuc or a rpi3b), try directly to add a sensor -> possible

Expected behavior

Add Sensor should work every time regardless the uptime.

Screenshots

Environment

I tried this with rpi3b/buster and an inuc/win10/vm/HA/docker same behavior. Only Xiaomi sensors and switches in network.

Working FW: 26670700 Not working FW: 26680700 and 26720700

deCONZ Logs

Additional context

manup commented 3 years ago

Is it possible that you have the Phoscon App open in multiple tabs / browsers, perhaps devices? Currently there is the still oddity that multiple instances of the Phoscon App will fight over to open/close the network.

Smanar commented 3 years ago

Can you see in log the line about the permit join ? and with the value ? (with "info")

19:10:49:993 send permit join, duration: 59

thecem commented 3 years ago

thanks for helping here!

Yes I see the permit: 21:42:59:147 send permit join, duration: 59

Only one instance is open, one browser and one phoscon web.

The first 20 - 40 min after reboot the permit is working.

thecem commented 3 years ago

Right now I try to WSDCGQ11LM to join, but this only works after a power cycle.

With FW 26670700 new device could join even after weeks of uptime....

SwoopX commented 3 years ago

I just flashed to 26720700 and now let it run for an hour, then we'll see. Also got some extended debug output to check if network is open or not.

thecem commented 3 years ago

If I could help out with any kind of locks pls let me know.

SwoopX commented 3 years ago

So I checked it out. Tried to join a Xiaomi temp/hum sensor and it went through nicely, as expected. Network was open for 3 mins as it should, also confirmed by the sniffer.

Therefore, I conclude that whatever you experience is not related to the firmware.

You could maybe share a deconz debug log over the 3 mins you try it when it doesn't work. Maybe there's other indications on where things go wrong...

thecem commented 3 years ago

which log settings do you need?

thecem commented 3 years ago

what for a system do you using?

thecem commented 3 years ago

log level 1 (all): clicking new sensor -> waiting for permit join -> resetting the sensor:

15:29:38:876 send permit join, duration: 59
15:29:40:469 Mgmt_Lqi_req zdpSeq: 7 to 0x00212EFFFF024885 start index 2
15:29:40:486 ZDP status = 0x00 -> SUCCESS
15:29:40:486 ZDP Mgmt_Lqi_rsp zdpSeq: 7 from 0x00212EFFFF024885 total: 25, startIndex: 2, listCount: 1
15:29:40:487     * neighbor: 0x00158D000204C809 (0xD878), LQI: 252, relation: 0x01 rxOnWHenIdle: 0
15:29:44:309 Mgmt_Lqi_req zdpSeq: 8 to 0x00212EFFFF024885 start index 3
15:29:44:342 ZDP status = 0x00 -> SUCCESS
15:29:44:342 ZDP Mgmt_Lqi_rsp zdpSeq: 8 from 0x00212EFFFF024885 total: 25, startIndex: 3, listCount: 1
15:29:44:342     * neighbor: 0x00158D00023E8CD5 (0x3843), LQI: 255, relation: 0x01 rxOnWHenIdle: 0
15:29:46:431 ZCL attribute report 0x00158D00053DFA33 for cluster: 0x0402, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
15:29:46:446 ZCL attribute report 0x00158D00053DFA33 for cluster: 0x0405, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
15:29:46:448 ZCL attribute report 0x00158D00053DFA33 for cluster: 0x0403, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
15:29:48:149 Mgmt_Lqi_req zdpSeq: 9 to 0x00212EFFFF024885 start index 4
15:29:48:169 ZDP status = 0x00 -> SUCCESS
15:29:48:171 ZDP Mgmt_Lqi_rsp zdpSeq: 9 from 0x00212EFFFF024885 total: 25, startIndex: 4, listCount: 1
15:29:48:171     * neighbor: 0x00158D0002049FAE (0xC501), LQI: 249, relation: 0x01 rxOnWHenIdle: 0
15:29:51:989 Mgmt_Lqi_req zdpSeq: 10 to 0x00212EFFFF024885 start index 5
15:29:52:020 ZDP status = 0x00 -> SUCCESS
15:29:52:020 ZDP Mgmt_Lqi_rsp zdpSeq: 10 from 0x00212EFFFF024885 total: 25, startIndex: 5, listCount: 1
15:29:52:020     * neighbor: 0x00158D0002BFA579 (0x4FE2), LQI: 255, relation: 0x01 rxOnWHenIdle: 0
15:29:55:830 Mgmt_Lqi_req zdpSeq: 11 to 0x00212EFFFF024885 start index 6
15:29:55:865 ZDP status = 0x00 -> SUCCESS
15:29:55:865 ZDP Mgmt_Lqi_rsp zdpSeq: 11 from 0x00212EFFFF024885 total: 25, startIndex: 6, listCount: 1
15:29:55:866     * neighbor: 0x00158D00023947AB (0x3E5E), LQI: 255, relation: 0x01 rxOnWHenIdle: 0
15:29:59:669 Mgmt_Lqi_req zdpSeq: 12 to 0x00212EFFFF024885 start index 7
15:29:59:691 ZDP status = 0x00 -> SUCCESS
15:29:59:691 ZDP Mgmt_Lqi_rsp zdpSeq: 12 from 0x00212EFFFF024885 total: 25, startIndex: 7, listCount: 1
15:29:59:691     * neighbor: 0x00158D0001BB7161 (0xB138), LQI: 255, relation: 0x01 rxOnWHenIdle: 0
15:30:03:510 Mgmt_Lqi_req zdpSeq: 13 to 0x00212EFFFF024885 start index 8
15:30:03:545 ZDP status = 0x00 -> SUCCESS
15:30:03:545 ZDP Mgmt_Lqi_rsp zdpSeq: 13 from 0x00212EFFFF024885 total: 25, startIndex: 8, listCount: 1
15:30:03:545     * neighbor: 0x00158D00022C752D (0x1374), LQI: 255, relation: 0x01 rxOnWHenIdle: 0
15:30:04:425 Device TTL 2703 s flags: 0x7
thecem commented 3 years ago

after power cycle, adding possible:

15:42:09:644 send permit join, duration: 59
15:42:09:662 ZDP status = 0x00 -> SUCCESS
15:42:09:662 ZDP Mgmt_Lqi_rsp zdpSeq: 10 from 0x7CB03EAA00ACE629 total: 21, startIndex: 9, listCount: 3
15:42:09:662     * neighbor: 0x00158D000245F23B (0xC057), LQI: 50, relation: 0x04 rxOnWHenIdle: 0
15:42:09:662     * previous child: 0x00158D000245F23B
15:42:09:662     * neighbor: 0x00158D0001F9F13F (0x7E92), LQI: 70, relation: 0x04 rxOnWHenIdle: 0
15:42:09:662     * previous child: 0x00158D0001F9F13F
15:42:09:662     * neighbor: 0x00158D0001BB7115 (0x0AB2), LQI: 28, relation: 0x01 rxOnWHenIdle: 0
15:42:10:067 skip device discovery while end devices is added
15:42:10:549 skip device discovery while end devices is added
15:42:11:028 skip device discovery while end devices is added
15:42:11:512 skip device discovery while end devices is added
15:42:11:988 skip device discovery while end devices is added
15:42:12:468 skip device discovery while end devices is added
15:42:12:948 skip device discovery while end devices is added
15:42:13:429 skip device discovery while end devices is added
15:42:13:908 skip device discovery while end devices is added
15:42:14:390 skip device discovery while end devices is added
15:42:14:872 skip device discovery while end devices is added
15:42:15:348 skip device discovery while end devices is added
15:42:15:828 skip device discovery while end devices is added
15:42:16:308 skip device discovery while end devices is added
15:42:16:789 skip device discovery while end devices is added
15:42:17:269 skip device discovery while end devices is added
15:42:17:747 skip device discovery while end devices is added
15:42:18:228 skip device discovery while end devices is added
15:42:18:708 skip device discovery while end devices is added
15:42:19:188 skip device discovery while end devices is added
15:42:19:668 skip device discovery while end devices is added
15:42:20:149 skip device discovery while end devices is added
15:42:20:569 MAC Poll 0x02 0xAB59, life time: 6 sec, dev timeout: 61440 sec
15:42:20:628 unknown node 0x00158D000464D992 (0xAB59), lqi: 255
15:42:20:629 APS-DATA.indication from unknown node 0x00158D000464D992
15:42:20:629 ZDP device announce: 0x00158D000464D992, 0xAB59, 0x80
15:42:20:631 new node - ext: 0x00158d000464d992, nwk: 0xAB59
15:42:20:631 device announce 0x00158D000464D992 (0xAB59) mac capabilities 0x80
15:42:20:632 set fast probe address to 0x00158D000464D992 (0xAB59)
15:42:20:632 FP indication 0x0000 / 0x0013 (0x00158D000464D992 / 0xAB59)
15:42:20:632                       ...     (0x00158D000464D992 / 0xAB59)
15:42:20:632 device announce 0x00158D000464D992 (0xAB59) mac capabilities 0x80
15:42:20:633 skip device discovery while end devices is added
15:42:20:680 FP indication 0x0104 / 0x0000 (0x00158D000464D992 / 0xAB59)
15:42:20:680                       ...     (0x00158D000464D992 / 0xAB59)
15:42:20:680 ZCL attribute report 0x00158D000464D992 for cluster: 0x0000, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
15:42:20:681 0x00158D000464D992 skip Xiaomi attribute 0x0000
15:42:21:108 skip device discovery while end devices is added
15:42:21:530 FP indication 0x0104 / 0x0000 (0x00158D000464D992 / 0xAB59)
15:42:21:531                       ...     (0x00158D000464D992 / 0xAB59)
15:42:21:531 ZCL attribute report 0x00158D000464D992 for cluster: 0x0000, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
15:42:21:531 0x00158D000464D992 extract Xiaomi special attribute 0xFF01
15:42:21:531    01 battery 3055 (0x0BEF)
15:42:21:531    04 unknown 424 (0x01A8)
15:42:21:531    05 RSSI dB (?) 43 (0x002B)
15:42:21:531    06 LQI (?) 4294967296 (0x0100000000)
15:42:21:531    64 temperature 2772
15:42:21:531    65 humidity 5197
15:42:21:531    66 pressure 100221 (1002)
15:42:21:531    0a Parent NWK 0 (0x0000)
15:42:21:544 [1] get node descriptor for 0x00158d000464d992
15:42:21:588 skip device discovery while end devices is added
15:42:22:025 MAC Poll 0x02 0xAB59, life time: 60000 sec, dev timeout: 61440 sec
15:42:22:068 skip device discovery while end devices is added
15:42:22:228 MAC Poll 0x02 0xAB59, life time: 60000 sec, dev timeout: 61440 sec
15:42:22:231 ZDP status = 0x00 -> SUCCESS
15:42:22:231 ZDP Node_Descriptor_rsp 0x00158D000464D992 - 0xAB59
15:42:22:246 FP indication 0x0000 / 0x8002 (0x00158D000464D992 / 0xAB59)
15:42:22:246                       ...     (0x00158D000464D992 / 0xAB59)
15:42:22:246 ZDP indication search sensors 0x00158D000464D992 (0xAB59) cluster 0x8002
15:42:22:247 ZDP indication search sensors 0x00158D000464D992 (0xAB59) clear timeout on cluster 0x8002
15:42:22:252 [2] get active endpoints for 0x00158d000464d992
15:42:22:548 skip device discovery while end devices is added
15:42:23:028 skip device discovery while end devices is added
15:42:23:513 skip device discovery while end devices is added
thecem commented 3 years ago

so the question is why the stick or the sw don't let join devices after a while. With 26670700 the problem don't exists.

thecem commented 3 years ago

in the first log (with same log level) we do not see the "unknown node".

SwoopX commented 3 years ago

The question is more: what's the main difference from a deconz perspective between your environment and mine...

Could you please provide longer log snippets? Please make them 5 mins each to be able to draw some reasonable conclusions... I'd also like to see the APS layer included, so I recommend the following log levels: INFO, INFO_L2, ERROR, ERROR_L2, APS, APS_L2, ZDP

The most remarkable difference I can currently spot (apart from the fact that there's no communication from the sensor you try to pair in the first snipped) is that in the first snipped there's no communication from the router showing up in the 2nd snippet.

How does your network look like? How many battery and mains powered devices do you have?

I currently run my dev environment on 2.12.05-beta with Conbee II on 26720700, Virtualized Ubuntu Focal Fossa LTS on an Intel NUC

thecem commented 3 years ago

One repeater (osram plug) and 52 xiaomi sensors (battery)

Did three joins without succsess, same xiaomi temp/hum/press sensor (deleted before)

log.txt

Mimiix commented 3 years ago

I think you just need more routers.

SwoopX commented 3 years ago

Definitively!

Your plug handles 21 devices, your coordinator 25 ->46!!!

When the coordinator/deconz is restarted, I assume there's a certain time window where you can squeeze a new device in as the routing table hasn't reached its limit yet. This is not exactly what I'd call a resilient and load balanced network.

As such, there's nothing fruther to dig into from my perspective, therefore closing.

thecem commented 3 years ago

How many devices could the conbee II handle without a repeater?

thecem commented 3 years ago

Which repeater do you recommend?

SwoopX commented 3 years ago

How many devices could the conbee II handle without a repeater?

Obvisouly 24, although that's not recommended.

Which repeater do you recommend?

I, personally, nothing from China and no Osram/Ledvance

Smanar commented 3 years ago

Obvisouly 24, although that's not recommended.

Was not 32 ? It was decreased ?

Support up to 32 directly connected end-devices (formerly 10)

thecem commented 3 years ago

Even without the the osram repeater all sensors (47) was rock solid in the network. And with 26670700 no problems to add devices even after a while. But this explanation could be the solution, I will try to add 3 hue e27 bulbs (latest fw) to see if the device add is possible again.

ebaauw commented 3 years ago

Was not 32 ? It was decreased ?

I think they reduced the number of end-devices supported by the coordinator as children in later firmware releases. This is badly documented, maybe we need to tickle @manup to mention this in the firmware changelog.

Even without the the osram repeater all sensors (47) was rock solid in the network.

Once in the network, sensors might continue to function without an actual parent router. They simply send their messages to their assumed (former) parent router (or coordinator), which will happily process them. However, the sensor is unreachable for any message from the gateway, as no parent will hold them. As long as you don't want to change any config setting, you wouldn't notice. Of course, sensors that expect an ACK after sending their messages won't see any, and might decide to give up or even leave the network.

When the coordinator/deconz is restarted, I assume there's a certain time window where you can squeeze a new device in as the routing table hasn't reached its limit yet.

Afaik, child devices are handled in firmware, so restarting deCONZ wouldn't make a difference; only power-cycling the device. I suppose, it could clear its list of children on power-cycle, and rebuild it as it sees messages from the sensors. The routing table is something else, but probably also cleared on power-cycle. I would expect to see rejoin requests by the sensors to find a new parent when the previous parent doesn't respond to a poll request. Then again, both end-device as well as coordinator firmware seem a bit flaky in this function. I have doubts how well this is described in the standard. I suspect this is why my Eurotronic Spirit TRVs still occasionally become unreachable. Hoping the 0x26720700 firmware improves on this.

SwoopX commented 3 years ago

Yeah, I indeed meant the restart in the sense of rebooting the system, as it was mentioned in the description.

With regard to the rejoin requests, that's one of the major caveats Xiaomi devices have (as they send none). All in all, I wouldn't be surprised if devices pass out more often. On the other hand, Xiaomi's are mainly deep sleepers and that's presumably the main reason why this setup has worked out for now. Compared to other devices and brands, there should be only few messages.

Anyway, I guess you're pushing it far beyond the intended limit and your network should benefit from additional routers.