Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Joining new devices fails after a few days of uptime #3177

Closed: sjorge closed this issue 3 years ago

sjorge commented 4 years ago

Bug Report

What happened

I tried to join a new device, it failed.

The network/coordinator USB stick has been up for about 1 to 1.5 days now, and joining new devices fails. Physically unplugging and replugging the USB stick (with z2m stopped, of course) allows new devices to join again.

I did a capture of one of those failed join requests: https://pkg.blackdot.be/extras/zigbee/failed_join.pcapng

Not sure where it fails after that; it looks like z2m doesn't respond to the Update Device at all?

What did you expect to happen

The device to be able to join.

How to reproduce it (minimal and precise)

I can always reproduce this about 1 to 3 days after unplugging and replugging the USB device (rebooting the server does not help).

Debug Info

zigbee2mqtt version: 1.12.0-dev (commit #d52d520)
CC253X firmware version: {"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}

sjorge commented 4 years ago

So, good news: the slowness doesn't seem to have anything to do with the new firmware. It seems to be caused by the TRÅDFRI repeater again.

sjorge commented 4 years ago

@Koenkk bad news :(

I tried to join a device this morning and nothing again. After a USB reset, it worked fine.

After about 30 minutes (I wasn't watching the log) the IKEA bulb behind the repeater came back online... As soon as I had reset the USB stick, the repeater started responding to pings again, and shortly thereafter the bulb did a device announce and its reporting was reconfigured!

I did not have any issues with these last week, but I did reset the USB stick every few days while testing stuff... so there was never a 24-hour period where joining was broken. I suspect the issue with the TRÅDFRI devices is a result of the coordinator issues. (I had already tried without any TRÅDFRI devices a few weeks ago, and the coordinator would still refuse device joins.)

sjorge commented 4 years ago

I will try flashing the source routing firmware you posted tonight or tomorrow, depending on when I have time to flash the stick. (Not that I expect it to make a difference, but it's worth a shot.)

sjorge commented 4 years ago

z2m does not start with the source routing firmware: https://gist.github.com/sjorge/df385a8cc8741b912dc4be9d53c127b7

When z2m starts, the green LED turns off. I verified the firmware with SmartRF and it was OK.

Edit: same result with the unpatched source routing firmware: CC2531_SOURCE_ROUTING_20190619

sjorge commented 4 years ago

Looks like the CC2531 is now completely dead. I tried:

All of them result in the same error.

sjorge commented 4 years ago

Fixed by adding rtscts: false to my configuration.yaml.

I am now running the patched firmware. I couldn't tell which one was flashed from the firmware info, so I unplugged the stick and verified the CC2531 against both bins; the source routing one matched.

Summarized:

sjorge commented 4 years ago

So far the patched source routing firmware is working well... the network is much more responsive than before, and it certainly does not have the 'slowness' I experienced at times with the default one. (I have also not had the random ping failures that happened a few times a day.)

Let's see if I can still pair stuff after a few more days.

sjorge commented 4 years ago

Some further observations:

digraph G {
node[shape=record];
  "0x00124b001938a7e5" [style="bold, filled", fillcolor="#e04e5d", fontcolor="#ffffff", label="{0x00124b001938a7e5|0x00124b001938a7e5 (0)|2020-04-18T11:18:56+00:00}"];
  "0x00124b001938a7e5" -> "0x000d6ffffe8e8d4f" [penwidth=0.5, weight=0, color="#994444", label="91"]
  "0x00124b001938a7e5" -> "0x14b457fffe2bd760" [penwidth=0.5, weight=0, color="#994444", label="32"]
  "0x000d6ffffe8e8d4f" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{masterbedroom/repeater|0x000d6ffffe8e8d4f (46726)|IKEA TRADFRI signal repeater (E1746)|2020-04-18T11:17:23+00:00}"];
  "0x000d6ffffe8e8d4f" -> "0x00124b001938a7e5" [penwidth=2, weight=1, color="#009900", label="46 (routes: 31787,49490)"]
  "0x000d6ffffe8e8d4f" -> "0x90fd9ffffee77fcf" [penwidth=0.5, weight=0, color="#994444", label="48"]
  "0x000d6ffffe8e8d4f" -> "0x14b457fffe2bd760" [penwidth=0.5, weight=0, color="#994444", label="119"]
  "0x90fd9ffffee77fcf" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{bedroom/bed_lamp/bulb|0x90fd9ffffee77fcf (31787)|IKEA TRADFRI LED bulb E14/E26/E27 600 lumen, dimmable, color, opal white (LED1624G9)|2020-04-18T11:17:05+00:00}"];
  "0x90fd9ffffee77fcf" -> "0x00124b001938a7e5" [penwidth=0.5, weight=0, color="#994444", label="0"]
  "0x90fd9ffffee77fcf" -> "0x000d6ffffe8e8d4f" [penwidth=0.5, weight=0, color="#994444", label="59"]
  "0x90fd9ffffee77fcf" -> "0x14b457fffe2bd760" [penwidth=2, weight=1, color="#009900", label="123 (routes: 31787)"]
  "0x14b457fffe2bd760" [style="rounded, filled", fillcolor="#4ea3e0", fontcolor="#ffffff", label="{bedroom/desk_lamp/bulb|0x14b457fffe2bd760 (39924)|Innr E14 bulb RGBW (RB 250 C)|2020-04-18T11:17:29+00:00}"];
  "0x14b457fffe2bd760" -> "0x00124b001938a7e5" [penwidth=0.5, weight=0, color="#994444", label="1"]
  "0x14b457fffe2bd760" -> "0x000d6ffffe8e8d4f" [penwidth=2, weight=1, color="#009900", label="98 (routes: 39924,31787)"]
  "0x14b457fffe2bd760" -> "0x90fd9ffffee77fcf" [penwidth=0.5, weight=0, color="#994444", label="91"]
  "0x14b457fffecc1315" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/bed_lamp/remote|0x14b457fffecc1315 (44663)|IKEA TRADFRI ON/OFF switch (E1743)|2020-04-18T04:33:23+00:00}"];
  "0x14b457fffecc1315" -> "0x90fd9ffffee77fcf" [penwidth=1, weight=0, color="#994444", label="255"]
  "0x14b457fffeca351b" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/desk_lamp/remote|0x14b457fffeca351b (24931)|IKEA TRADFRI ON/OFF switch (E1743)|2020-04-18T11:15:29+00:00}"];
  "0x14b457fffeca351b" -> "0x90fd9ffffee77fcf" [penwidth=1, weight=0, color="#994444", label="255"]
  "0x00158d0004148f89" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/motion|0x00158d0004148f89 (12102)|Xiaomi Aqara human body movement and illuminance sensor (RTCGQ11LM)|2020-04-18T11:14:36+00:00}"];
  "0x00158d0004148f89" -> "0x000d6ffffe8e8d4f" [penwidth=1, weight=0, color="#994444", label="134"]
  "0x00158d00033ddfaa" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/sensor|0x00158d00033ddfaa (2897)|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|2020-04-18T11:16:34+00:00}"];
  "0x00158d00033ddfaa" -> "0x000d6ffffe8e8d4f" [penwidth=1, weight=0, color="#994444", label="184"]
  "0x00158d0003f115c5" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{serverroom/sensor|0x00158d0003f115c5 (23324)|Xiaomi Aqara temperature, humidity and pressure sensor (WSDCGQ11LM)|2020-04-18T11:16:21+00:00}"];
  "0x00158d0003f115c5" -> "0x00124b001938a7e5" [penwidth=1, weight=0, color="#994444", label="140"]
  "0x00158d0001ffaffc" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/radiator|0x00158d0001ffaffc (49490)|Eurotronic Spirit Zigbee wireless heater thermostat (SPZB0001)|2020-04-18T11:18:53+00:00}"];
  "0x00158d0001ffaffc" -> "0x000d6ffffe8e8d4f" [penwidth=1, weight=0, color="#994444", label="120"]
  "0x04cf8cdf3c771820" [style="rounded, dashed, filled", fillcolor="#fff8ce", fontcolor="#000000", label="{bedroom/light_sensor|0x04cf8cdf3c771820 (24216)|Xiaomi MiJia light intensity sensor (GZCGQ01LM)|2020-04-18T11:18:37+00:00}"];
  "0x04cf8cdf3c771820" -> "0x000d6ffffe8e8d4f" [penwidth=1, weight=0, color="#994444", label="169"]
}
sjorge commented 4 years ago

@Koenkk sadly joining stopped working too now :(

So it looks like the changes have no effect with either the default or the source routing firmware, and the extra network stability just came from using source routing in general.

Any other ideas? Do you want me to do another capture with this firmware?

sjorge commented 4 years ago

Well well this is new and interesting!

What I found weird, though, is that neither capture shows the broadcast that permits joining, even though I only sent the MQTT message to permit joining after I had started the capture.
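For reference when scanning a capture for that missing broadcast: in the Zigbee ZDO layer, permitting joins is a Mgmt_Permit_Joining_req (ZDO cluster 0x0036), normally broadcast to 0xFFFC (all routers and the coordinator), and its payload after the one-byte transaction sequence number is only the permit-duration byte and the TC_Significance byte. The sketch below builds that request so you know what to look for. `buildPermitJoinRequest` is a hypothetical helper, not a zigbee-herdsman API, and TC_Significance is assumed to be 0x01.

```javascript
// Hypothetical helper (not a zigbee-herdsman API): describes the ZDO
// Mgmt_Permit_Joining_req frame as it would appear in a sniff.
const MGMT_PERMIT_JOINING_REQ = 0x0036; // ZDO cluster ID
const BROADCAST_ROUTERS_AND_COORDINATOR = 0xfffc; // usual destination

function buildPermitJoinRequest(transactionSeq, durationSeconds) {
  if (!Number.isInteger(durationSeconds) || durationSeconds < 0 || durationSeconds > 0xff) {
    throw new RangeError("permit duration must fit in one byte (0-255)");
  }
  const tcSignificance = 0x01; // assumption: set to 1 here
  return {
    cluster: MGMT_PERMIT_JOINING_REQ,
    destination: BROADCAST_ROUTERS_AND_COORDINATOR,
    // ZDO payload: sequence number, permit duration, TC_Significance
    payload: Buffer.from([transactionSeq & 0xff, durationSeconds, tcSignificance]),
  };
}

// Example: open joining for 254 s (0xFE) with sequence number 0x2a.
const req = buildPermitJoinRequest(0x2a, 0xfe);
console.log(req.destination.toString(16), req.payload.toString("hex")); // prints "fffc 2afe01"
```

In a capture, the payload bytes after the sequence number are what distinguish a permit-join broadcast from other ZDO traffic to 0xFFFC.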

sjorge commented 4 years ago

Ugggghh It got weirder...

I did a second attempt to join from the bedroom; I wanted to double-check that I really did not see the Zigbee join broadcast... and well... I still don't see it... but the bulb joined :|

https://pkg.blackdot.be/cores/zigbee/srcrouting_via_router_second.pcapng

So now I wonder whether I accidentally bumped the USB stick again, because the spot where the coordinator sits is very cramped.

Before doing the capture, I made 3 join attempts from the bedroom in different locations (near a TRÅDFRI bulb, near the TRÅDFRI repeater, and near an Innr bulb).

sjorge commented 4 years ago

I guess we wait another few days and repeat this; I'll have to be super careful not to bump the USB stick next time.

sjorge commented 4 years ago

I did not see a coordinator disconnect in the logs, though; those should show up at log_level info, right?

When comparing the first (failed) and second (successful) captures that I did from the bedroom, I noticed a few differences:

I wonder if they are related: maybe during the first capture the coordinator's buffer was still too overloaded by the Update Device messages to send the Transport Key?

Also, is there a simple summary of what each message means, e.g.:

sjorge commented 4 years ago

Oh, it looks like the flood of 'Update Device' messages is the Innr bulb spamming the TRÅDFRI repeater (next hop) with the ZA unsecure join message without getting an answer... which I guess is the coordinator's job to reply to? (With the Transport Key?)
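The join being discussed can be modelled as an ordered sequence. Under Zigbee centralized security, a device joining through a router is associated at the MAC layer by that router, the router reports it to the Trust Center (here the coordinator) with an APS Update Device command, and the Trust Center answers with the network key in an APS Transport Key that the router relays; the device then announces itself. The sketch below finds the first step missing from a list of frames read off a capture. The step labels are hypothetical names chosen for this example, not Wireshark's exact display strings.

```javascript
// Simplified model of a join through a router (centralized security).
// Step labels are illustrative, not Wireshark's exact frame names.
const JOIN_VIA_ROUTER = [
  "Beacon Request",       // joining device scans for networks
  "Beacon",               // router advertises that it permits joining
  "Association Request",  // MAC-level join request to the router
  "Association Response", // router accepts and assigns a short address
  "Update Device",        // router -> Trust Center (coordinator)
  "Transport Key",        // Trust Center -> device, relayed by the router
  "Device Announce",      // device announces itself once it has the key
];

// Returns the first expected step not seen (in order) in the capture,
// or null when the whole sequence is present.
function firstMissingStep(observedFrames, expected = JOIN_VIA_ROUTER) {
  let next = 0;
  for (const frame of observedFrames) {
    if (next < expected.length && frame === expected[next]) {
      next += 1;
    }
  }
  return next < expected.length ? expected[next] : null;
}

// A capture shaped like the failed ones: repeated Update Device, no Transport Key.
const failed = [
  "Beacon Request", "Beacon", "Association Request", "Association Response",
  "Update Device", "Update Device", "Update Device",
];
console.log(firstMissingStep(failed)); // prints "Transport Key"
```

Repeated Update Device frames do not advance the sequence, so a capture like the failed one stalls at the Transport Key step.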

Koenkk commented 4 years ago

but after the TK/unsecure join request from the bulb made its way to the coordinator

the ZA unsecure join message without getting an answer... which I guess is the coordinator's job to reply to? (With the Transport Key?)

I'm confused, did it finally join or not?

sjorge commented 4 years ago

I'm confused, did it finally join or not?

4 attempts via a router (1 captured): they failed.
1 attempt via the coordinator: that worked, but I might have bumped the USB stick, not sure.
1 attempt via a router: also worked.

I was comparing the first and last router joins (one bad, one OK).

Koenkk commented 4 years ago

Good, so it seems that things are improved?

I've checked the failed attempt via the router, but it seems that the router is also acting strange here. If joining works directly via the coordinator I think the coordinator itself is not the issue?

Also, has this device previously been paired to this router? Otherwise I wouldn't expect the Update Device.

sjorge commented 4 years ago

Also, has this device previously been paired to this router? Otherwise I wouldn't expect the update device.

Probably, as it is one of my test bulbs. But if it was, it was from when I was using the default firmware.

As for which router: I think the path was Innr bulb --> TRÅDFRI repeater --> coordinator.

{"id":2,"type":"Router","ieeeAddr":"0x000d6ffffe8e8d4f","nwkAddr":46726,"manufId":4476,"manufName":"IKEA of Sweden","powerSource":"Mains (single phase)","modelId":"TRADFRI signal repeater","epList":[1,242],"endpoints":{"1":{"profId":260,"epId":1,"devId":8,"inClusterList":[0,3,9,2821,4096,64636],"outClusterList":[25,32,4096],"clusters":{"genBasic":{"attributes":{"modelId":"TRADFRI signal repeater","manufacturerName":"IKEA of Sweden","powerSource":1,"zclVersion":3,"appVersion":33,"stackVersion":98,"hwVersion":1,"dateCode":"20190318","swBuildId":"2.2.005"}}},"binds":[{"cluster":0,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]},"242":{"profId":41440,"epId":242,"devId":97,"inClusterList":[33],"outClusterList":[33],"clusters":{},"binds":[]}},"appVersion":32,"stackVersion":98,"hwVersion":1,"dateCode":"20190318","swBuildId":"2.2.005","zclVersion":3,"interviewCompleted":true,"meta":{"reporting":1,"configured":2},"lastSeen":1586973921882}
{"id":4,"type":"Router","ieeeAddr":"0x14b457fffe2bd760","nwkAddr":39924,"manufId":4454,"manufName":"innr","powerSource":"Mains (single phase)","modelId":"RB 250 C","epList":[1,242],"endpoints":{"1":{"profId":260,"epId":1,"devId":269,"inClusterList":[0,3,4,5,6,8,768,2821,4096],"outClusterList":[25],"clusters":{"genBasic":{"attributes":{"modelId":"RB 250 C","manufacturerName":"innr","powerSource":1,"zclVersion":3,"appVersion":16,"stackVersion":98,"hwVersion":1,"dateCode":"20190326-87","swBuildId":"2.1"}},"genOnOff":{"attributes":{"onOff":1}},"genLevelCtrl":{"attributes":{"currentLevel":254}},"lightingColorCtrl":{"attributes":{"currentX":30199,"currentY":26096,"colorTemperature":374,"colorMode":1,"currentHue":21,"currentSaturation":197}}},"binds":[{"cluster":6,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":8,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":768,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]},"242":{"profId":41440,"epId":242,"devId":97,"inClusterList":[],"outClusterList":[33],"clusters":{},"binds":[]}},"appVersion":16,"stackVersion":98,"hwVersion":1,"dateCode":"20190326-87","swBuildId":"2.1","zclVersion":3,"interviewCompleted":true,"meta":{"reporting":1},"lastSeen":1586973977775}

I've checked the failed attempt via the router, but it seems that the router is also acting strange here. If joining works directly via the coordinator I think the coordinator itself is not the issue?

I wonder if something is missing on device removal? Maybe some message to tell all routers to forget the device too?

Koenkk commented 4 years ago

AFAIK routers should do this themselves upon receiving a DeviceLeave.

Westcott1 commented 4 years ago

I'm now running OK using CC2531ZNP-Prod_20200410_default. I first tried _CC2531ZNP-Prod_Source_Routing20200410, but it gave 'No network route' errors. Thanks again!

sjorge commented 4 years ago

So far, 3 days... and still working. For the test I picked a very old bulb that has certainly never been in the network together with the repeater... I'll wait a few more days and try again.

When sniffing, I am now also seeing the GP (Green Power) requests that I only saw near the coordinator last time.

sjorge commented 4 years ago

@Koenkk it failed again!

I was using a bulb I had used before; the Innr bulb sent the Update Device, which you thought was odd last time.

So I moved it closer to a different router (I'm pretty sure this one has never been the parent of the bulb I was joining), same thing... but this time the other router sent the Update Device.

I'll edit this post once I uploaded the pcap.

[root@amethyst /opt/zigbee2mqtt]# grep 39924 /opt/zigbee2mqtt/data/database.db
{"id":4,"type":"Router","ieeeAddr":"0x14b457fffe2bd760","nwkAddr":39924,"manufId":4454,"manufName":"innr","powerSource":"Mains (single phase)","modelId":"RB 250 C","epList":[1,242],"endpoints":{"1":{"profId":260,"epId":1,"devId":269,"inClusterList":[0,3,4,5,6,8,768,2821,4096],"outClusterList":[25],"clusters":{"genBasic":{"attributes":{"modelId":"RB 250 C","manufacturerName":"innr","powerSource":1,"zclVersion":3,"appVersion":16,"stackVersion":98,"hwVersion":1,"dateCode":"20190326-87","swBuildId":"2.1"}},"genOnOff":{"attributes":{"onOff":0}},"genLevelCtrl":{"attributes":{"currentLevel":254}},"lightingColorCtrl":{"attributes":{"currentX":29786,"currentY":26830,"colorTemperature":364,"colorMode":1,"currentHue":25,"currentSaturation":199}}},"binds":[{"cluster":6,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":8,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":768,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]},"242":{"profId":41440,"epId":242,"devId":97,"inClusterList":[],"outClusterList":[33],"clusters":{},"binds":[]}},"appVersion":16,"stackVersion":98,"hwVersion":1,"dateCode":"20190326-87","swBuildId":"2.1","zclVersion":3,"interviewCompleted":true,"meta":{"reporting":1},"lastSeen":1587740726258}
[root@amethyst /opt/zigbee2mqtt]# grep 31787 /opt/zigbee2mqtt/data/database.db
{"id":3,"type":"Router","ieeeAddr":"0x90fd9ffffee77fcf","nwkAddr":31787,"manufId":4476,"manufName":"IKEA of Sweden","powerSource":"Mains (single phase)","modelId":"TRADFRI bulb E27 CWS opal 600lm","epList":[1],"endpoints":{"1":{"profId":49246,"epId":1,"devId":512,"inClusterList":[0,3,4,5,6,8,768,2821,4096],"outClusterList":[5,25,32,4096],"clusters":{"genBasic":{"attributes":{"modelId":"TRADFRI bulb E27 CWS opal 600lm","manufacturerName":"IKEA of Sweden","powerSource":1,"zclVersion":1,"appVersion":17,"stackVersion":87,"hwVersion":1,"dateCode":"20180410","swBuildId":"1.3.009"}},"genOnOff":{"attributes":{"onOff":0}},"genLevelCtrl":{"attributes":{"currentLevel":254}},"lightingColorCtrl":{"attributes":{"currentX":29786,"currentY":26830,"colorCapabilities":8}}},"binds":[{"cluster":6,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":8,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1},{"cluster":768,"type":"endpoint","deviceIeeeAddress":"0x00124b001938a7e5","endpointID":1}]}},"appVersion":17,"stackVersion":87,"hwVersion":1,"dateCode":"20180410","swBuildId":"1.3.009","zclVersion":1,"interviewCompleted":true,"meta":{"reporting":1},"lastSeen":1587740702344}

Those were the routers in play, based on the pcap...

sjorge commented 4 years ago

The pcap: https://pkg.blackdot.be/cores/zigbee/failed_join.pcapng

This is still with the source routing firmware, as it seems to give a more stable network; I have not reset the stick yet. I am going to see if I can pilfer a TRÅDFRI bulb from a different room that is not part of the network. It's out of range, but I can move it for a test.

Of note: the last time I used this bulb (last week, for a test) I removed it from the network and it left nicely, no force remove.

sjorge commented 4 years ago

I'm 100% sure this bulb was never in the network, as it was installed in the garage, well out of range. It also failed: https://pkg.blackdot.be/cores/zigbee/fail_new.pcapng

sjorge commented 4 years ago

After a reset, both bulbs paired immediately.

Koenkk commented 4 years ago

Could you make such a sniff with all devices in your network powered off? (so only the coordinator is active). I think this input is required for Texas Instruments to investigate this further. Also make sure to capture the permit join requests on that sniff. (https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/p/892149/3299688#3299688)

sjorge commented 4 years ago

Sure, but I will have to wait for joining to be broken again... so give or take a week.

~ sjorge (email reply to Koen Kanters' comment of 30 Apr 2020, 18:59)

sjorge commented 4 years ago

:s It took me 2 hours to work through the breakers to get all routers offline:

After that, I stopped z2m, unplugged the stick, plugged it back in and started z2m... I was still unable to pair. After one attempt I also tried via Touchlink, with the same result: https://pkg.blackdot.be/cores/zigbee/fail3_no_routers_touchlink_znc_maybe_stuck.pcapng

At this point I think the whole USB stick was acting weird. I powered on all breakers again and the network is super unstable and slow again, like I had a while ago. After about 10 minutes it seems to stabilize, but I still can't pair anything. I have now done the stop/replug/start dance twice.

Koenkk commented 4 years ago

Seeing some interesting things now.


According to https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/520679 it means that the association table is full. I will check whether it's possible to expand it. I think re-powering the USB stick clears the association table, which then fills up again after some time.
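To make the hypothesis concrete: a fixed-size table whose entries are only cleared by a power cycle will eventually refuse new entries, which would match "works after replugging, fails after a few days". The sketch below is a toy model of that behaviour, not Z-Stack's data structure. The capacity is an arbitrary placeholder, and the rejection value is the IEEE 802.15.4 association status "PAN at capacity" (0x01).

```javascript
// Toy model of a capacity-limited association table (not Z-Stack code).
const ASSOC_SUCCESS = 0x00;         // IEEE 802.15.4 association status
const ASSOC_PAN_AT_CAPACITY = 0x01; // IEEE 802.15.4 association status

class AssociationTable {
  constructor(capacity) {
    this.capacity = capacity; // placeholder, not the real Z-Stack limit
    this.entries = new Map(); // ieeeAddr -> entry; never aged out here,
                              // which is the failure mode being hypothesised
  }

  associate(ieeeAddr) {
    if (this.entries.has(ieeeAddr)) return ASSOC_SUCCESS; // rejoin reuses slot
    if (this.entries.size >= this.capacity) return ASSOC_PAN_AT_CAPACITY;
    this.entries.set(ieeeAddr, { joinedAt: Date.now() });
    return ASSOC_SUCCESS;
  }

  // Models the assumption from the thread that re-powering the stick clears the table.
  powerCycle() {
    this.entries.clear();
  }
}

const table = new AssociationTable(2);
console.log(table.associate("0xa")); // prints 0
console.log(table.associate("0xb")); // prints 0
console.log(table.associate("0xc")); // prints 1: table full
table.powerCycle();
console.log(table.associate("0xc")); // prints 0
```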

sjorge commented 4 years ago

Is that different from the routing table?

Koenkk commented 4 years ago

Investigated the source code and it's indeed different. Asked TI a few questions (https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/903323) (opened a new thread because the other one got closed).

sjorge commented 4 years ago

Regarding the reply from Ryan...

[Image: network map]

I'd expect only serverroom/sensor and masterbedroom/repeater to be in the list, keeping it under 5 devices?
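That expectation can be checked against the network map posted earlier in the thread (dated 2020-04-18). The sketch below groups end devices by the node they hang off, assuming each end-device edge in that map points at its current parent; "coordinator" stands in for 0x00124b001938a7e5. It only counts what the map shows, and the coordinator's internal table may hold more entries than this.

```javascript
// End-device links copied from the network map (child -> parent, LQI).
// Assumption: each end-device edge in the map points at its current parent.
const endDeviceLinks = [
  ["bedroom/bed_lamp/remote", "bedroom/bed_lamp/bulb", 255],
  ["bedroom/desk_lamp/remote", "bedroom/bed_lamp/bulb", 255],
  ["bedroom/motion", "masterbedroom/repeater", 134],
  ["bedroom/sensor", "masterbedroom/repeater", 184],
  ["serverroom/sensor", "coordinator", 140],
  ["bedroom/radiator", "masterbedroom/repeater", 120],
  ["bedroom/light_sensor", "masterbedroom/repeater", 169],
];
// Routers the coordinator lists as neighbours in the same map.
const coordinatorRouterNeighbours = ["masterbedroom/repeater", "bedroom/desk_lamp/bulb"];

// Builds parent -> [children] from the link list.
function childrenByParent(links) {
  const result = new Map();
  for (const [child, parent] of links) {
    if (!result.has(parent)) result.set(parent, []);
    result.get(parent).push(child);
  }
  return result;
}

const children = childrenByParent(endDeviceLinks);
console.log(children.get("coordinator")); // prints [ 'serverroom/sensor' ]
console.log(children.get("masterbedroom/repeater").length); // prints 4
console.log(coordinatorRouterNeighbours.length + children.get("coordinator").length); // prints 3
```

By this map, the coordinator has one direct end-device child and two router neighbours; the four Xiaomi/Eurotronic end devices sit behind the repeater.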

Koenkk commented 4 years ago

End devices can also be in that list so I think the behaviour is expected.

However I'm not sure if this is the same as the original issue (as in this sniff we get proper responses). Would it be possible to create a sniff with only the coordinator and 1 router powered on?

sjorge commented 4 years ago

Sure, probably takes a few more days before pairing breaks again.

sjorge commented 4 years ago

I also have a zzh stick coming soon; I wonder if the problem also exists there or if it is ZStack 1.2 specific.

Koenkk commented 4 years ago

I don't know for sure, but I don't expect it (until now never experienced this and don't have any reports from other users).

sjorge commented 4 years ago

It would be an interesting test case, I guess. If I swap over and still experience it, it is almost certainly a weird interaction with another device.

If it doesn't happen, it might be a bug in ZStack 1.2.

I could still pair a device yesterday, so I haven't got a new dump yet.

~ sjorge (email reply to Koen Kanters' comment of 14 May 2020, 19:07)

sjorge commented 4 years ago

OK, I just tried to pair my new Hue Motion sensor and the join failed. I only saw 'Leave' messages from the router at my location, so I will now go and try near the server.

First with everything powered, to see if I get the association table reject; if I don't... I'll head out to the breaker box and switch off all the lights so there are no routers left except one.

sjorge commented 4 years ago

https://pkg.blackdot.be/cores/zigbee/src_route_full_network_leave.pcapng

First time ever trying to pair a Hue Motion. (Xiaomi devices not picking a new router is annoying, so I'm replacing my motion, light and temperature/humidity sensors... this one seems to cover the first two; I'm open to suggestions for a temperature + humidity sensor.)

It got blasted with leave requests (taken in my room so out of reach of coordinator)

https://pkg.blackdot.be/cores/zigbee/src_route_one_router_leaev.pcapng

After turning off all bulbs via the breakers and unplugging, only the TRÅDFRI repeater and the Xiaomi sensors were left.

It got blasted with leave requests again (captured near the coordinator)

https://pkg.blackdot.be/cores/zigbee/src_route_no_router_nothing.pcapng

Unplugged the repeater.

Captured near the coordinator. It seems to at least respond to the beacon... but that's it, I think? I wonder if the coordinator is out of memory or something and can't even get out the message that the table is full?

https://pkg.blackdot.be/cores/zigbee/src_route_full_after_reset.pcapng

Captured after issuing a znc reset via MQTT and then pairing again; works fine. Included for completeness.

Koenkk commented 4 years ago

Thanks! Especially interested in: https://pkg.blackdot.be/cores/zigbee/src_route_one_router_leaev.pcapng and the reason why the coordinator sends the "remove device" requests. Updated https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/p/903323/3348648#3348648

sjorge commented 4 years ago

Some observations: since I swapped out the Xiaomi motion sensor and a Xiaomi light sensor for the new Hue Motion... the network is a lot more responsive.

I guess that's because the light sensor was spamming a lot... sometimes updating every second! The Hue one reports every 5 minutes, on motion, or (I think) on a 50 lx change.

I wonder if we can configure the minimum reporting interval of the Xiaomi device to 5 minutes or something: https://github.com/Koenkk/zigbee-herdsman-converters/blob/master/devices.js#L988

I only had one up for testing but was planning on having 3 (I already have them), but after too many issues with Xiaomi I was looking for alternatives. I can imagine having 3 of those would just lock up the network.
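A rough sense of the load involved: the arithmetic below compares periodic report frames per hour for a sensor reporting every second (the worst case described above), three such sensors, a 5-minute periodic report like the Hue, and three sensors capped by a 10 s minimum interval. `reportsPerHour` is a hypothetical helper; it ignores ACKs, retries and routing traffic, so real airtime is higher.

```javascript
// Periodic report frames per hour; ignores ACKs, retries and route
// discovery, so actual airtime is higher than these counts suggest.
function reportsPerHour(intervalSeconds, deviceCount = 1) {
  if (intervalSeconds <= 0) throw new RangeError("interval must be positive");
  return Math.floor(3600 / intervalSeconds) * deviceCount;
}

console.log(reportsPerHour(1));     // prints 3600: one light sensor at 1 report/s
console.log(reportsPerHour(1, 3));  // prints 10800: three of them
console.log(reportsPerHour(300));   // prints 12: 5-minute periodic reports
console.log(reportsPerHour(10, 3)); // prints 1080: three sensors capped at one report per 10 s
```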

Koenkk commented 4 years ago

For the illuminance sensor please try with something like:

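// minimumReportInterval is in seconds: no more than one report per 10 s.
// reportableChange is in raw ZCL MeasuredValue units, not lux.
await configureReporting.illuminance(
    endpoint, {minimumReportInterval: 10, reportableChange: 100},
);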

On the TI forum they now ask for the sniff. Perhaps you can put the network key online again at the same location so they can take a look at it?

sjorge commented 4 years ago

I'll give the reporting config a go during the long weekend. Sure, let me upload the key again.

Edit: It's in the zigbee dir as '.key' so it doesn't show in the listing.

sjorge commented 4 years ago

Re the TI thread... no, this was my new Hue Motion sensor, which had never been paired at the time of the captures.

Koenkk commented 4 years ago

But you tried to pair it earlier before that attempt right? (https://pkg.blackdot.be/cores/zigbee/src_route_full_network_leave.pcapng)

sjorge commented 4 years ago

Ah yeah, but it never made it into the network though?

Koenkk commented 4 years ago

I found a possible reason that this could fail (details: https://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/p/903323/3354720#3354720).

Previously we always called permitJoin without a timeout (0xFF); since zigbee-herdsman we call it with a timeout (0xFE). Maybe something goes wrong with the adapter's internal timer. This also explains this comment: https://github.com/Koenkk/zigbee2mqtt/issues/3177#issuecomment-602288259 (1.6.0 = first release using zigbee-herdsman).

Can you try replacing /opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/controller/controller.js with https://gist.github.com/Koenkk/61ac011b3e682fb5e707db6b320ae76c?
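For context on the 0xFF vs 0xFE difference: in the Zigbee Mgmt_Permit_Joining_req, a duration of 0x00 closes joining, 0xFF (under the pre-Zigbee 3.0 rules) opens it with no time limit, and any other value opens it for that many seconds. 0xFE therefore expires after 254 s and has to be re-sent to keep joining open, which puts a timer in the loop. The sketch below shows both ideas. `keepJoiningOpen` and its `sendPermitJoin` callback are hypothetical and are not what zigbee-herdsman or the patched controller.js actually do.

```javascript
// Interprets a permit-join duration byte (pre-Zigbee 3.0 semantics).
function describePermitDuration(value) {
  if (!Number.isInteger(value) || value < 0 || value > 0xff) {
    throw new RangeError("duration is a single byte");
  }
  if (value === 0x00) return "joining closed";
  if (value === 0xff) return "joining open indefinitely";
  return `joining open for ${value} s`;
}

// Hypothetical keep-alive: re-send a timed permit join shortly before it
// expires. `sendPermitJoin` stands in for whatever actually issues the request.
function keepJoiningOpen(sendPermitJoin, durationSeconds = 0xfe, marginSeconds = 10) {
  if (durationSeconds <= marginSeconds) {
    throw new RangeError("duration must be longer than the safety margin");
  }
  const periodMs = (durationSeconds - marginSeconds) * 1000;
  sendPermitJoin(durationSeconds); // open immediately
  const timer = setInterval(() => sendPermitJoin(durationSeconds), periodMs);
  return () => clearInterval(timer); // call to stop re-sending
}

console.log(describePermitDuration(0xff)); // prints "joining open indefinitely"
console.log(describePermitDuration(0xfe)); // prints "joining open for 254 s"
```

With a timed window, joining stays open only as long as the re-send keeps happening, so a stalled timer on either side would close it silently.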

sjorge commented 4 years ago

Done. I'll avoid updating z2m for a bit and try to join a device after a few days. I don't have any new, never-joined devices, but if it is fixed, a random bulb should be fine, I think?

Koenkk commented 4 years ago

Yes should be good (assuming you were previously able to reproduce the bug with this)

sjorge commented 4 years ago

Yeah, it happened with all pairing attempts after a few days, new or old device.

~ sjorge (email reply to Koen Kanters' comment of 22 May 2020, 13:39)