Closed nickw444 closed 2 years ago
Thanks for the well detailed problem description and sniffs!
I can provide a firmware with additional debug logging, I've checked failed-join-bathroom-motion-via-study-lounge.pcapng
and I see the problem is on the coordinator side since the Update device
is not followed up by a Transport Key
. But before doing this, can you confirm that you did NOT reflash your coordinator after repairing all devices?
But before doing this, can you confirm that you did NOT reflash your coordinator after repairing all devices?
Yep the coordinator was reflashed first along with nuking the z2m data directory and setting up with a new Pan ID and network key; then I began re-pairing all devices onto this new network.
Can you provide the herdsman debug logging when trying to pair a device through a router fails?
I assumed you are currently running CC1352P2_CC2652P_launchpad_coordinator_20210708.zip
, if not don't flash this fw.
Hey Koen, I flashed the debug firmware to my stick, enabled debug and had a crack re-pairing one of my devices (0x00158d000421a072 (0x2582)
) via one of the problematic routers;
Interestingly no debug messages appear from the coordinator when attempting pairing through the malfunctioning router in any of the captures below. Pairing via a working router appears to work/show debug messages.
You'll see these in the attached zip file, which contains the pcap
and the log from z2m (including herdsman debug log)
The two devices I've been playing with have the following IEEE addresses:
0x00158d0004507670
0x00158d000421a072
In this capture, I tried pairing via a few different routers, until the last succeeded
This one is probably the most interesting of the lot;
0x00124b001fc64014
).0x588e81fffe0205b1 (0x049F)
) without success. No messages written in herdsman log AFAICT.0x00124b001fc64014 (0xC9A5)
FWIW, all routers are very close to one another for this test; 0x00124b001fc64014
is 4m away from 0x588e81fffe0205b1
, and both of those are about 5m away from the coordinator.
@nickw444 what does z2m-debug-1.zip
contain? I need a log when pairing fails (and no other attempts in it ) (asking because I see some debug logging in it, e.g. line #1202)
All of those contain both the debug log from the z2m container (supervised in hassio) as well as a corresponding packet dump. (followed the steps here under the Home Assistant OS/Supervised addon heading)
As mentioned above, no debug messages appear from the coordinator when attempting pairing through the malfunctioning router, which I imagine is what you're seeing here - a lack of debug messages. I experienced this for all of the captures above, only pairing via a functioning router appeared to show logging/debug messages from herdsman.
I see some debug logging in it
I enabled the debug for z2m as well, I can turn this off if it helps?
Snippet from my z2m hass supervised config:
...
advanced:
log_level: debug
...
zigbee_herdsman_debug: true
@nickw444 but what did you try in "Capture + Log 1", you explained it for the others but not for "Capture + Log 1", (e.g. for "Capture + Log 2"):
In this capture, I tried pairing via a few different routers, until the last succeeded
What I need is a capture + log which only contains a failing attempt to pair a device via a router. (to prevent log pollution)
Apologies, I think I misinterpreted your first message. I vaguely recall that the first capture was toggling pairing across a few different routers until it finally paired, similar to the last capture.
log which only contains a failing attempt
Do you mean by this to set up a network where there are no other end devices (and just routers)?
This aside, I've done a new capture, this time attempting to pair a single device to a single router (0x00124b001fc64014 (0xE516)
).
I've taken a screen recording of the process too, in case that helps? You can see in this video the beacon request and response, but no activity in the z2m log.
Exactly what I needed (sniff + logs)
Can you again provide a herdsman debug log + sniff of just 1 pairing attempt with the fw below: znp_CC1352P_2_LAUNCHXL_tirtos_ccs.hex.zip
Note, pairing might always succeed now, let me know if this is the case.
Awesome, I've just flashed this firmware, and attempted to pair the same device (0x00158d0004507670) again to the same router (0x00124b001fc64014 (0xE516)
).
Here's the pcap and the herdsman debug log: Archive 2.zip
Didn't successfully pair, and didn't see any new log lines emitted whilst attempting to pair the device.
(I have not re-paired any routers, but I imagine that isn't required?)
@nickw444 to be sure, can you try repairing the router that you are trying to pair via (I see the joining procedure does not start at all while it did in https://github.com/Koenkk/zigbee2mqtt/issues/8803#issuecomment-929687244)
@Koenkk I've re-paired the router in question (using Permit Join via All), it looks as though to have the coordinator as its parent on the map.
I've done another capture of pairing the same Aqara button 0x00158d0004507670
to the same router 0x00124b001fc64014
, but still didn't manage to successfully
Archive 3.zip
pair. Hopefully this initiated the pairing procedure in this capture.
If not, I can try again, is there anything I need to look for in Wireshark to confirm whether it had begun? Just to save the back and forth here.
@nickw444 I see don't see the debug logging, which means the coordinator isn't letting any messages through.
0x00124b001fc64014
?data/coordinator_backup.json
?What device is 0x00124b001fc64014?
This is a "Custom devices (DiY) CC2530.ROUTER" (Just a CC2530 + 2591 PA dev board like here: https://www.zigbee2mqtt.io/how_tos/how_to_create_a_cc2530_router.html). That being said, I've also got this issue with one of my IKEA Traffi signal repeaters too (described in the initial issue/bug report)
Could you share your data/coordinator_backup.json?
Interestingly, the coordinator backup doesn't have that router IEEE address in it, but I am not sure if that was due to an unclean shutdown of my Pi. (Currently in the process of changing OS, so can't boot it up at this very moment, this was taken from an FS snapshot just now). Once it's back up and running I will re-pair the router and get a new copy of the backup.
Should this backup also list all devices on the network? It only has a small amount. There are over 20 devices paired on this network.
When repairing the CC2530.ROUTER, did you first remove it (non a force remove)? if not, do this, try to pair the device via the router and provide the log + sniff.
Hey Koen, yep, this was done by removing it from Z2M (and a toast for "device left network" appeared).
In the mean time, I've tried to reproduce this away from my Pi on my desktop. I've paired a minimal network which I am able to reproduce with:
I've provided in the attached zip:
filename | description |
---|---|
test-network-rebuild-failed-join-cc2530.log |
Log file produced by z2m (incl. herdsman-debug) whilst building the test network. |
test-network-rebuild-failed-join-cc2530.pcapng |
pcapng captured whilst building the test network. Includes some failure to join via CC2530 router during rebuild |
test-network-failed-join-only.pcapng |
pcapng captured whilst pairing a single device to the CC2530 router after rebuild. |
test-network-failed-join-only.log |
pcapng captured whilst pairing a single device to the CC2530 router after rebuild. |
data |
z2m data directory for the test network, includes flushed coordinator backup. I noticed this coordinator backup only has 2 devices in it, the two IKEA routers? |
Devices List
Network Map
(test-network-rebuild-failed-join-cc2530.{pcapng,log}
)
(sorry this is a large pcapng, but shows the building of network from 0 devices)
seq # | last log line time | event |
---|---|---|
60 | -- | Permit Join |
77 | -- | Pair 0x588e81fffe021892 . Paired as 0xc702 |
248 | -- | Permit Join via 0xc702 |
273 | -- | Pair 0x588e81fffe0205b1 via 0xc702 . Paired as 0xecd5 |
585 | -- | Permit Join via 0xecd5 |
703 | -- | Permit Join via 0xecd5 |
725 | -- | Pair 0x00158d0004507670 via 0xecd5 . Paired as 0xa137 |
1071 | -- | Pair 0x00124b001fc64014 via 0xecd5 . Paired as 0xb449 |
1254 | -- | Permit Join via 0xb449 |
1277 | 2021-10-03T23:21:54.960Z | Pair 0x00158d00045075e9 via 0xb449 . Failure after multiple attempts. See numerous association requests/responses between SEQ 1284 - SEQ 1591 |
1630 | -- | Permit Join via 0xc702 |
1697 | 2021-10-03T23:23:24.025 | Pair 0x00158d00045075e9 via 0xc702 . Paired as 0x85f1 |
1990 | -- | Permit Join via 0xb449 |
2014 | 2021-10-03T23:25:05.047Z | Pair 0x00158d000588ffaa via 0xb449 . Failure after multiple attempts. See numerous association requests/responses between SEQ 2014 - SEQ 2145 |
2181 | -- | Permit Join via 0xecd5 |
2203 | 2021-10-03T23:26:27.699Z | Pair 0x00158d000588ffaa via 0xecd5 . Paired as 0x70fd |
test-network-failed-join-only.{pcapng,log}
- This dump contains a log + pcapng of the startup of Z2M and the pairing sequence to the CC2530 router only.
0x00158d000572ea01
via 0x00124b001fc64014
(CC2530 Router).Still no debugging appeared, now that we know for sure it has nothing to do with the backup mechanism, this must be a bug in the firmware. I was checking https://software-dl.ti.com/simplelink/esd/simplelink_cc13x2_26x2_sdk/5.20.00.52/exports/changelog.html and found:
When TC settings do not mandate TCLK update, TC does not process APS packets w/ well-known TCLK encryption for remote multi-hop devices
Looks likes TI might already have fixed this bug, however I didn't update to 5.20 yet since it gave issues (current fw is based on 5.10). I expect the next release soon (last was released in July and they release every quarter), will ping you when I have a fw ready.
Ok sounds good.
I might try an older firmware release in the interim to see if that improves the situation (however IIRC one of the bugs in the previous release was inability to pair to a specific router so might be hard to know for sure if this is the case)
I will follow here.. Maybe this is actually the rootcause of what I have reported here: https://github.com/Koenkk/zigbee2mqtt/issues/8922
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days
Be a nice bot and don't close it.
So to summarize, this is very likely because there is a bug in the latest Z-stack https://github.com/Koenkk/Z-Stack-firmware, and now we are in the hands of Koenkk to see if he can make 5.20 firmware work or until TI do a new release?
Details: over here, latest July 14, that means they might be a bit delayed on the next version):
Update: Apparently works only for zzh
For anyone interested, flashing January zstack firmware might likely get you moving in the interim. As per this discord discussion:
It didn't work for me though :-(
New firmware is live, please check with the following fw: https://github.com/Koenkk/Z-Stack-firmware/tree/develop/coordinator/Z-Stack_3.x.0/bin
Tried the new firmware. Didn't change anything for me. Can't add new devices. Thank you for trying β₯οΈ
Same here. Anyway, there was simply something wrong when I switched to slaesh - I tried different firmwares from January, July and the one now, but none of them could connect new devices. I deleted everything, changed pan_id and network_key and repaired everything again flawlessly :shrug: ...
So in my case I couldn't find another way than starting from scratch after migrating from CC2531.
I deleted everything, changed pan_id and network_key and repaired everything again flawlessly π€· ...
It'll be interesting to see if this continues to be the case. I found in my testing of previous firmwares the problem got worse after a few days after the initial setup. Planning to try new firmware this weekend (but not too keen on re-pairing everything...)
I was playing around today and noticed two things that might be interesting:
@nickw444 You were right, I now can't add new TRV valves of the same type that I could add at the beginning when I restarted everything (and when I used cc2531) :cry: . Now it has been just 3 days... Is there a chance that when I use older firmware version, it will work (I tried January one, no results)? What FW you use, please?
@Koenkk So bad news, this doesn't seem to solve it. But thank you a lot for trying and let me know if I should try to provide more logs and sniffs.
Is this somehow specific to our setup? I am really surprised there are no people complaining if it's in the firmware for many cc2562 devices many months :thinking: .
To all the people still experiencing this issue; did you all migrate from a CC2531 on the 1.2 firmware to CC2652/CC1352 without starting from scratch?
I did not start from scratch, correct. I used firmware CC2531_SOURCE_ROUTING_20201128.zip
Neither, in every instance I have started fresh - In fact, when I set up my CC2652 I missed the release note to change Z2M repo in Hass IO, so didn't even have the capability to do the migration π
I migrated from CC2531, firmware version 2021127
after massive network outages and issues on around 35 devices (the network simply froze).
Then I got the Slaesh stick with the June firmware, migration went fine (i.e. the network was up), but I couldn't join new devices (and some of them simply "dropped off" during the migration, specifically one IKEA Fyrtur blind - and I couldn't reconnect it). See #9565 .
So I started fresh and was happy that I can pair stuff again (including that blind, for example), but now I cannot pair some of my devices again. I can pair an IKEA repeater, but not TRVs (that worked fine when I migrated from CC2531 co Slaesh, I guess the repeater might reach Coordinator but the TRVs cannot even though they are in the same room, but they are battery operated). I upgraded to the latest firmware you linked, but that made no difference. I cannot pair these, unfortunately.
@hnykda could you:
Association Response
to the device you are trying to pair. This is the router the device is trying to pair viaUpdate Device
to the coordinator, observe that the coordinator does NOT send a Transport Key
back (this is the problem)devices
list of data/coordinator_backup.json
, if yes, also check if it has a link_key
in it. Example:
data/coordinator_backup.json
now with a link_key
.Hi. I probably somehow failed in doing it, apologies... Even when I moved the TRVs next to the coordinator, I cannot connect it at all and I don't see anything that would point to the TRVs actually sending anything. Maybe the TRVs just magically broke? Anyway, I tried with an IKEA blinds controller. I couldn't pair it when being far from coordinator, but I could when being in the coordinator's range. Unfortunately, when I am not close to the coordinator, I don't see any "Association Response" (with the button nor the TRVs - with those though, I don't see it even when close to it). Therefore I didn't even get to the second bullet point.
Also, I suspect I am somehow haunted - when trying to pair one of the 3 TRVs that don't work now, even though I haven't seen anything in wiresharks logs (but I could've just stopped the capture as I was trying it like 50 times), it got paired. The other one didn't at all even when putting it next to coordinator (it's the _TZE200_zuhszj9s
/ TS0601
, so the pairing is usually trivial). I suspect it "caughted" a glimpse of the coordinator signal somehow?
All the pcapng here: https://easyupload.io/m/0wrw4t - the unsuccessful_pairing_*
or failed_button_far
are when I was putting the TRV/button into pairing mode when not being close to the coordinator, sniffing for traffic right next to the device.
It's all on the today's zigbee2mqtt-edge
and the latest dev firmware for slaesh slack you sent earlier.
Thanks for your help Koenkk, :bear: with me please. Based on the sniff packet 200 I sent you earlier when trying to pair the button, I found out that the source router for that pairing was my IKEA repeater paired earlier with network address cbca
. I went to coordinator_backup.json
and haven't found any device with nwk_address: "cbca"
(and the backup is from today - 3 days after it was paired):
Should I now remove that IKEA repeater and then pairing it again or just pairing it again without removing first (is that even possible)?
Should one see all routers there? (Even if not starting from scratch?) I see only 2 devices there. The two Xiaomi buttons(!) I recently paired in the basement 5days ago (see above).
Should one see all routers there?
Good question. I have 31 Zigbee devices (excluding the coordinator), but only 23 devices in coordinator_backup.json
Should one see all routers there? (Even if not starting from scratch?) I see only 2 devices there. The two Xiaomi buttons(!) I recently paired in the basement 5days ago (see above).
@castorw does the coordinator_backup.json contain all the router or just the ones with a specific link key?
@Koenkk All devices known to the coordinator, including those connected via routers should be backed up. Although I am not entirely sure ZNP stores these devices in ADDRMGR
if they do not exchange APS keys.
Can any of you guys having less devices in coordinator_backup.json
than actually paired in the network provide me with the coordinator_backup.json
and NV dump from the coordinator? I could investigate further in the matter.
To extract NV contents you can use: https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/tools/nvram_read.py.
@Koenkk
to be sure, also check if the ieee address it not in the coordinator_backup.json, using the z2m frontend you should be able to find the ieee address of the device
OK, I actually found that under the IEEE address (but different network address) :shrug: :
{
"nwk_address": "76da",
"ieee_address": "84fd27fffe702988",
"is_child": false,
"link_key": {
"key": "507de2a6bfea12c60f3a2505518437db",
"rx_counter": 0,
"tx_counter": 5275
}
},
now I don't know what you want me to do, because you wrote "if no, then ..."
What should I do "if yes" :grimacing: ?
And I seem to have all the devices there. The backup actually seems to contain additional three more devices (maybe the three TRVs I haven't paired yet again?).
@hnykda The network address is not as important as the IEEE address. The network address may change. However, what is interesting is that APS key rx_counter
is 0.
OK, didn't know that, thanks.
This is the router: https://www.zigbee2mqtt.io/devices/E1746.html#ikea-e1746 . I had no troubles of pairing this device even when I was far from the coordinator, but it's maybe because my house is so small that this device can simply reach it as any other router in my house according to the network map:
The red circle is the IKEA repeater.
The device I had an issue to pair when not being close to the coordinator was https://www.zigbee2mqtt.io/devices/E1766.html#ikea-e1766
@hnykda :
Eh, I think I actually have two problems:
Is the problem present after the router has been freshly paired to network or after restoring from backup?
I believe this problem wasn't present after a fresh network (at least I haven't recognized that), so there wasn't any backup as far as I can tell (I simply deleted the old zigbee2mqtt data directory and started from scratch with a new network channel) :thinking: .
@hnykda Could you contact me on telegram? @castorko I am pretty curious about the state of your network and there are indications of possible problems with ZNP adapter handling in zigbee-herdsman code.
@castorw output of /nvram_read.py : backup.txt coordinator_backup.json : coordinator_backup_json.txt database.db: database_db.txt
Same problem as hnykda, is a tube zb ethernet/usb coordinator upgraded from a CC2530+CC2591+esp01 coordinator with source routing firmware.
I'm now able to pair with new devices using the latest version of zigbee2mqtt (1.22.1-1) with latest firmware ( Z-Stack 3.x.0 20210708) for my device (Zzh! CC2652R Multiprotocol RF Stick)
@francisp2 What exactly? Cannot pair certain devices? Cannot pair devices at all? Cannot pair through certain routers? Are you trying with the latest Z2M version?
What happened
Can only pair end devices via specific routers on the network
What did you expect to happen
Can pair end devices via any routers on the network
How to reproduce it (minimal and precise)
Debug info
Zigbee2MQTT version: 1.21.1-1 (commit: unknown) in hassio supervisor, running on Rpi4 4GB Adapter hardware: CC1352P-2 (based on RF-Star CC2652P2) Adapter firmware version: zStack3x0, 20210708
Further details
Initially I found https://github.com/Koenkk/zigbee2mqtt/issues/7762; which I believed was the root cause of my problem. I had an older network originally running from a CC2530 based coordinator.
However, I have completely nuked my existing network, including re-flashing my CC2652P2 stick, nuking all z2m data (including coordinator backup), and re-paired all existing devices multiple times and still appear to have this same issue.
It does not appear to matter what channel, Pan ID, or network key is used.
I tried reproducing initially on a smaller test network (2 repeaters - 1x Tradfri repeater, 1x DIY CC2530, 2 end devices, CC2652P2 based coordinator), but couldn't reproduce the problem in this configuration.
Instead have needed to re-create my entire production network (including repairing some annoying to access end devices). The issue seems to present itself as the network gains complexity. After adding 5 router devices, it appears to become a problem. End devices will only pair through certain routers. You can see this in the map:
I did some further digging this weekend using a CC2531 with sniffer firmware + wireshark. I have captured the following traces:
Archive.zip
(yes I know I've exposed my network key, I will roll it when/if this reaches a resolution)
To explain these traces, there are 5 routers on the network;
failed-join-bathroom-motion-via-study-lounge.pcapng
Attempt to join "Bathroom Motion" (
0x00158d00063372bf (0x5FB2)
) via "Study Repeater" (0x00124b000add55c6 (0xE671)
).Successfully joins via "Lounge Repeater" (
0x588e81fffe0205b1 (0x049F)
)failed-join-bathroom-via-study-lounge-bedside.pcapng
Attempt to join "Bathroom Door" (
0x00158d000451d91a (0x8DFC)
) via "Study Repeater" (0x00124b000add55c6 (0xE671)
).Attempt to join via "Lounge Repeater" (
0x588e81fffe0205b1 (0x049F)
)Successfully joins via "Nicks Bedside" (
0xec1bbdfffe9cf5a2 (0xE63A)
)failed-join-front-door-button-via-living-room-noise-living-room-lounge.pcapng
Attempt to join "Front Door Button" (
0x00158d00045075e9 (0xFD40)
) via "Living Room Noise Repeater" (0x00124b001fc64014 (0xF82B)
)Attempt to join via "Living Room Repeater" (
0x588e81fffe021892 (0x6CFA)
)Successfully joins via "Lounge Repeater" (
0x588e81fffe0205b1 (0x049F)
)failed-join-lemon-tree-via-bedside-via-louge.pcapng
Attempt to join "Lemon Tree" (
0x00124b000611b998 (0x997B)
) via "Nicks Bedside" (0xec1bbdfffe9cf5a2 (0xE63A)
)Successfully joins via "Lounge Repeater" (
0x588e81fffe0205b1 (0x049F)
)failed-join-living room-button-via-living-room-bedside-lounge.pcapng
Attempt to join "Living Room Button" (
0x00158d00028f5873 (0x3B45)
) via "Living Room Repeater" (0x588e81fffe021892 (0x6CFA)
).Attempt to join via "Nicks Bedside" (
0xec1bbdfffe9cf5a2 (0xE63A)
)Successfully joins via "Lounge Repeater" (
0x588e81fffe0205b1 (0x049F)
)failed-join-living-room-motion-via-many-final-through-nicks-bedside.pcapng
Attempt to join "Living Room Motion" (
0x00158d000421a072 (0x5B83)
) via many different repeaters. Finally succeeds via "Nicks Bedside" (0xec1bbdfffe9cf5a2 (0xE63A)
)