Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

[Problem]: After switching to CC2652 coordinator, my network works perfectly, but no device will pair with it. #10339

Closed convicte closed 2 years ago

convicte commented 2 years ago

What happened?

I transitioned from an old and weak CC2531, which worked ok, but with 40 devices connected started to misbehave, lose connection to devices, etc. Unfortunately, upon switching to CC2652 there is no way to pair any new device (IKEA, XIAOMI, TUYA, etc.). The network rebuilt perfectly fine after the coordinator change, and is updating sensors just fine, but will not execute a pairing process.

I've changed back and forth between different coordinator firmwares, to no effect. Neither unplugging the coordinator, restarting the VM in which HA is running, nor restarting HA had any effect.

What did you expect to happen?

When pairing is initiated, the interview process begins and the device joins the network as per the old CC2531.

How to reproduce it (minimal and precise)

1) Connect the new adapter to the network
2) Enable permit_join
3) Put device in pairing mode
4) Wait until the device times out because nothing will happen

Zigbee2MQTT version

1.22.1

Adapter firmware version

20211217, 20210319, 20210708

Adapter

Ebyte CC2652P

Debug log

How to record a debug of something that doesn't happen? I've been monitoring the logs and while the permit_join is enabled, no interview process is initiated. The device sits idle until it times-out.

Koenkk commented 2 years ago

try with the fw from https://github.com/Koenkk/Z-Stack-firmware/tree/develop/coordinator/Z-Stack_3.x.0/bin

convicte commented 2 years ago

I am on it now - no luck. The exact same behavior as on the two others.

The network works great, but pairing is completely dead. image

henkiejan1 commented 2 years ago

Hmm, I have recently switched to the CC2652 too, but had no problems at all re-pairing some devices. Did you delete the devices that already exist in your network before trying to add them again? If it's not a new device, maybe the device still exists in your database.db?

Tonio16 commented 2 years ago

Hello

I switched from a ZiGate to a Sonoff ITead yesterday. No pairing issue with this firmware: 20210120

Antoine

Koenkk commented 2 years ago

@convicte

convicte commented 2 years ago

@Koenkk

Thanks for the update and my apologies for the late reply.

1) If I understand correctly from the link attached, you are asking to use the old CC2531 I have, with a sniff firmware plus a windows tool to collect a sniff of the network traffic at my selected channel - is that correct?
2) Please see the backup below (minus the keys - obviously). image

One more data point for the evaluation: I've been trying to pair another IKEA Dimmer and held the reset button for about 10 seconds, while in the room with a Zigbee floor lamp (Office floor lamp). At the same time, this lamp was kicked out of my network with the following log output and can't be re-paired, because nothing else can be. image

Not sure if a fluke or related, but haven't had this happen ever. EDIT: It must be in some way connected because keeping the button on the IKEA dimmer pressed for 10 seconds again is affecting the lamp (it starts 'breathing' in brightness as if it would enter pairing mode). After about 30 seconds, it fails and goes back to solid light. It's a Lidl floor lamp which is recognized by z2m as a LED strip controller, but works just fine. image

Koenkk commented 2 years ago

If I understand correctly from the link attached, you are asking to use the old CC2531 I have, with a sniff firmware plus a windows tool to collect a sniff of the network traffic at my selected channel - is that correct?

Yes

I've been trying to pair another IKEA Dimmer and held the reset button for about 10 seconds, while in the room with a Zigbee floor lamp (Office floor lamp).

See https://www.zigbee2mqtt.io/devices/ICTC-G-1.html#pairing how to reset this device, when you hold it 10 seconds it will setup a new network with just the dimmer and the lamp

convicte commented 2 years ago

If I understand correctly from the link attached, you are asking to use the old CC2531 I have, with a sniff firmware plus a windows tool to collect a sniff of the network traffic at my selected channel - is that correct?

Yes

My sincere apologies, but I am failing at providing a sniff. I've spent quite some time trying to get the CC2531 set up as the sniffer tool, but having no CC debugger and being unable to find a way to flash it via USB as I did for the CC2652 using the TI Flasher V2 software, I am a bit stuck. I've downloaded the ZBOSS V.2 since V1.0 doesn't provide the sniffer.hex firmware file, but can't get further than this. Some guidance would be greatly appreciated!

Is there any chance my backup file gave some hints?

For your kind consideration, my z2m config as well: image Please note, I change the permit join from the HA interface, but I wonder if I should set it permanently to true in the config and switch off from the interface (or vice versa)

I've been trying to pair another IKEA Dimmer and held the reset button for about 10 seconds, while in the room with a Zigbee floor lamp (Office floor lamp).

See https://www.zigbee2mqtt.io/devices/ICTC-G-1.html#pairing how to reset this device, when you hold it 10 seconds it will setup a new network with just the dimmer and the lamp

There may have been a misunderstanding in that I was not trying to pair it this way. I know it's to be reset by pressing the button 4 times - https://www.zigbee2mqtt.io/devices/E1743.html (it's actually this IKEA dimmer/switch). In my desperation, I was trying different things to make the network recognize it.

All this said, I am still unable to explain why getting the IKEA button into its direct pair mode would kick out a different brand device from the network, as seen above.

When I fix pairing issues I'll play with this, but for the moment getting something to pair with my network is a priority.

Koenkk commented 2 years ago

@convicte unfortunately I need a sniff to debug this further, one thing you can try is to pair the dimmer when holding it very close to the coordinator (to ensure it tries to pair via that)

Fabiancrg commented 2 years ago

@Koenkk I had some issue myself recently to pair devices. The only way to do it was to pair them close to the coordinator (<1m), when I was too far it was not working anymore. I experienced it just now with two devices, one Aqara contact sensor and one Tuya roller shutter I was not able to pair.

I tried the new FW on my CC2652R and the pairing worked just fine from their place (not close to the coordinator).

convicte commented 2 years ago

@Koenkk My apologies for the late reply. It was a busy week.

1) In desperation, having no way to flash the sniffer I borrowed a Sonoff 3.0 dongle which was running an entirely different firmware from July 2021, and it exhibited the same exact behavior - perfect network performance but inability to pair devices, so it's unlikely to be my hardware.
2) I tested 5 different devices, both holding it next to the coordinator and in an entirely different part of the house, via routers. Did not make any difference. I once managed to get the device associated with the network, but it failed to recognize and when removed never associated again, like the others.
3) I've also changed the USB extension, which is used to get the dongle outside my server rack, for better reception. Two different extensions didn't make a difference.

Finally, I started looking for ways to flash the sniffer with the hardware I had available, and found an old Pi and instructions on how to use it to flash a CC2531, which worked!!

I now have a sniffer and managed to follow your instructions to grab a pairing process via routers with the IKEA button listed above. My only concern is with posting it, since it will contain network keys. Is there any way to share it with you securely? Second, I can grab another pairing process right next to the coordinator, if this is of interest?

Thank you!!

Koenkk commented 2 years ago

Second, I can grab another pairing process right next to the coordinator, if this is of interest?

yes

You can send the network key to me on telegram (@koenkk)

convicte commented 2 years ago

Thank you!

I just texted you on Telegram with the key.

Pairing attempt via router.zip Pairing attempt via coordinator.zip

Both files as requested!

Koenkk commented 2 years ago

I've checked the coordinator sniff, somehow the joining device isn't doing an association request. But while checking your sniff I found another issue. It seems the coordinator is using a different epid than the rest of the network leading to pan conflicts.

Screenshot 2022-01-02 at 11 21 16

@castorw my first thought is that this is caused by the cc2531 returning the wrong epid (0xdddd..), during the migration we write the wrong epid into the cc2652, could that be the case here?

convicte commented 2 years ago

Thank you! I've seen this conflict in the sniff but failed to recognise the significance. I've never touched the panID during the migration but I am wondering how it could get changed if it's listed in the z2m configuration, which should have been burned in to the CC2652 upon first boot. Can I change it now on the CC2652 or attempt to correct it when the backup is loaded onto a fresh CC2652 (after flash wipe)?

Just in case you are wondering, I could run another sniff with a different device entirely, to see if it's also not sending out a pairing request. This said, I am at a loss why changing the coordinator would cause the endpoints not to request pairing.

My apologies if these are trivial questions.

Koenkk commented 2 years ago

@convicte can you check if the extended_pan_id in the coordinator_backup.json is also dddddddd...? If yes, change it to the one starting with 95e7 in the sniff, reflash your coordinator and start z2m; that might solve this problem.
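The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Zigbee2MQTT: the helper names `find_values` and `epid_needs_fix` are made up, the exact layout of coordinator_backup.json is assumed to be unknown (so the key is searched for anywhere in the file), and the EPID used in the example is a dummy value, not the one from the sniff.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def epid_needs_fix(backup_text, sniffed_epid):
    """Return True if any extended_pan_id in the backup differs from the sniffed EPID."""
    backup = json.loads(backup_text)
    found = [str(v).lower() for v in find_values(backup, "extended_pan_id")]
    if not found:
        raise ValueError("no extended_pan_id field found in backup")
    return any(v != sniffed_epid.lower() for v in found)


# Dummy EPID for illustration only; use the value seen in your own sniff.
example_backup = '{"extended_pan_id": "dddddddddddddddd"}'
print(epid_needs_fix(example_backup, "0123456789abcdef"))
```

To run it against a real file, read the file's text (with Zigbee2MQTT stopped) and pass it in place of `example_backup`.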

convicte commented 2 years ago

Yes, indeed it is! image

I'll change it in a moment. By reflash you mean the latest 20211217 firmware, to clear the 'old' backup on the coordinator and make it pull the new one with the correct extended_pan_id from the .json file?

Is that correct?

Koenkk commented 2 years ago

convicte commented 2 years ago

Perfect, I did half of that already, so just need to run into the attic to get the stick down, to reflash it.

I'll update everyone ASAP.

convicte commented 2 years ago

Quick update:

1) Following the changes, there is definitely progress, but it's not completely resolved yet. I am predicting you will request another sniff of the situation as it stands.
2) When standing right next to the coordinator, the same IKEA remote successfully announced itself, and started the joining process, but eventually left the network. This happened twice, with the third try resulting in the remote remaining in the network but as an unsupported device. Below see the excerpt from the z2m logs of one such event:

debug 2022-01-02 21:52:09: Received MQTT message on 'zigbee2mqtt/bridge/request/permit_join' with data '{"device":null,"time":254,"transaction":"7fshf-1","value":true}'
info 2022-01-02 21:52:09: Zigbee: allowing new devices to join.
info 2022-01-02 21:52:09: MQTT publish: topic 'zigbee2mqtt/bridge/response/permit_join', payload '{"data":{"time":254,"value":true},"status":"ok","transaction":"7fshf-1"}'
info 2022-01-02 21:52:25: Device '0xccccccfffe020fd9' joined
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"0xccccccfffe020fd9","ieee_address":"0xccccccfffe020fd9"},"type":"device_joined"}'
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":{"friendly_name":"0xccccccfffe020fd9"},"type":"device_connected"}'
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/0xccccccfffe020fd9/availability', payload 'online'
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/0xccccccfffe020fd9', payload '{"last_seen":"2022-01-02T21:52:25+01:00"}'
info 2022-01-02 21:52:25: Starting interview of '0xccccccfffe020fd9'
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"0xccccccfffe020fd9","ieee_address":"0xccccccfffe020fd9","status":"started"},"type":"device_interview"}'
info 2022-01-02 21:52:25: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"interview_started","meta":{"friendly_name":"0xccccccfffe020fd9"},"type":"pairing"}'
info 2022-01-02 21:52:27: MQTT publish: topic 'zigbee2mqtt/0xccccccfffe020fd9', payload '{"last_seen":"2022-01-02T21:52:27+01:00"}'
info 2022-01-02 21:52:27: MQTT publish: topic 'zigbee2mqtt/0xccccccfffe020fd9', payload '{"last_seen":"2022-01-02T21:52:27+01:00"}'
debug 2022-01-02 21:52:27: Device '0xccccccfffe020fd9' announced itself
info 2022-01-02 21:52:27: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"0xccccccfffe020fd9","ieee_address":"0xccccccfffe020fd9"},"type":"device_announce"}'
info 2022-01-02 21:52:27: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"announce","meta":{"friendly_name":"0xccccccfffe020fd9"},"type":"device_announced"}'
warn 2022-01-02 21:52:36: Device '0xccccccfffe020fd9' left the network

3) Trying to achieve the same away from the coordinator amounts to nothing - no messages or response from the interface when the device is in pairing mode. Unable to pair any other device, either, when away from the coordinator.
4) The whole network restarted and once again is working perfectly as is.

Let me guess, a sniff of pairing next to the coordinator and one via routers? ;)

Cheers!!
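For anyone comparing several pairing attempts, the join/leave sequence in a log excerpt like the one above can be pulled out with a small script. This is a rough sketch: it assumes one log entry per line in the format shown in this comment, and `device_timeline` is a made-up helper, not a Zigbee2MQTT function.

```python
import re
from collections import defaultdict

# Matches only the device lifecycle lines; MQTT publish lines are ignored.
EVENT_RE = re.compile(
    r"^(?P<level>\w+) (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?:Device '(?P<dev>0x[0-9a-f]+)' "
    r"(?P<what>joined|announced itself|left the network)"
    r"|Starting interview of '(?P<dev2>0x[0-9a-f]+)')"
)


def device_timeline(lines):
    """Map each device address to its ordered list of (timestamp, event)."""
    timeline = defaultdict(list)
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue
        dev = m.group("dev") or m.group("dev2")
        what = m.group("what") or "interview started"
        timeline[dev].append((m.group("ts"), what))
    return dict(timeline)


# Lines copied from the excerpt above.
SAMPLE = [
    "info 2022-01-02 21:52:25: Device '0xccccccfffe020fd9' joined",
    "info 2022-01-02 21:52:25: Starting interview of '0xccccccfffe020fd9'",
    "debug 2022-01-02 21:52:27: Device '0xccccccfffe020fd9' announced itself",
    "warn 2022-01-02 21:52:36: Device '0xccccccfffe020fd9' left the network",
]

for device, events in device_timeline(SAMPLE).items():
    print(device, [what for _, what in events])
```

A device whose last event is "left the network" shortly after "interview started" matches the behaviour described in point 2.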

Koenkk commented 2 years ago

Can you provide a sniff when pairing a non battery powered device (something like a switch or light) via a router?

convicte commented 2 years ago

Absolutely!

Please see pairing attempts of the same lamp as listed above (Lidl Livarno). It seems to be stuck in a pairing loop where association is successful, but it's actually not showing up in the interface at all. No messages or iconography is depicted in the z2m interface at all. It's as if the process never happened, as far as the HA interface is concerned.

Pairing attempt via router for a floor lamp.zip

Same passwords as yesterday.

fuglphoenix commented 2 years ago

@convicte, This sounds a lot like my problem.

I have a Slaesh's CC2652RB coordinator and just updated to 20211217. I haven't migrated, but did a fresh setup when I received the stick. The z2m version is 1.22.2.

I'm trying to pair my devices (IKEA, Tuya and others) via an IKEA router, since they are a bit far away from the coordinator. It seems to work, since the devices stop the light blinking sequence (join mode), but when I look in z2m no devices show up.

If I disable join in z2m and turn the device off and on, it reverts back to the light blinking join mode. If I move them, I can pair them directly to the coordinator.

unfortunately i haven't got any spare zigbee stick to sniff what is happening.

vadimag1 commented 2 years ago

I have exactly the same problem. Devices are not connected to the coordinator.

cracrama commented 2 years ago

Hmm, similar situation with my network: after flashing 20211217 on an Ebyte CC2652P I see strange behaviour from a couple of Sonoff devices (temperature/humidity sensors), which were disconnected from the network after flashing the firmware. I can re-pair them, but after pairing they don't report anything and finally disconnect from the network. However, one of these devices, the same model, still works and reports correctly; that was the device which stayed connected after flashing. Other devices pair and report correctly. I don't know what happened, and that behaviour is really strange.

Coax88 commented 2 years ago

Hi, I'm experiencing the same issue after migrating from CC2531 to CC2652P. Before, I could pair without any problems; after upgrading I have issues pairing and often need to take the device close to the Sonoff CC2652P before it pairs.

xrust83 commented 2 years ago

The problem is more global and does not seem to be related to the adapter firmware. I'm still using a CC2531. I flashed it with the latest firmware, Z-Stack_Home_1.2 20211115/20211116, and got the same result: it seems to work, but I cannot connect a new device or an old remote one. If the problem were only in the firmware, then rolling back to the old version, Z-Stack_Home_1.2 20201127/20201128, would restore everything, but it does not, so the cause is in the zigbee2mqtt version itself.

cracrama commented 2 years ago

I have reflashed the firmware from the develop branch and it seems like everything is fine. I had to re-pair the Sonoff end devices and then everything seemed fine. Word of advice: once you reflash the firmware, let the network "settle down" for a couple of hours, because it seems like that plays a big role.

Update: after some time, same story: devices pair, but some stop reporting.

Coax88 commented 2 years ago

It helped for me to change the Zigbee channel away from other WiFi channels; now it works perfectly and I can pair without any problems.
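As a rough aid for picking a channel, the overlap between Zigbee and 2.4 GHz WiFi channels can be estimated from their centre frequencies. The sketch below uses the standard channel plans (Zigbee channel k centred at 2405 + 5(k - 11) MHz, WiFi channel n at 2407 + 5n MHz) and treats Zigbee channels as about 2 MHz wide and WiFi channels as 22 MHz wide; those widths are approximations and the function names are made up for illustration.

```python
def zigbee_band(channel):
    """Return (low, high) edges in MHz of a 2.4 GHz IEEE 802.15.4 channel (11-26)."""
    center = 2405 + 5 * (channel - 11)
    return center - 1, center + 1  # roughly 2 MHz wide


def wifi_band(channel, width_mhz=22):
    """Return (low, high) edges in MHz of a 2.4 GHz WiFi channel (1-13)."""
    center = 2407 + 5 * channel
    return center - width_mhz / 2, center + width_mhz / 2


def clear_zigbee_channels(wifi_channels, width_mhz=22):
    """List Zigbee channels whose band overlaps none of the given WiFi channels."""
    clear = []
    for zch in range(11, 27):
        z_lo, z_hi = zigbee_band(zch)
        overlaps = any(
            z_lo < w_hi and w_lo < z_hi  # bands that merely touch do not count
            for w_lo, w_hi in (wifi_band(w, width_mhz) for w in wifi_channels)
        )
        if not overlaps:
            clear.append(zch)
    return clear


print(clear_zigbee_channels([1, 6, 11]))  # -> [15, 20, 25, 26]
```

Pass the WiFi channels actually in use nearby; real-world interference also depends on signal strength and distance, which this sketch ignores.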

jnxxx commented 2 years ago

I have somewhat similar issues as others seem to. I first used a CC2531 stick, which actually had worked fine. But in December I started to experience issues as more devices were added (~15). Eventually a Sonoff temp and humidity device stopped updating. Although I could re-pair it, it would only send one update then be stuck again on that value.

I figured it was the limitation of the CC2531, and I ordered a CC2652RB which I got yesterday and flashed with Z-Stack_Home_3.x.0_20211217. The Sonoff device is now paired and seem to work again.

However, I now experience problems adding new devices (Aqara door & window contact sensors). I usually do so by permitting join to a specific router. But this does not work and does not seem to log anything. If close enough I can pair it to the coordinator, but that is it. As soon as I try to pair it to a router, it leaves the network and does not get connected to the router.

Otherwise things seem to be communicating as is.
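For reference, permitting join through one specific router is done with a message on the `zigbee2mqtt/bridge/request/permit_join` topic, the same topic that appears in the log excerpt earlier in this thread. The sketch below only builds the topic and JSON payload; `permit_join_request` and the router name are hypothetical, actually publishing needs an MQTT client of your choice, and the field names are assumed to match the Zigbee2MQTT 1.x versions discussed here.

```python
import json

PERMIT_JOIN_TOPIC = "zigbee2mqtt/bridge/request/permit_join"


def permit_join_request(enable=True, seconds=254, via=None):
    """Build (topic, payload) for a permit_join request.

    `via` is the friendly name of the router to join through;
    None leaves the field out so joining is allowed network-wide.
    """
    payload = {"value": enable, "time": seconds}
    if via is not None:
        payload["device"] = via
    return PERMIT_JOIN_TOPIC, json.dumps(payload)


# "hallway_router" is a made-up friendly name.
topic, body = permit_join_request(via="hallway_router")
print(topic, body)
```

If nothing is logged after sending such a request, that itself is a useful data point to include with a sniff.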

cracrama commented 2 years ago

What I have done so far: flashed the Ebyte CC2652P with new firmware, and whereas most devices reconnect correctly, some (like Sonoff, Aqara) don't. I reflashed again with the same outcome, and I reset Z2M, but nothing. What is interesting is that it's the same device causing the problem every time no matter the firmware: pairing is OK but there is no reporting, or only partial reporting. So I agree with xrust83 that it could be a global problem. I have tried a different dongle, even the same type but a different unit, and it's the same story. I need to mention that the devices were working OK before flashing the firmware, and some of them are brand new.

xrust83 commented 2 years ago

I couldn't beat the CC2531 problem, but I was already prepared to switch to the CC2652P. I had started flashing in order to connect an Aqara MCCGQ14LM. After switching to the CC2652P, devices pair correctly, but the Aqara MCCGQ11LM sensor began to fail. Somewhere in the issues I came across an answer saying that the problem is in the CH340C port driver, that something went wrong in an OS driver update. Therefore, I temporarily moved this sensor to the Aqara hub. It is also worth adding that rolling the CC2531 firmware back to the version I was running until a few days ago did not fix the problem with connecting new devices or old remote ones, so I still think the problem is not in the firmware of the stick but in the program itself.

Koenkk commented 2 years ago

For users that migrated from a CC2531 to a CC2652 we found the possible issues; keep an eye on https://github.com/Koenkk/zigbee2mqtt/issues/9117#issuecomment-1015409585

cracrama commented 2 years ago

Many thanks for the direction. In my case it was just a firmware update of the same chip (CC2652) that caused this kind of issue too. It seems like even downgrading the firmware doesn't fix it. I can also confirm that switching gateways (Texas Instruments / Ebyte) doesn't change anything. Many thanks for the support, gents. I will try more experiments and we will see.

castorw commented 2 years ago

Unfortunately not only those who migrated from CC2531 are affected but rather anyone who restored backup onto Z-Stack 3.x or newer adapter - therefore any CC2652-based adapter.

sonsaeng commented 2 years ago

I'm sorry, same problem. I use a standalone Raspberry Pi with a Sonoff Z3 dongle. I only updated Raspbian Lite, MQTT, Z2M and Node-RED. Everything seems to be OK, but no pairing. The Sonoff is connected via an extender and a powered USB hub. No Zigbee device (it would be the first one) will connect.

Tonio16 commented 2 years ago

@sonsaeng Which firmware do you have on the controller? It has to be the one of December 21 or January 21.

Antoine

sonsaeng commented 2 years ago

I used this: CC1352P2_CC2652P_launchpad_20210120

castorw commented 2 years ago

I believe this is related to #9117. I will be investigating further into this issue as per https://github.com/Koenkk/zigbee2mqtt/issues/9117#issuecomment-1016890665.

sonsaeng commented 2 years ago

I solved my problem in this way:

Now I could pair an IKEA lamp (router) and a Tradfri dimmer (end device) :)

Thank you for your help!

cracrama commented 2 years ago

sonsaeng So does this operation require re-pairing all devices again?

sonsaeng commented 2 years ago

Yes, that's what I have understood. You have to pair them again. I'm no expert in Zigbee (I only use it). But what I have seen is that the data structure has changed a lot. I think that's the reason why. I know that this means a lot of work. But if things work better then, it's OK for me.

convicte commented 2 years ago

Dear all,

TL;DR: DO NOT follow @sonsaeng instructions prior to reading this, unless you have easy access to all your endpoint and router devices!!

I've been delinquent on updating this ticket with the outcome of my troubleshooting and finding a final solution to the problem, which is entirely thanks to @Koenkk and his valuable input - Thank you again Sir!

The issue, at least as it manifested itself in my network, and its root causes did not require changing channels or deleting the entire network, nor did it require re-pairing most devices, but a few of the most stubborn had to be reset and re-paired.

Main issues in question - the devices will not pair anywhere in the network, but will right next to the coordinator:
1) The coordinator_backup.json in your existing network for some reason does not include all the routers (e.g. they were not added to the backup over time as you added them to the network), and thus the backup did not bring them into your new coordinator as viable connection points. This results in the pairing failing as the endpoint devices try to use these routers to get to the coordinator.
2) Re-pairing the routers closest to the coordinator and then working outward to the furthest allowed me to get a sufficient number to heal the whole network and allow pairing from anywhere to anywhere.
3) To accomplish this, you will have to re-pair the routers which form the core of your network and delete the backup file from the HA z2m folder prior to removing the coordinator and reflashing it with your firmware of choice. I am running the latest v20211217. Upon replugging the coordinator and launching z2m, the new backup (including your routers) will be loaded onto the coordinator and this issue should be resolved.

Additional issue present in my instance - extended_pan_id in the coordinator_backup.json incorrect or blank: See here for the troubleshooting, and resolution: https://github.com/Koenkk/zigbee2mqtt/issues/10339#issuecomment-1003693981

In short, if you find your ext_pan_id to be incorrect or blank as was my case, the steps to resolve it overlap with the resolution of the main issue above, though if the issue is combined, like it was for me, you will need to reflash twice.

In short:

If you have any questions, I'll do my best to help, but this is mostly what I've learned while working with Koen, so anything more involved may be way above my understanding.

Good luck!!
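For the step of removing coordinator_backup.json before reflashing, a slightly safer variant is to rename the file rather than delete it, so the old state can still be inspected later. That is a substitution for the deletion described above, not what was actually done in this thread. The sketch assumes Zigbee2MQTT is stopped and that you know the path of its data folder; `set_aside_backup` is a made-up helper name.

```python
import shutil
import time
from pathlib import Path


def set_aside_backup(data_dir, name="coordinator_backup.json"):
    """Rename the backup file with a timestamp so z2m no longer finds it.

    Returns the new path, or None if there was no backup file to move.
    """
    src = Path(data_dir) / name
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # e.g. coordinator_backup.20220102-215209.json.bak
    dst = src.with_name(f"{src.stem}.{stamp}{src.suffix}.bak")
    shutil.move(str(src), str(dst))
    return dst
```

Call it with the folder that holds your Zigbee2MQTT configuration, then reflash and start z2m as described above.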

Fabiancrg commented 2 years ago

On my side, the problem was fixed by just flashing the coordinator with FW 20211217, suddenly a router (sitting far from the coordinator) I was trying to join for months joined a few seconds later after allowing new devices to join, without doing anything else.

But if I look my network, I have a total of 69 devices in which 27 are routers. In the coordinator_backup.json file I have only 35 devices in which 3 are not present on my network and only 12 routers.

@Koenkk is it normal ? Should I try to remove and pair these missing routers ? What should be present in the coordinator_backup.json file ?

To be clear, everything seems to work fine, devices are reporting regularly and working fine, I can pair devices too.
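The comparison described in this comment (devices on the network versus entries in coordinator_backup.json) can be sketched as a set difference. The layout used here, a top-level `devices` list whose entries carry an `ieee_address`, is an assumption about the backup file, and all addresses in the example are invented.

```python
def norm(addr):
    """Lower-case an IEEE address and drop any 0x prefix so formats compare equal."""
    addr = addr.lower()
    return addr[2:] if addr.startswith("0x") else addr


def compare_with_backup(network_addrs, backup):
    """Return (on network but not in backup, in backup but not on network)."""
    known = {norm(a) for a in network_addrs}
    backed_up = {norm(d["ieee_address"]) for d in backup.get("devices", [])}
    return sorted(known - backed_up), sorted(backed_up - known)


# Invented example data; replace with your router list and parsed backup.
routers_on_network = ["0x00000000000000a1", "0x00000000000000a2"]
backup = {"devices": [{"ieee_address": "00000000000000a2"},
                      {"ieee_address": "00000000000000ff"}]}

missing, stale = compare_with_backup(routers_on_network, backup)
print("not in backup:", missing)
print("only in backup:", stale)
```

Whether a router missing from the backup actually needs re-pairing is exactly the open question asked above; the script only shows where the two lists disagree.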

cracrama commented 2 years ago

convicte - Many thanks for the instructions. However, it seems like pairing is not the only issue; reporting is affected too. Devices pair with no problems, but they transmit no data or only some. Some devices which are not reporting disconnect from the network after a couple of hours. I have not decided yet, but I am thinking about reflashing the coordinator and deleting the backup. I just hope that will fix the problem and not make it worse. Anyway, many thanks.

convicte commented 2 years ago

@Fabiancrg, from what I understand, ALL the routers should be present in the backup file. If they are not, I believe that the pairing process through this router to the coordinator will not work. I may be wrong, though.

@cracrama, it may be a different manifestation of the same problem. If the device is not able to reach the coordinator directly and relies on a router relaying the signal to the coordinator, while this router is not recognized as part of the network (as discussed above), the reporting may be sketchy or impossible. Please confirm what your ext_pan_id looks like and what your backup has listed. If the issues match up with mine, you will probably have to follow the above. Finally, I've had devices do this as well, where they would drop out after some time without reporting, which was related to the above issue.

According to Koen, this thankfully only has to be done once as you transition from Zigbee 1.2 to 3.x, and will not be an issue once the 3.x network starts working correctly. The 3.0 spec seems to have many more checks and security features, which do not allow the looser, potentially error-laden 1.2-spec networks to transition seamlessly.

cracrama commented 2 years ago

convicte, I have checked: it's not a pan id problem (mine is "extended_pan_id": "dddddddddddddddd"). And as I mentioned above, I just updated the dongle firmware (no migration), but now even when I reflash the dongle or use a different dongle I experience the same issues. So the issue is probably with the Z2M backup json or something. One observation though: the problematic device has "tx_counter": 0, so it seems like it's not talking. I can force-remove the device and later re-pair it, but once a device has become problematic, nothing changes. Regarding routers: I get correct reporting from all of them, so I'm not sure if that is the same issue.

castorw commented 2 years ago

@cracrama Zeroed tx_counter for APS link keys is normal and expected with current Z2M implementation. If rx_counter is non-null it means the device is transmitting APS-encrypted messages but the coordinator is not responding with encrypted messages. Do any other of your devices have tx_counter larger than zero? I can investigate further if you want but I will need you to have a ZB sniffer ready and you can contact me on Telegram (@castorko).
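To answer the question about counters across all devices at once, the link-key counters can be listed from the backup file. The sketch assumes each entry in a top-level `devices` list may carry a `link_key` object with `rx_counter` and `tx_counter` fields; that layout is an assumption, and the helper name and sample data are invented.

```python
def link_key_counters(backup):
    """Return (ieee_address, rx_counter, tx_counter) for every device in the backup."""
    rows = []
    for dev in backup.get("devices", []):
        link_key = dev.get("link_key") or {}  # some devices may have no link key
        rows.append((dev.get("ieee_address"),
                     link_key.get("rx_counter"),
                     link_key.get("tx_counter")))
    return rows


# Invented sample; load your own coordinator_backup.json with json.load instead.
sample = {"devices": [
    {"ieee_address": "00000000000000a1", "link_key": {"rx_counter": 0, "tx_counter": 12}},
    {"ieee_address": "00000000000000a2", "link_key": {"rx_counter": 0, "tx_counter": 0}},
    {"ieee_address": "00000000000000a3"},
]}

for addr, rx, tx in link_key_counters(sample):
    print(addr, "rx:", rx, "tx:", tx)
```

As noted above, a zero tx_counter on its own is expected; the table is only meant to make it easy to spot devices that differ from the rest.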

cracrama commented 2 years ago

castorw - Many thanks for response rx_counter is null on all devices, however tx_counter has higher values on all devices except that "problematic" where both positions are null. Regarding ZB sniffer - I will try to organize something if I find time (I'm quite fresh in zigbee). Many thanks anyway.

convicte commented 2 years ago

@cracrama, you seem to be misunderstanding the pan_id issue.

From what you described, you have exactly the same problem as I had. The id of ddddddddddd... is what I referred to as blank, since it should look like the one here in the sniff (EPID) - https://github.com/Koenkk/zigbee2mqtt/issues/10339#issuecomment-1003693981

If the coordinator and the network have different values, you will have exactly the issues you described.