Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Sonoff Zigbee3.0 USB Dongle pairing issues #9117

Closed notenoughtech closed 2 years ago

notenoughtech commented 2 years ago

I have serious pairing issues with various Zigbee 3.0 end devices. The Sonoff Zigbee 3.0 USB Dongle was tested with {"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20210120,"transportrev":2},"type":"zStack3x0"} and with the latest firmware after a successful flash. With the latest firmware, I was not able to get anything to pair: there were no MQTT messages about the interview process. So I reverted to the 20210120 revision and was able to pair an Aqara TVOC sensor and an IKEA button, but Sonoff devices and Xiaomi buttons did not even enter the interview stage. The same devices were tested with a ZZH and a CC2531, and there I was able to pair and use them.

There isn't much information I can share, as the log wasn't being populated. Happy to try things out

oferwald commented 2 years ago

@Koenkk Thanks for the advice. I tried disabling joining and allowing only the coordinator to accept joins, but to no avail. I can't take every router down right now, because that would mean cutting power to most of my house. The device keeps "leaving" the network; when I tried a force remove, it said the device is not even there. I truly wonder how it is even possible for a device that has left to leave again.

I did try deleting the entire docker config directory, this didn't help at all. :(

My next plan is to connect the old CC2531 back, allow devices to join, and then swap the adapters again, but I will probably only get to this tomorrow. Something is badly broken. Maybe I did something, but nothing that I am aware of.

I will be happy to provide any debug information that might help. And again, thanks for trying to help.


Update: I did what I said and connected the old adapter back. I was able to quickly connect and interview all the devices, including the motion sensor, another boiler switch, and the two routers that didn't come back. This is a very awkward way to pair new devices, but at least it works. I hope that a new version will fix this, because it makes a great product (the Sonoff Zigbee dongle) far less great than it is. Other than the pairing issue, it was quite stable and responsive the entire week it was there, unlike the CC2531 with its constant crashes.

Koenkk commented 2 years ago

@oferwald I'm afraid that this cannot be fixed from the z2m side. The problem is that the CC2531 doesn't remember all devices (only 5, since that is the max), so these do not end up in coordinator_backup.json, and my expectation is that only devices which are in coordinator_backup.json can be paired via.

oferwald commented 2 years ago

@Koenkk I am not sure that I follow you, the current coordinator_backup.json I have only includes 3 devices, all of which were paired directly with the new coordinator, and none of which are a router at all (all are sensors). What will happen if I manually add devices to this file? (how to find a key?) What will happen if I delete this file?

Koenkk commented 2 years ago

What will happen if I manually add devices to this file? (how to find a key?)

This won't work, the key is generated when the device joins the network.

What will happen if I delete this file?

Nothing, z2m will just regenerate it on the next shutdown.

oferwald commented 2 years ago

So if I understand correctly, I have a coordinator that is not connected to any router, yet is still able to coordinate the network, and this is the reason I can't pair almost anything new, and this is not solvable by z2m. For the time being, I will use my hack to extend the network; it's not like I have new devices every day. But I assume that as this dongle becomes more widespread, the issue will become quite common.

Koenkk commented 2 years ago

I have a coordinator that is not connected to any router, yet still able to coordinate the network,

Zigbee doesn't work that way. Devices can communicate with each other when they use the same pan_id, ext_pan_id, channel and network key. So they can communicate, but the coordinator refuses to allow any devices to join, since the device is not known to the coordinator.
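
As a rough illustration of the four values mentioned above, the sketch below prints them from a Zigbee2MQTT configuration file so that two installations can be compared side by side. It assumes the values are set explicitly in configuration.yaml under the key names pan_id, ext_pan_id, channel and network_key; the helper name show_network_params and the example path are made up for this illustration. The network key is a secret, so the output should not be pasted into a public issue.

```shell
#!/bin/sh
# Hypothetical helper: print the four network parameters from a
# Zigbee2MQTT configuration.yaml. Assumes they are set explicitly in
# that file; values left at their defaults will simply not be printed.
show_network_params() {
  grep -E '^[[:space:]]*(pan_id|ext_pan_id|channel|network_key):' "$1"
}

# Example (the path is an assumption, adjust it to your setup):
#   show_network_params /opt/zigbee2mqtt/data/configuration.yaml
```

Matching values only explain why existing devices keep talking to each other; as described above, they say nothing about whether the coordinator will accept a join.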

But I assume that as this dongle will become more widespread, the issue will become quite common.

The issue is not with the dongle itself but with the migration from cc2531 -> sonoff. The cc2531 backup doesn't contain all the devices, which is the problem (due to limits of the CC2531). It can be fixed by either re-pairing all routers or starting from scratch.

castorw commented 2 years ago

@oferwald Could you please provide the following:

There is a variety of variables in the backup and restore process. Most likely the issue is not isolated to a specific dongle but rather firmware or firmware combinations.

Hedda commented 2 years ago

I'm not comfortable yet with flashing to another firmware

Tip for upgrading firmware without the dongle enclosure is to follow instructions in -> https://github.com/JelmerT/cc2538-bsl/pull/114

(download and use the https://github.com/JelmerT/cc2538-bsl/tree/feature/ITead_Sonoff_Zigbee-delay branch)

Just be sure to backup NVRAM first -> https://github.com/zigpy/zigpy-znp/blob/dev/TOOLS.md#nvram-backup
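
A minimal sketch of that backup step, assuming zigpy-znp is installed (pip install zigpy-znp), Zigbee2MQTT is stopped so that the serial port is free, and the adapter sits on a port such as /dev/ttyUSB0. The wrapper name backup_nvram and the default output file name are only illustrative; the nvram_read invocation itself follows the zigpy-znp TOOLS.md linked above.

```shell
#!/bin/sh
# Hypothetical wrapper around zigpy-znp's NVRAM read tool.
# $1: serial port of the adapter (assumption: something like /dev/ttyUSB0)
# $2: output file (optional, defaults to nvram_backup.json)
backup_nvram() {
  port="$1"
  out="${2:-nvram_backup.json}"
  # Refuse to continue if the port does not exist, instead of letting
  # the tool fail later with a less obvious error.
  if [ ! -e "$port" ]; then
    echo "serial port not found: $port" >&2
    return 1
  fi
  python3 -m zigpy_znp.tools.nvram_read "$port" -o "$out"
}

# Example (not run here):
#   backup_nvram /dev/ttyUSB0 sonoff_nvram_before_flash.json
```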

oferwald commented 2 years ago

Hello @castorw

The requested files are attached in the following file: files.tar.gz

Let me know if anything else might be helpful

osos commented 2 years ago

The issue is not with the dongle itself but with the migration from cc2531 -> sonoff. The cc2531 backup doesn't contain all the devices, which is the problem (due to limits of the CC2531). It can be fixed by either re-pairing all routers or starting from scratch.

@Koenkk, do you recommend that all routers are also re-paired when migrating from CC2531 -> sonoff? I think that differs from the docs: https://www.zigbee2mqtt.io/guide/faq/#what-does-and-does-not-require-repairing-of-all-devices

I have a few IKEA bulbs which are hard to get access to, and they seem to work. Most troubles are with the battery-powered remotes (difficult to pair, and they seem to drain the battery).

What should be the best procedure to re-pair the routers?

Koenkk commented 2 years ago

do you recommend that all routers are repaired also while migrating from CC2531 -> sonoff

Let's see what @castorw finds. It would help if you provided the required files: https://github.com/Koenkk/zigbee2mqtt/issues/9117#issuecomment-1008742890

notenoughtech commented 2 years ago

I finally had some time to test the latest firmware, thank you Koen and everyone who has contributed to this. TL;DR: 3 days in I have no issues and one instance (*fixed) of a sensor pairing but not reporting. More detailed impressions of the master and development branches here: https://notenoughtech.com/home-automation/is-sonoff-zigbee-dongle-plus-finally-ready-for-launch/

osos commented 2 years ago

@notenoughtech thank you for the update. Regarding the flashing of the Sonoff dongle you should have a look at https://github.com/JelmerT/cc2538-bsl/pull/114 with the pull-request in there flashing the dongle is really easy.

notenoughtech commented 2 years ago

I covered this in my vid/article. While convenient for anyone working on the firmware, pressing the button is hardly a chore when I do it once every 6 months or so :)

flatsiedatsie commented 2 years ago

While convenient for anyone working on the firmware, pressing the button is hardly a chore when I do it once every 6 months or so

I think you overestimate my mother's skills :-)

notenoughtech commented 2 years ago

I think you overestimate my mother's skills :-)

Shouldn't you explain how dangerous the use of technology is? haha! That's fair play and kudos to her sense of adventure!

Hedda commented 2 years ago

While convenient for anyone working on the firmware, pressing the button is hardly a chore when I do it once every 6 months or so :)

Surely it is a chore to unscrew the enclosure even once just to upgrade the firmware when the Auto-BSL feature makes it so easy ;)

Not all feature "Auto-BSL" https://github.com/Koenkk/Z-Stack-firmware/blob/master/coordinator/Z-Stack_3.x.0/bin/README.md

notenoughtech commented 2 years ago

What do you mean? I have a hole in mine now. 😁😁😁 Hardware fix to a software problem 😉😉😉

Hedda commented 2 years ago

By the way, someone now claims that Auto-BSL also works with the Sonoff dongle in the ZigStar GW Multi tool -> https://github.com/xyzroe/ZigStarGW-MT/issues/2

notenoughtech commented 2 years ago

I have 5 decent coordinators now. Any ideas for fair tests that don't require RF equipment? I can use cc2531 for sniffing while testing but that's about it

flatsiedatsie commented 2 years ago

@notenoughtech Log the signal strength of a few devices around the home in a smart home system, and see how that value changes with the different coordinators? I recommend the Webthings Gateway (I'm updating the Zigbee2MQTT addon soon, lots of nice new features).

notenoughtech commented 2 years ago

That's kinda what I've done in my original article: image

Koenkk commented 2 years ago

Please keep this issue on topic, otherwise it gets harder to track this. If you want to have a discussion please create a discussion for it: https://github.com/Koenkk/zigbee2mqtt/discussions

flatsiedatsie commented 2 years ago

Back on topic: are people seeing any issues from having it plugged directly into their Raspberry Pi? Mine is working fine directly in a Pi 3, but I haven't tested it with a Pi 4 yet (which are said to cause more interference).

migube commented 2 years ago

usb 2 or 3 ports on pi4 might make a difference, or behind usb2.0 hub

TeslaCLT commented 2 years ago

Oh, you mean the dev branch of z2m. I misunderstood/misread and thought you meant the dev branch of the Sonoff firmware. I may indeed look into the dev branch of z2m then.

FYI, a day later, I still have the same pairing issues. I have to turn on debug mode to pair anything. So hopefully, as you said, the newer release will clear things up.

Given the last few posts, I should mention that in my issue (requiring herdsman in debug mode to successfully pair devices), I have my Sonoff dongle plugged into a usb2 port on my thin client. Should I plug it into a usb3 port instead? I didn’t think throughput would need to be higher than usb2, but perhaps there are other considerations to make?

Hedda commented 2 years ago

Back on topic: are people seeing any issues from having it plugged directly into their Raspberry Pi? Mine is working fine directly in a Pi 3, but I haven't tested it with a Pi 4 yet (which are said to cause more interference).

Yes, USB 3.0 ports and devices are known to cause radio-frequency interference for all Zigbee adapters, because USB 3.0 signalling emits noise in the 2.4 GHz band that Zigbee also uses. The same problem can be seen with Wi-Fi and Bluetooth USB adapters, so it is not limited to Zigbee adapters. See:

https://www.reddit.com/r/technology/comments/136g7y/usb_30_has_been_found_to_cause_interference_that/

This is not only an issue with the Raspberry Pi 4 but with all types of computers that have USB 3.0 ports. See the report by Intel on usb.org:

https://www.usb.org/sites/default/files/327216.pdf

Note that the problem is not limited to plugging a Zigbee dongle directly into a USB 3.0 port. It also occurs when the dongle sits close to unshielded cables or unshielded devices that are connected to USB 3.0 ports. Infamous examples are unshielded USB 3.0 SSDs or hard drives with a USB 3.0 bridge/converter PCB, where the cables or the USB 3.0 device itself are left close to the Zigbee dongle's antenna. By unshielded I mean, for example, a USB hard drive case with a plastic enclosure rather than a full metal enclosure, which would act as electromagnetic shielding for the USB 3.0 bridge/converter PCB that is likely a strong source of interference.

Therefore the general recommendation is to use a USB 2.0 port or, if you must use a USB 3.0 port, to plug in a USB 2.0 hub first and connect the Zigbee adapter to that hub instead.

Should I plug it into a usb3 port instead?

No! You should absolutely use a USB 2.0 port if possible, and preferably also connect the dongle via a long USB extension cable to get it away from the computer and its power supply as well as from any other appliances and cables. Example:

https://itead.cc/product/1-5m-usb-male-to-female-extension-cable/

Or even better a shielded USB extension cable (electromagnetic shielded cables usually have double/extra layer of EMF shielding):

https://www.google.com/search?q=shielded+USB+extension+cable

castorw commented 2 years ago

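@oferwald I have reviewed the attached files:

  • Did you pair any extra devices to the new coordinator? I see no correlation in devices between the backups. However, there are some extra devices with APS link keys.
  • The original coordinator was running Z-Stack 1.2, therefore no APS link keys could have been negotiated before.
  • Are you experiencing pairing problems after re-pairing all routers?

The theory that comes to mind is that after the upgrade from Z-Stack 1.2 to 3/4 the coordinator tries to negotiate APS link keys and screws up somehow. I will need to do some more testing on this matter.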

Hedda commented 2 years ago

@oferwald I have reviewed the attached files:

  • Did you pair any extra devices to the new coordinator - I see no correlation in devices between the backups. However there are some extra devices with APS link keys.
  • The original coordinator was running Z-Stack 1.2 - therefore no APS link keys could have been negotiated before.
  • Are you experiencing pairing problems after re-pairing all routers?

Theory that comes in mind is that after upgrade from Z-Stack 1.2 to 3/4 the coordinator tries to negotiate APS link keys and screws up somehow. I will need to do some more testing on this matter.

Tip FYI: the zigpy-znp developers have also seen some issues when a backup from a Z-Stack Home 1.2 dongle is restored to Z-Stack 3.0, see:

https://github.com/zigpy/zigpy-znp/pull/92

and

https://github.com/zigpy/zigpy-znp/issues/70

oferwald commented 2 years ago

@castorw I paired exactly three devices to the new coordinator, two water leak sensors, and a temp sensor that stopped working with the previous coordinator.

The CC2531 was naturally 1.2, the source routing version

For the time being, I didn't re-pair any routers, I have something like 25 routers, most of them are switches and cover switches, and taking them out is only by taking some power off my house. And I am not sure that I know exactly how to force a reset for each of them. I did try to pair a new router about 2cm from the new coordinator and no pairing happened at all (no interview), so I waited with this more tedious process for your inputs, mainly in the hope this process won't be needed. :)

Let me know if I can provide any further information, or if you want me to test anything

thanks for your help!

castorw commented 2 years ago

@Hedda Possibly nice catch - https://github.com/zigpy/zigpy-znp/pull/92/files - this might be the issue. I will investigate further and let you know.

castorw commented 2 years ago

@Koenkk @oferwald @Hedda I have confirmed that ZH's current backup restore mechanism does indeed create this corruption as described in https://github.com/zigpy/zigpy-znp/pull/92. I have fixed the data structure locally and will get to creating a full fix PR soon (a few days at most).

The fix also needs to address problems in current setups where the migration corrupted the address manager table, so there will be an address manager table check on startup which will fix the table entries if necessary.

I will keep this issue updated as well.

marcelrv commented 2 years ago

I had great difficulties pairing Xiaomi Aqara devices with the stock firmware. Also, when pairing was only half done, or had been done in another app, the only way to re-initiate the pairing seemed to be to reflash the device with a new MAC.

I finally managed to pair the devices with the development branch version: CC1352P2_CC2652P_launchpad_coordinator_20220103.zip

So far seems to be working fine. Pairing took a lot of time, and only finished after 20 or so times clicking the buttons shortly to finish the process. Will continue monitoring if anything locks up after a while.

notenoughtech commented 2 years ago

Did you have any log errors? I found Sonoff ZB temp sensors giving me bind timeouts. To successfully pair these I had to turn join off, reset the sensor and wait 2 min, then pair it again. For some reason it helps to issue the join true command on stubborn sensors: I'm not sure whether there is logic behind it or it is sheer coincidence, but as soon as I trigger join again, it completes the interview process.
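
The join off / join on toggle described above could be scripted roughly as follows. This is a sketch under several assumptions: mosquitto_pub is installed, the broker is reachable at MQTT_HOST (default localhost), the base topic is the default zigbee2mqtt, and the running Zigbee2MQTT version supports the bridge/request/permit_join topic. The function names are made up for this example.

```shell
#!/bin/sh
# Build the JSON payload for a permit_join request.
# Accepts only "true" or "false"; anything else is rejected.
permit_join_payload() {
  case "$1" in
    true|false) printf '{"value": %s}\n' "$1" ;;
    *) echo "usage: permit_join_payload true|false" >&2; return 1 ;;
  esac
}

# Publish the request. Assumes the default base topic "zigbee2mqtt".
set_permit_join() {
  payload=$(permit_join_payload "$1") || return 1
  mosquitto_pub -h "${MQTT_HOST:-localhost}" \
    -t 'zigbee2mqtt/bridge/request/permit_join' -m "$payload"
}

# Example sequence (not run here): join off, reset the sensor, wait, join on.
#   set_permit_join false
#   sleep 120
#   set_permit_join true
```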

Hedda commented 2 years ago

Tip FYI: the zigpy-znp developers have also seen some issues when a backup from a Z-Stack Home 1.2 dongle is restored to Z-Stack 3.0, see:

zigpy/zigpy-znp#92

and

zigpy/zigpy-znp#70

@castorw another tip is to also be aware of this other maybe related backup and restore migration issue -> https://github.com/zigpy/zigpy-znp/pull/120

castorw commented 2 years ago

I have created a PR addressing this issue (https://github.com/Koenkk/zigbee-herdsman/pull/495). @Koenkk @oferwald Can you please test if it fixes the issue before I finish the PR by updating and adding tests?

oferwald commented 2 years ago

Hi @castorw,

Can you provide me with instructions on how to test this? I am using the latest-dev docker

Thanks

castorw commented 2 years ago

Hi @castorw,

Can you provide me with instructions on how to test this? I am using the latest-dev docker

Thanks

@Koenkk could you please provide instructions?

Koenkk commented 2 years ago

For Docker

docker exec -it zigbee2mqtt sh # replace zigbee2mqtt with the container name
cd node_modules
rm -rf zigbee-herdsman
apk add make gcc g++ python3 linux-headers git
git clone https://github.com/castorw/zigbee-herdsman.git -b znp-fix-addrmgr-empty
cd zigbee-herdsman
npm ci
npm run build
exit
docker restart zigbee2mqtt # replace zigbee2mqtt with the container name

oferwald commented 2 years ago

Thanks for the instructions,

I guess I am running the following version now: Zigbee2MQTT:info 2022-01-18 18:36:11: Starting Zigbee2MQTT version 1.22.2 (commit #414c51f)

Anyhow, there is no change to the coordinator backup. What would you like me to do, other than trying to pair new devices or attempting to re-pair an existing one? Do you need any log info?

LMK

castorw commented 2 years ago

@oferwald It would be great if you could run Z2M with DEBUG=zigbee-herdsman:adapter:zStack:startup* so we can verify it works properly. You should see a message stating (#9117) verifying address manager table for post-migration corruption, and then another one saying it either fixed or didn't fix AMT entries.

Afterwards try pairing a device and check if it works. If it does not, please try unplugging the adapter and restarting Z2M and please let me know of the steps you took and whether it works.
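
One way to look for those messages is sketched below. It assumes the container is named zigbee2mqtt and that the DEBUG output is written to the container's console (stderr) rather than to Zigbee2MQTT's own log.txt. The helper name amt_check_lines is hypothetical, and because the exact message text may differ, the match is deliberately loose.

```shell
#!/bin/sh
# Hypothetical filter: read log text on stdin and print every line that
# mentions the address manager table, ignoring case.
amt_check_lines() {
  grep -i 'address manager table'
}

# Example (not run here; the container name is an assumption):
#   docker logs zigbee2mqtt 2>&1 | amt_check_lines
```

If the filter prints nothing, it is worth confirming that the patched zigbee-herdsman is actually the one in use before concluding that the check did not run.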

Thanks.

oferwald commented 2 years ago

@castorw

No such line in my logs, I did start with debug and have tons of output though:

debug 2022-01-18 20:06:57: Loaded state from file /app/data/state.json
info 2022-01-18 20:06:57: Logging to console and directory: '/app/data/log/2022-01-18.20-06-56' filename: log.txt
debug 2022-01-18 20:06:57: Removing old log directory '/app/data/log/2022-01-09.17-22-16'
info 2022-01-18 20:06:57: Starting Zigbee2MQTT version 1.22.2 (commit #414c51f)
info 2022-01-18 20:06:57: Starting zigbee-herdsman (0.13.188)
debug 2022-01-18 20:06:57: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null,"disableLED":false},"backupPath":"/app/data/coordinator_backup.json","databaseBackupPath":"/app/data/database.db.backup","databasePath":"/app/data/database.db","network":{"channelList":[11],"extendedPanID":[221,221,221,221,221,221,221,221],"networkKey":"HIDDEN","panID":6754},"serialPort":{"path":"/dev/ttyUSB1"}}'
info 2022-01-18 20:06:58: zigbee-herdsman started (resumed)
info 2022-01-18 20:06:58: Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20211217,"transportrev":2},"type":"zStack3x0"}'
debug 2022-01-18 20:06:58: Zigbee network parameters: {"channel":11,"extendedPanID":"","panID":6754}

Waiting for further instructions

castorw commented 2 years ago

@oferwald This output does not seem to include DEBUG output. Please try and refer to this https://www.zigbee2mqtt.io/guide/usage/debug.html#enabling-logging.

oferwald commented 2 years ago

Hi @castorw

I thought that the lines starting with debug indicated that it is using debug. It was started with the following command:

docker run -it --name=zigbee2mqtt -v /root/docker/zigbee2mqtt:/app/data --device=/dev/ttyUSB1 -e TZ=Asia/Jerusalem -v /run/udev:/run/udev:ro --privileged=true -e DEBUG=zigbee-herdsman:adapter:zStack:startup* koenkk/zigbee2mqtt:latest

Also, the log file is about 30x the size it previously was, including many lines starting with "debug".

I will be glad to provide you with the full log if needed, but my guess is that something is still amiss.

castorw commented 2 years ago

Hi @oferwald, please provide the log or at least lookup the line I referenced above so we can confirm you are running the dev ZH from my repo :-)

To the topic of logging, there are 2 logging mechanisms:

oferwald commented 2 years ago

Hello again, @castorw

I was partly to blame: instead of restarting the container, I killed it and started it again, which made all the changes go away. I did it again, and now the log shows:

debug 2022-01-19 15:27:02: Loaded state from file /app/data/state.json
info 2022-01-19 15:27:02: Logging to console and directory: '/app/data/log/2022-01-19.15-27-01' filename: log.txt
debug 2022-01-19 15:27:02: Removing old log directory '/app/data/log/2022-01-10.17-40-26'
info 2022-01-19 15:27:02: Starting Zigbee2MQTT version 1.22.2 (commit #414c51f)
info 2022-01-19 15:27:02: Starting zigbee-herdsman (0.13.194)

(notice the changed zigbee-herdsman version) However, this is still not working: I see nothing in my log. I did get a shell in the container and your code is there, in fixup.js and some other locations.
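
The version comparison above can be done mechanically. The sketch below pulls the zigbee-herdsman version out of a Zigbee2MQTT log file, assuming the log contains a line of the form "Starting zigbee-herdsman (x.y.z)" as in the excerpts in this thread. The helper name herdsman_version and the example path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the zigbee-herdsman version from the last
# "Starting zigbee-herdsman (x.y.z)" line found in the given log file.
herdsman_version() {
  grep -o 'Starting zigbee-herdsman ([^)]*)' "$1" \
    | tail -n 1 \
    | sed 's/.*(\(.*\))/\1/'
}

# Example (the path is an assumption, taken from the log directory
# layout shown in this thread):
#   herdsman_version /root/docker/zigbee2mqtt/log/2022-01-19.15-27-01/log.txt
```

Because the patched module lives inside the container's filesystem, a version that falls back to the earlier value after a restart suggests the container was recreated from the image rather than restarted.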

If there is a way to provide you with the full log privately, just let me know

castorw commented 2 years ago

@oferwald Okay, so let's presume it worked. Can you please provide me with an NV dump like you did before? That way I can confirm the fix did what it was supposed to. Also, did you try pairing devices to see if the problem has been resolved?

oferwald commented 2 years ago

@castorw file is attached, will try some re-pairing too and will report back in a few minutes files2.tar.gz

castorw commented 2 years ago

@oferwald According to the dump the AMT is fixed, so please try if it works ;-)

Also I kinda omitted the last sentence in your previous message - you can find me on Telegram (@castorko).

gvt70 commented 2 years ago

Hello All, Above I have read some comments about having to take off the antenna. I have same problem: https://github.com/Koenkk/zigbee2mqtt/issues/10894 Is this the same problem?