Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.16k stars 1.68k forks

Vendor independent pairing results in Loop #24314

Closed theimo1221 closed 1 hour ago

theimo1221 commented 1 month ago

What happened?

Trying to pair a TUYA TS0202_1 resulted in the device immediately leaving the network after configuration and restarting the interview. Additionally, only the datapoints link-quality and battery percentage receive data.

I tried switching to the dev branch, as @Koenkk recently added some Tuya-related changes to the converters.

What did you expect to happen?

Device pairing as usual.

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.40.2-dev commit: db00759a

Adapter firmware version

20240710

Adapter

SONOFF Zigbee 3.0 USB Dongle Plus

Setup

Linux Container within Proxmox

Debug log

The device in question can be found using 0xa4c138813f7e5a83

In my opinion the following path looks concerning but should be okay, as commented within herdsman ("is nice to have")

[2024-10-13 15:57:04] debug:    z2m: Received Zigbee message from '0xa4c138813f7e5a83', type 'readResponse', cluster 'genBasic', data '{}' from endpoint 1 with groupID 0
[2024-10-13 15:57:04] debug:    z2m: Skipping message, still interviewing
[2024-10-13 15:57:04] info:     z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c138813f7e5a83', payload '{"last_seen":"2024-10-13T15:57:04+02:00","linkquality":142}'
[2024-10-13 15:57:04] debug:    zh:controller:endpoint: Error: ZCL command 0xa4c138813f7e5a83/1 genBasic.read(["swBuildId"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false,"sendPolicy":"immediate"}) failed (Status 'UNSUPPORTED_ATTRIBUTE')
    at Endpoint.checkStatus (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:350:28)
    at Endpoint.zclCommand (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:956:26)
    at Endpoint.read (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:446:29)
    at Device.interviewInternal (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/model/device.ts:937:42)
    at Device.interview (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/model/device.ts:753:13)
    at Controller.onDeviceJoined (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/controller/controller.ts:753:17)
[2024-10-13 15:57:04] debug:    zh:controller:device: Interview - failed to read attribute 'softwareBuildID' from endpoint '1' (Error: ZCL command 0xa4c138813f7e5a83/1 genBasic.read(["swBuildId"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false,"sendPolicy":"immediate"}) failed (Status 'UNSUPPORTED_ATTRIBUTE'))
[2024-10-13 15:57:04] debug:    zh:controller:device: Interview - IAS - enrolling '0xa4c138813f7e5a83' endpoint '1'

anonymized.log

theimo1221 commented 1 month ago

The device was added to zigbee2mqtt with this pr: https://github.com/Koenkk/zigbee-herdsman-converters/pull/4523

theimo1221 commented 1 month ago

It seems the special tuya wait (see https://github.com/Koenkk/zigbee2mqtt/issues/5814 ) doesn't work for this as the manufacturer ID is 4417 instead of 4619

Edit: Even after adding that ID to the exception list, the device leaves immediately: anonymized2.log

theimo1221 commented 1 month ago

Tried it with another device, with the same result; here the values don't update either:

image

Koenkk commented 1 month ago

Could you check if the issue is fixed with the following external converter:

If this doesn't work, I'm afraid we have to sniff traffic with the original gateway and compare it with the z2m one. Seems some magic is needed. https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html

theimo1221 commented 1 month ago

Thanks for the response, but the converter doesn't seem to work, or doesn't set the description to CUSTOM:

  1. The file is added: image
  2. It is added within configuration.yaml: image
  3. I restarted the service via systemctl
  4. I removed the previously paired (unresponsive) device
  5. I restarted device pairing
  6. Device doesn't say CUSTOM: image

Koenkk commented 1 month ago

I forgot to add custom to the description, added it now (https://gist.github.com/Koenkk/b5a47072d5c5f32b6a21cae6c2d50d3a)

theimo1221 commented 1 month ago

Thanks for the update:

  1. The converter itself is applied: image

  2. Device still tries to join twice:

    info 2024-10-17 00:32:41z2m: Device '0xa4c13829bd146024' is supported, identified as: Tuya Motion sensor CUSTOM (TS0202_1)
    info 2024-10-17 00:32:41z2m: Configuring '0xa4c13829bd146024'
    info 2024-10-17 00:32:41z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"definition":{"description":"Motion sensor CUSTOM","exposes":[{"access":1,"description":"Indicates whether the device detected occupancy","label":"Occupancy","name":"occupancy","property":"occupancy","type":"binary","value_off":false,"value_on":true},{"access":1,"category":"diagnostic","description":"Indicates if the battery of this device is almost empty","label":"Battery low","name":"battery_low","property":"battery_low","type":"binary","value_off":false,"value_on":true},{"access":1,"category":"diagnostic","description":"Link quality (signal strength)","label":"Linkquality","name":"linkquality","property":"linkquality","type":"numeric","unit":"lqi","value_max":255,"value_min":0},{"access":1,"category":"diagnostic","description":"Remaining battery in %, can take up to 24 hours before reported","label":"Battery","name":"battery","property":"battery","type":"numeric","unit":"%","value_max":100,"value_min":0},{"access":1,"category":"diagnostic","description":"Voltage of the battery in millivolts","label":"Voltage","name":"voltage","property":"voltage","type":"numeric","unit":"mV"}],"model":"TS0202_1","options":[{"access":2,"description":"Time in seconds after which occupancy is cleared after detecting it (default 90 seconds).","label":"Occupancy timeout","name":"occupancy_timeout","property":"occupancy_timeout","type":"numeric","value_min":0}],"supports_ota":false,"vendor":"Tuya"},"friendly_name":"0xa4c13829bd146024","ieee_address":"0xa4c13829bd146024","status":"successful","supported":true},"type":"device_interview"}'
    info 2024-10-17 00:32:41z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c13829bd146024', payload '{"last_seen":"2024-10-17T00:32:41+02:00","linkquality":36}'
    info 2024-10-17 00:32:41z2m: Successfully configured '0xa4c13829bd146024'
    info 2024-10-17 00:32:46z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c13829bd146024', payload '{"last_seen":"2024-10-17T00:32:46+02:00","linkquality":36}'
    info 2024-10-17 00:32:46z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c13829bd146024', payload '{"battery":100,"last_seen":"2024-10-17T00:32:46+02:00","linkquality":36,"voltage":3000}'
    info 2024-10-17 00:32:46z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c13829bd146024', payload '{"last_seen":"2024-10-17T00:32:46+02:00","linkquality":36}'
    info 2024-10-17 00:32:47z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/0xa4c13829bd146024', payload '{"last_seen":"2024-10-17T00:32:47+02:00","linkquality":36}'
    info 2024-10-17 00:32:54z2m: Accepting joining not in blocklist device '0xa4c13829bd146024'
  3. Device doesn't report any updates, nor does occupancy show up: image

  4. The 2nd device of this type I added doesn't update either, but it shows occupancy and not voltage: image
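The join loop in the log excerpt above (a "Successfully configured" line followed seconds later by a fresh "Accepting joining" line for the same address) can be spotted mechanically. The following is only a minimal sketch: the message wording is taken from the lines quoted above, while the function name `find_rejoin_loops` and the 60-second window are assumptions made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Patterns based on the log lines quoted in this thread.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
CONFIGURED = re.compile(r"Successfully configured '(0x[0-9a-f]{16})'")
JOINING = re.compile(r"Accepting joining not in blocklist device '(0x[0-9a-f]{16})'")


def find_rejoin_loops(lines, window=timedelta(seconds=60)):
    """Map each IEEE address to the delays (in seconds) between a
    'Successfully configured' line and a following join attempt.

    The 60 s default window is an arbitrary assumption, not a Zigbee2MQTT value.
    """
    last_configured = {}
    loops = defaultdict(list)
    for line in lines:
        ts_match = TIMESTAMP.search(line)
        if ts_match is None:
            continue  # stack-trace lines and similar carry no timestamp
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S")
        configured = CONFIGURED.search(line)
        joining = JOINING.search(line)
        if configured:
            last_configured[configured.group(1)] = ts
        elif joining:
            ieee = joining.group(1)
            done = last_configured.get(ieee)
            if done is not None and ts - done <= window:
                loops[ieee].append((ts - done).total_seconds())
    return dict(loops)


sample = [
    "info 2024-10-17 00:32:41z2m: Successfully configured '0xa4c13829bd146024'",
    "info 2024-10-17 00:32:54z2m: Accepting joining not in blocklist device '0xa4c13829bd146024'",
]
print(find_rejoin_loops(sample))
```

For the two sample lines, the device is reported as rejoining 13 seconds after it was configured, which matches the behaviour described in item 2.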

theimo1221 commented 1 month ago

@Koenkk might it be a general problem regarding TUYA devices and my network, as I have the same problem with:

image

With its Zigbee manufacturer being _TZE200_j7sgd8po

I have 3 units of that device, one of which works beautifully. The 3rd one arrived today, as we thought the first unit was faulty... But as shown in the screenshot above, the newly arrived device still fails and shows that awkward interview loop: log_2024_17_10.txt

Additionally I tried downgrading to d3de77da13a391d717a773e33197c0b851227ac5 but the issue remains the same

With that Soda S8 I was able to set some values on the device (e.g. turn off handle turn sound), so "some communication" between z2m and the device is indeed working, it just seems to not negotiate a proper network join/configure/bind....

theimo1221 commented 4 weeks ago

@Koenkk Today I needed to replace a Ubisys J1 actuator (I have about 10 more of those) and had the same problems, so this doesn't seem to be TUYA-specific; it might be (or has to be) my network. Will try changing my adapter...

EDIT: I took my backup Sonoff dongle and flashed it with CC1352P2_CC2652P_launchpad_coordinator_20240315.hex, but the issues remain the same (both with the otherwise working device type Ubisys J1 and that Tuya motion sensor).

EDIT2: I tried downgrading to 1.39 but that fails at startup with a memory issue. Additionally I downgraded to 1.40 which doesn't change anything...

Is there anything I can provide to help you proceed with this? I have no issues with the existing devices (setting lights, moving shutters, reacting to motion, etc.), so my network itself "seems to work". Only pairing new devices fails: they don't stay in my network and keep joining/leaving.

theimo1221 commented 4 weeks ago

It seems like my coordinator could be having some trouble in regards to my main router, as some of those routes are listed as 2/98, 5/52, 44/242, 22/217, 1/166 but this might be related to us switching off shutter group fuse due to maintenance: image

EDIT: I did try to "repair" my network by turning off zigbee2mqtt container and unplugging usb stick for 30 minutes, which did improve some routes, but the Joining --> Leaving --> Joining keeps happening. Especially with the single-phase powered Ubisys J1 the loop is like infinite as the device is constantly ready to join and doesn't go in hibernate/sleep

I additionally ordered a Sonoff Zigbee Dongle-E to switch to ember firmware and widen the scope of our investigation. Re-pairing more than 190 devices isn't an option for me :(

theimo1221 commented 4 weeks ago

This behaviour seems to be the same as reported here: https://github.com/Koenkk/Z-Stack-firmware/discussions/505#discussioncomment-10881824

theimo1221 commented 3 weeks ago

Just noticed I had 3 routers which were marked in the missing_routers list. I removed (and partially re-paired) those devices, getting the missing_routers_counter to zero. Afterwards I was able to pair the Ubisys J1 shutter actuator, which always failed yesterday by leaving the network. The SODA S8 still fails (Leaving --> Rejoining loop), so while the situation got better, it still isn't resolved...

Koenkk commented 3 weeks ago

Do you have a sniffer? If yes, could you sniff the network while a pairing attempt fails? https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html

theimo1221 commented 3 weeks ago

I've ordered an ember-based stick, which should arrive later today, so I can perform that sniff more easily. Additionally, I created a private repo to share the result with you without exposing too much of my data to too many people here.

I'll inform you once I have done the sniff and recorded that pcap. Thanks for your help 👍

theimo1221 commented 2 weeks ago

Coming back from my business trip, the sticks have arrived.

  1. I changed my adapter from Sonoff-Dongle-Plus to LOAMLIN SMLIGHT SLZB-07p7 --> Issue is still the same
  2. I did an update to the latest dev branch version (including your package updates) --> Issue is still the same
  3. I'll perform the sniff tomorrow

Additionally, today I experienced the issue with a Ubisys S1, which awkwardly doesn't report "Device left the network" yet keeps joining the network.

theimo1221 commented 2 weeks ago

@Koenkk I did perform 2 sniffs (see https://github.com/theimo1221/zigbee-snif-result ). During both sniffs I got some red errors in the console output, so I'm not sure whether the result is usable or whether I have to redo it with a different setup: image

Koenkk commented 2 weeks ago

Can you open these files with Wireshark? It fails for me:

Screenshot 2024-11-01 at 08 59 16

theimo1221 commented 2 weeks ago

Sorry, Wireshark wasn't installed on my system, so I hadn't checked... Unfortunately the files seem to be corrupt for me as well... I followed the guide written here: https://github.com/Nerivec/ember-zli/wiki/Sniff#start-sniffing Do I have to do some other configuration beforehand?

Koenkk commented 2 weeks ago

@Nerivec are you able to open the files produced by ember-zli with Wireshark?

Nerivec commented 2 weeks ago

Yes no problem here (just checked again to be sure).

theimo1221 commented 2 weeks ago

I just did a short scan without joining any devices, and the sniff could be opened both on the machine performing the sniff and on my Mac. So either there was a problem during the last download of those results to my machine, or the capture was corrupted by those errors, as posted in the image above: image

I'll rerun the pairing tests/sniffs tomorrow morning while the house and its people are still asleep, thus reducing the noise in the network.

Nerivec commented 1 week ago

👍 What's the model of the emberznet adapter you are using for sniffing? Strange error to see in this context. Do you have some very spammy devices on the network?

theimo1221 commented 1 week ago

@Nerivec The sniff is performed using SONOFF ZBDongle-E with EFR32MG21.

We are not yet sure what's wrong with my network... "Normal" time-critical usage like "turning on an actuator for movement" has almost no delay despite its many hops (Motion Sensor --> Zigbee Adapter --> Usb2Ethernet --> Zigbee2Mqtt --> Mqtt to IoBroker --> Websocket (Hoffmation-Base) --> EventHandling (Hoffmation-Base) --> Websocket "Turn light on" --> Iobroker to Zigbee2Mqtt --> Command To Adapter --> Usb2Ethernet (Adapter) --> Command received by the lamp). But somehow interviewing new devices doesn't work as flawlessly as it did in the past.

Yes, I have a fairly big Wi-Fi setup, but it is set to channels 1, 3 and 13 so that it doesn't collide with Zigbee channel 19. Additionally, I reduced my Hm-IP network to roughly 15 devices.
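The channel claim can be checked with the standard 2.4 GHz channel plans. This is a rough sketch only: the center-frequency formulas are the usual ones for IEEE 802.15.4 channels 11 to 26 and Wi-Fi channels 1 to 13, while the nominal widths (22 MHz for Wi-Fi, 2 MHz for Zigbee) and the function names are simplifying assumptions that ignore real spectral skirts.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz IEEE 802.15.4 (Zigbee) channel, 11..26."""
    if not 11 <= channel <= 26:
        raise ValueError(f"not a 2.4 GHz Zigbee channel: {channel}")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel, 1..13."""
    if not 1 <= channel <= 13:
        raise ValueError(f"not a supported 2.4 GHz Wi-Fi channel: {channel}")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int,
             wifi_width_mhz: float = 22.0, zigbee_width_mhz: float = 2.0) -> bool:
    """True if the nominal channel bands overlap (widths are assumptions)."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance < (wifi_width_mhz + zigbee_width_mhz) / 2


for wifi_channel in (1, 3, 13):
    print(wifi_channel, overlaps(19, wifi_channel))
```

Under these assumptions none of Wi-Fi channels 1, 3 and 13 overlaps Zigbee channel 19 (2445 MHz), whereas a Wi-Fi channel such as 6 (2437 MHz) would.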

@Koenkk I have good and bad news:

  1. Good: The new Sniff for TUYA_TS0202_1 worked and is online https://github.com/theimo1221/zigbee-snif-result --> The device joined twice and is not sending any updates on movement
  2. Good/Bad: Trying to sniff the Ubisys S1, we got ourselves a successful join. So whilst I'm happy to have that device working now, I can't provide you a sniff result showing that awkward constant joining behaviour of this router device.

Nerivec commented 1 week ago

Looks like the Tuya is in a pairing loop because it doesn't get the TCLK. I wonder:

theimo1221 commented 1 week ago

I've been using channel 19 for over 4 years, and the pairing-loop issues during joining started only recently (I haven't added a new device for almost 3 months, so it might have something to do with the newer zStack FW, my environment, or some herdsman/z2m updates).

Besides, in the past I added Ubisys J1 and S1 devices even with 220+ devices, so at least some former version didn't have the issue you mentioned regarding table size (I switched from Zigbee TRV to Shelly TRV [Wi-Fi], which reduced the network by ~20 devices). This is the 4th or 5th S1, and I have at least 10 J1. Besides, my network has (in my opinion) quite a decent number of routers, as a typical room in my house has the following Zigbee router devices:

  1. Main light
  2. Shutter (almost all rooms)
  3. 2nd/ambient light (most rooms either using a plug or a LED Device)

And these battery driven devices:

  1. Temperature Sensor (including humidity)
  2. Motion sensor
  3. Fire detector

My adapter is located in the center of the house (main living room on the 1st floor), and that room in particular has a high density of routers: 3 plugs, 3 LED bands, 2 light actuators, whilst having just a few battery devices (2 motion sensors, 1 temp sensor, 3 fire sensors).

Later I'll try again to add another Soda S8 while sniffing, as I have 2 of those which didn't want to pair but also 1 which paired successfully...

Koenkk commented 1 week ago

Can you try with this fw? 99241103_dongle_p.hex.zip

I bumped the TC link key table to 220 from 200.

theimo1221 commented 1 week ago

Thanks @Koenkk, just to avoid a possible misunderstanding on my side: that FW is for the SONOFF-Dongle-P, correct? If so, I'll switch back from the LOAMLIN SMLIGHT SLZB-07p7 to the Sonoff one tomorrow, as I don't want to risk too long a downtime during the evening (family...)

Additionally, I uploaded another sniff result for the Soda S8. During that sniff those red error messages popped up again, but the resulting .pcap can be opened using Wireshark.

theimo1221 commented 1 week ago

@Koenkk I switched to your FW with the Sonoff, but I'm still unable to pair the Tuya and the Soda device, so that limit increase from 200 to 220 might be a piece in the overall puzzle but alone it unfortunately isn't the solution...

Koenkk commented 1 week ago

It seems the coordinator does not reply to the "Request key" (from the joining device). I guess the device stops trying after some time and then leaves.

Screenshot 2024-11-06 at 21 53 10

Does this also happen when joining it directly through the coordinator? (permit joining via the coordinator only on the frontend)

theimo1221 commented 1 week ago

You seem to be on the correct track, as pairing with force through coordinator did result in the TS0202_1 being paired correctly and now sending correct state changes for occupancy. I'll later try with the Soda S8 as well, but anyways I still hope we can find a solution to bring back my network to "normal" meaning ideally I'd like to pair devices at their destined position in the house which isn't always reachable for the coordinator, correct?

Nerivec commented 1 week ago

@theimo1221 A suggestion. I released new router firmware builds for the Silabs adapters, which include one for the Sonoff Dongle-E. You could try pairing the device to that one specifically (while the Dongle-E is paired to the coordinator to avoid the issue). It has the benefit that you can see what is going on on the Dongle-E if you plug it into a computer and monitor the serial (vscode extension, or any plain serial-able terminal) while this is happening. If it fails as usual, you may just get a piece of information we're missing since we'll see "what the router sees", which appears to be where something is going wrong. Make sure to save the output (it can be quite verbose). I'll see about translating the short-hand Silabs uses (some of it is not very easy to read/understand). https://github.com/Nerivec/silabs-firmware-builder/releases/tag/v2024.6.2-update2

Note: you can simply revert to NCP once you're done if you prefer to keep it as a coordinator.

theimo1221 commented 1 week ago

> I'll later try with the Soda S8 as well

The Soda S8 could be interviewed/configured this way as well. But after pairing it in my hands I had to mount it (which required removing the batteries), and at first the device didn't rejoin; after a restart of z2m and a device reset I was able to rejoin it...

> You could try pairing the device to that one specifically (while the Dongle-E is paired to the coordinator to avoid the issue).

@Nerivec Just to get your idea straight:

  1. Flashing the SONOFF Dongle-E with that FW
  2. Plugging the SONOFF Dongle-E with a USB extension cable into the USB switch on the first floor (as the server basement would be too far away)
  3. Pairing the SONOFF Dongle-E with my zigbee network coordinated by SONOFF Dongle-P
  4. Mount the SONOFF Dongle-E to a machine with a serial terminal/output
  5. In z2m allow joining only through SONOFF Dongle-E
  6. Try to interview a new device

Nerivec commented 1 week ago

Make sure you have logs in Z2M set to debug first.

  1. Flash Dongle-E with router firmware (ember-zli can do it, so should most flashers)
  2. Start monitoring Dongle-E. If you can pick a monitoring software with timestamping, that's even better, the serial monitor extension in vscode can do it for example: vscode-serialmon
  3. Ensure the monitoring is working, you should already see plenty of output (NWK Steering indicating join attempts every 10 seconds)
  4. Ensure you are capturing the logs from the monitor to a file for later review (or make sure the terminal has a great big history, so you can copy it all once done...)

Then you have two ways to test this:

  1. Using the Dongle-E to see what is happening with the Soda S8 or Tuya device:
     i. Pair Dongle-E directly to coordinator (permit join 'Coordinator')
     ii. Pair the misbehaving device directly to Dongle-E (permit join 'Dongle-E', or whatever name you give it)
  2. If the Dongle-E fails regular joins like the others, using the Dongle-E to see what is happening when the join fails:
     i. Pair Dongle-E like you did the others (permit join 'All')

Then upload both the logs from Z2M, and the ones from the Dongle-E, to the repo, I'll check, see if I can make sense of what is happening. Ideally, record the times of the tests, so I can go through the logs faster.

theimo1221 commented 6 days ago

Good morning @Nerivec

I have good and bad news: Using both a "failed" SODA S8 and a TUYA TS0202_1, neither of which had been paired using the above-mentioned workaround of only joining through the coordinator, I was able to join both flawlessly using the Sonoff Dongle-E with your router FW. The respective logs and timestamps/MAC addresses are uploaded here: https://github.com/theimo1221/zigbee-snif-result

But as it succeeded we might not be able to extract any insights....

To conclude the current findings in my network:

  1. Activating 'Permit join' through all is a coin flip with a low chance of joining the device successfully, because the device doesn't get the TCLK
  2. Joining through coordinator works
  3. Joining through Zigbee Dongle E Router works
  4. Joining through Router xyz ?????

I'll later on try to join another test device using some other router, to narrow the issue down as I have 8 more new TUYA TS0202_1 which I can join for testing purposes.

theimo1221 commented 10 hours ago

With some more of those Sonoff Dongle-E sticks I was able to deploy 7 new Soda S8 flawlessly yesterday. Today the first 5 worked as well, but now I can't add any more... So @Koenkk, the 200/220 limit might be an issue. Would you mind creating a version with 250 for me? I removed some sensors I don't necessarily rely on and would like to test your point further.

Best regards Thiemo

theimo1221 commented 1 hour ago

The issue is now resolved by splitting my network in two on different Zigbee channels (instead of one network spanning 4 levels of a house, 1 network now covers the basement and ground floor and 1 network covers the 1st and 2nd floor).

The root cause seems to have 2 parts:

  1. Nearing the limit of around 200/220 devices.
  2. Having too much depth in routers, which can result in certain acknowledgments getting lost because a "bad/poor" router is one of the hops.

Neither of these is an issue with Zigbee2MQTT or zigbee-herdsman.
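To keep an eye on the first point, the device count can be compared against the table size discussed in this thread. The sketch below assumes a JSON device list shaped like the payload Zigbee2MQTT publishes on `zigbee2mqtt/bridge/devices` (objects with a `type` of `Coordinator`, `Router` or `EndDevice`). The function name, the sample entries, and the idea that every joined device occupies one table slot are assumptions for illustration; 220 is the value Koenkk mentioned for the test firmware.

```python
import json
from collections import Counter


def summarize_network(devices_json: str, link_key_table_size: int = 220) -> dict:
    """Count devices by type and report remaining headroom.

    Assumes one table entry per joined device (coordinator excluded),
    which is a simplification rather than a documented firmware rule.
    """
    devices = json.loads(devices_json)
    by_type = Counter(device.get("type", "Unknown") for device in devices)
    joined = sum(count for kind, count in by_type.items() if kind != "Coordinator")
    return {
        "by_type": dict(by_type),
        "joined": joined,
        "headroom": link_key_table_size - joined,
    }


# Hypothetical three-device payload, just to show the shape of the result.
sample = json.dumps([
    {"ieee_address": "0x0000000000000000", "type": "Coordinator"},
    {"ieee_address": "0x0000000000000001", "type": "Router"},
    {"ieee_address": "0xa4c13829bd146024", "type": "EndDevice"},
])
summary = summarize_network(sample)
print(summary)
```

A headroom close to zero would suggest splitting the network, as was done here, before adding more devices.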

Thanks for your great support and work @Koenkk and @Nerivec 👍