Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Broadcasts failing on ember after migration #22453

Closed: julien-billaud closed this issue 2 months ago

julien-billaud commented 7 months ago

What happened?

While I never faced any issues for more than a year with the Sonoff Dongle-E + ezsp driver, when I tried switching the driver to ember nothing worked (tried multiple times): sometimes I lost all the devices, sometimes they were still there but impossible to interact with, and pairing never worked (for now I've returned to the ezsp driver). I'm not noticing many errors in the log (only the broadcast error reported here https://github.com/Koenkk/zigbee2mqtt/issues/22445).

I've tried the exact same configuration on a regular x86 computer running Debian (using the same Zigbee dongle) and didn't face any issue, which suggests the problem is linked to the Raspberry Pi 4.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

Switch from the ezsp to the ember driver.

Zigbee2MQTT version

1.37.0

Adapter firmware version

7.4.2.0 build 0

Adapter

Sonoff dongle-e

Setup

Raspberry Pi 4 using the Docker image

Debug log

No response

Nerivec commented 7 months ago

Any chance you can downgrade to 7.4.1 and see if you still have those problems on the pi?

fir3drag0n commented 7 months ago

Same problem with SLZB-06M

But I don't have a Raspberry Pi 4: the host is an x86 machine running Unraid, with zigbee2mqtt in Docker.

Nerivec commented 7 months ago

Grouping the mentioned broadcasting issue here guys (https://github.com/Koenkk/zigbee2mqtt/issues/22445, https://github.com/Koenkk/zigbee2mqtt/issues/22398) @supaeasy @alainsch @Ricc68 @VladislavVesely @luqsq

I cannot reproduce this with my Dongle-E. I've tried various firmware, various ways to migrate from ezsp to ember (even bad ones 😅). Can you guys think of something that may be different in your setup from a "regular setup"?

raphael1688 commented 7 months ago

> Same problem with SLZB-06M
>
> But I don't have a Raspberry Pi 4: the host is an x86 machine running Unraid, with zigbee2mqtt in Docker.

```yaml
serial:
  adapter: ember
  rtscts: false
```

You may need to add `rtscts` below the `adapter` setting.

supaeasy commented 7 months ago

> Can you guys think of something that may be different in your setup from a "regular setup"?

Two things: I recently installed https://www.zigbee2mqtt.io/devices/ZFP-1A-CH.html#siglis-zfp-1a-ch

Which I think is not a very common router: Swiss market only and most likely not very popular. Initially I had problems with it. Also, shortly after I installed it, my second Dongle-E that I use as a router had to re-pair, and this was one of the first devices in my 2-year-old network, one I never had any problems with.

Second: shortly before my router dongle failed, I set the reporting interval of every lamp to 1-3 seconds because I didn't see the lamp status change quickly enough (or at all) when pressing a hardware button like the switches mentioned above. After the dongle failed I reverted this to 1-30 s and have had no problems since. But I did the reverting before I saw the error in the logs.

Also I have to say: I don't notice any bigger problems or misbehavior. I just saw the error in the logs. The only real problem I have is that sometimes (not reproducible) some IKEA bulbs start fully dimmed, even though at least one of them is never dimmed manually.

julien-billaud commented 7 months ago

> Grouping the mentioned broadcasting issue here guys (#22445, #22398) @supaeasy @alainsch @Ricc68 @VladislavVesely @luqsq
>
> I cannot reproduce this with my Dongle-E. I've tried various firmware, various ways to migrate from ezsp to ember (even bad ones 😅). Can you guys think of something that may be different in your setup from a "regular setup"?

As the Dongle-E works with a Docker image on an x86 environment, I'm guessing there is no issue with the Zigbee dongle itself. So, focusing on specific configs, here is what comes to mind as possibly different from a regular installation:

Everything else is quite standard in my opinion.

alainsch commented 7 months ago

Nothing special over here. Had 1.36 running with an SLZB-06M on Zigbee firmware 20231030. Everything was running OK with adapter: ezsp

Did the following steps:

So currently I'm in a state where my network is running, but I can't add any new devices.

Is there any more info we can provide?

supaeasy commented 7 months ago

Oh I should have mentioned that I am running HAOS in a VM on Synology DSM 7.2.

Interference should not be a problem as my dongle is in a USB 2 port with a 2 m extension cable.

alainsch commented 7 months ago

My setup is HAOS running on an ODROID-M1 with 8 GB RAM and a 512 GB SSD.

fir3drag0n commented 7 months ago

> Nothing special over here. Had 1.36 running with an SLZB-06M on Zigbee firmware 20231030. Everything was running OK with adapter: ezsp
>
> Did the following steps:
>
>   • upgraded the add-on to 1.37
>   • received the "zh:ezsp: Deprecated driver 'ezsp' currently in use, 'ember' will become the..." messages
>   • changed adapter: ezsp to adapter: ember and restarted
>   • got an error that my coordinator was not on EZSP13
>   • upgraded my coordinator firmware to FW 20240408
>   • as advised by SMLight, changed the config in zigbee2mqtt to "adapter: ember" + "rtscts: false"
>   • restarted zigbee2mqtt and the Zigbee network is working
>   • now at startup I get the message "zh:ember: Delivery of BROADCAST failed for "65533" [apsFrame={"profileId":0,"clusterId":19,"sourceEndpoint":0,"destinationEndpoint":0,"options":0,"groupId":0,"sequence":212} messageTag=255]"
>   • pairing new entities does not work due to the same error
>   • switching back to "adapter: ezsp" doesn't work either, as I then get the error "zh:controller:greenpower: Received undefined command from '0'". Another user already created a ticket for this.
>
> So currently I'm in a state where my network is running, but I can't add any new devices.
>
> Is there any more info we can provide?

Exactly the same behavior. Plus the problem that no new devices can be paired with ember, while with ezsp I can add devices. In my case, especially, all my routers get disconnected.

fir3drag0n commented 7 months ago

I do have 4 mmwave presence sensors. Maybe these devices have an influence.

alainsch commented 7 months ago

Sorry, posted my follow-up on the wrong ticket...

These are the messages I see when I startup Zigbee2MQTT. Maybe they are related.

```logs
[2024-05-05 11:00:43] info: z2m: Logging to console, file (filename: log.log)
[2024-05-05 11:00:49] info: z2m: Starting Zigbee2MQTT version 1.37.0 (commit #unknown)
[2024-05-05 11:00:49] info: z2m: Starting zigbee-herdsman (0.45.0)
[2024-05-05 11:00:49] info: zh:ember: ======== Ember Adapter Starting ========
[2024-05-05 11:00:49] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-05-05 11:00:49] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-05-05 11:00:49] info: zh:ember:uart:ash: Socket ready
[2024-05-05 11:00:49] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-05-05 11:00:51] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-05-05 11:00:51] info: zh:ember:uart:ash: ======== ASH started ========
[2024-05-05 11:00:51] info: zh:ember:ezsp: ======== EZSP started ========
[2024-05-05 11:00:51] warning: zh:ember: [EzspConfigId] Failed to SET "ADDRESS_TABLE_SIZE" TO "16" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 11:00:51] warning: zh:ember: [EzspConfigId] Failed to SET "APS_UNICAST_MESSAGE_COUNT" TO "32" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 11:00:51] warning: zh:ember: [EzspConfigId] Failed to SET "NEIGHBOR_TABLE_SIZE" TO "26" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 11:00:51] warning: zh:ember: [EzspConfigId] Failed to SET "SOURCE_ROUTE_TABLE_SIZE" TO "200" with status=ERROR_INVALID_VALUE. Firmware value will be used instead.
[2024-05-05 11:00:51] warning: zh:ember: [EzspConfigId] Failed to SET "MULTICAST_TABLE_SIZE" TO "16" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 11:00:51] info: zh:ember: [STACK STATUS] Network up.
[2024-05-05 11:00:51] info: zh:ember: [INIT TC] NCP network matches config.
[2024-05-05 11:00:51] info: zh:ember: [CONCENTRATOR] Started source route discovery. 1247ms until next broadcast.
[2024-05-05 11:00:51] info: z2m: zigbee-herdsman started (resumed)
[2024-05-05 11:00:51] info: z2m: Coordinator firmware version: '{"meta":{"build":0,"ezsp":13,"major":7,"minor":4,"patch":1,"revision":"7.4.1 [GA]","special":0,"type":170},"type":"EmberZNet"}'
[2024-05-05 11:00:51] info: z2m: Currently 12 devices are joined: ...
[2024-05-05 11:00:51] info: z2m: Zigbee: disabling joining new devices.
[2024-05-05 11:00:51] info: z2m: Connecting to MQTT server at mqtt://core-mosquitto:1883
[2024-05-05 11:00:52] info: z2m: Connected to MQTT server
[2024-05-05 11:00:52] info: z2m: Started frontend on port 8099
[2024-05-05 11:00:53] info: z2m: Zigbee2MQTT started!
[2024-05-05 11:01:11] error: zh:ember: Delivery of BROADCAST failed for "65532" [apsFrame={"profileId":0,"clusterId":31,"sourceEndpoint":0,"destinationEndpoint":0,"options":0,"groupId":0,"sequence":0} messageTag=255]
[2024-05-05 11:01:23] error: zh:ember: Delivery of BROADCAST failed for "65532" [apsFrame={"profileId":0,"clusterId":31,"sourceEndpoint":0,"destinationEndpoint":0,"options":0,"groupId":0,"sequence":0} messageTag=255]
[2024-05-05 11:01:33] error: zh:ember: Delivery of BROADCAST failed for "65532" [apsFrame={"profileId":0,"clusterId":31,"sourceEndpoint":0,"destinationEndpoint":0,"options":0,"groupId":0,"sequence":0} messageTag=255]
```

Whenever I try to start the pairing process, I see these messages:

```logs
[2024-05-05 11:03:28] info: z2m: Zigbee: allowing new devices to join.
[2024-05-05 11:03:28] info: zh:ember: [STACK STATUS] Network opened.
[2024-05-05 11:03:29] error: zh:ember: Delivery of BROADCAST failed for "65532" [apsFrame={"profileId":0,"clusterId":54,"sourceEndpoint":0,"destinationEndpoint":0,"options":256,"groupId":0,"sequence":240} messageTag=2]
[2024-05-05 11:03:29] error: zh:ember: Delivery of BROADCAST failed for "65533" [apsFrame={"profileId":41440,"clusterId":33,"sourceEndpoint":242,"destinationEndpoint":242,"options":256,"groupId":0,"sequence":241} messageTag=3]
```
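To read these error lines: the quoted number is the destination short address (65532 = 0xFFFC and 65533 = 0xFFFD are Zigbee broadcast addresses), and `apsFrame` gives the profile and cluster being sent. A minimal Python sketch that turns those numbers into names, with lookup values taken from the Zigbee specification as we understand it; `describe_failed_broadcast` and the tables are hypothetical helpers, not part of zigbee-herdsman:

```python
# Zigbee broadcast short addresses (per the Zigbee specification).
BROADCAST_ADDRESSES = {
    0xFFFF: "all devices in the network",
    0xFFFD: "all devices with receiver on when idle",
    0xFFFC: "all routers and the coordinator",
}

ZDO_PROFILE = 0x0000
GREEN_POWER_PROFILE = 0xA1E0  # 41440 in the logs
GREEN_POWER_CLUSTER = 0x0021  # 33 in the logs

# ZDO cluster IDs that show up in the failing broadcasts in this thread.
ZDO_CLUSTERS = {
    0x0013: "Device_annce",             # clusterId 19
    0x001F: "Parent_annce",             # clusterId 31
    0x0036: "Mgmt_Permit_Joining_req",  # clusterId 54
}


def describe_failed_broadcast(destination: int, aps_frame: dict) -> str:
    """Hypothetical helper: name the target and payload of a failed broadcast."""
    target = BROADCAST_ADDRESSES.get(destination, "not a broadcast address")
    profile, cluster = aps_frame["profileId"], aps_frame["clusterId"]
    if profile == ZDO_PROFILE:
        what = "ZDO " + ZDO_CLUSTERS.get(cluster, f"cluster 0x{cluster:04X}")
    elif profile == GREEN_POWER_PROFILE and cluster == GREEN_POWER_CLUSTER:
        what = "Green Power cluster"
    else:
        what = f"profile 0x{profile:04X} cluster 0x{cluster:04X}"
    return f"0x{destination:04X} ({target}): {what}"


print(describe_failed_broadcast(65532, {"profileId": 0, "clusterId": 54}))
# 0xFFFC (all routers and the coordinator): ZDO Mgmt_Permit_Joining_req
print(describe_failed_broadcast(65533, {"profileId": 41440, "clusterId": 33}))
# 0xFFFD (all devices with receiver on when idle): Green Power cluster
```

Read this way, the two failures at pairing time would be a permit-joining request to all routers and a Green Power broadcast, and the startup ones (clusterId 31) a Parent_annce. That would be consistent with pairing being affected, though it says nothing about why delivery fails.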

fir3drag0n commented 7 months ago

@alainsch I also had a discussion with @Nerivec on Discord, because I also keep getting the same error message.

alainsch commented 7 months ago

> Exactly the same behavior. Plus the problem that no new devices can be paired with ember, while with ezsp I can add devices. In my case, especially, all my routers get disconnected.

Ah yes, and I wasn't aware it was related...

I have an SLZB-06M as coordinator (ground floor) and a Sonoff Dongle-E flashed as a router (first floor). Yesterday evening my Sonoff router got disconnected. It was while trying to pair it again that I found out I couldn't pair any devices.

I have a very small Zigbee network (more of a test setup here), so I have no other routers, only end devices.

alainsch commented 7 months ago

> @alainsch I also had a discussion with @Nerivec on Discord, because I also keep getting the same error message.

I'm pretty new to Discord; I'll try to find the channel (?) so I can follow the discussion.

fir3drag0n commented 7 months ago

> Exactly the same behavior. Plus the problem that no new devices can be paired with ember, while with ezsp I can add devices. In my case, especially, all my routers get disconnected.
>
> Ah yes, and I wasn't aware it was related...
>
> I have an SLZB-06M as coordinator (ground floor) and a Sonoff Dongle-E flashed as a router (first floor). Yesterday evening my Sonoff router got disconnected. It was while trying to pair it again that I found out I couldn't pair any devices.
>
> I have a very small Zigbee network (more of a test setup here), so I have no other routers, only end devices.

I already have nearly 70 devices...

alainsch commented 7 months ago

> I already have nearly 70 devices...

Here at home, HA is a small setup (12 devices) I use mainly for testing. But in our vacation home, everything is controlled by HA and we have 51 Zigbee and 33 ESPHome devices.

In this second setup, I also have the same SLZB-06M coordinator, but still on the older 20231030 firmware, where the adapter is still defined as 'adapter: ezsp'.

Since I upgraded to 1.37, I couldn't pair any new devices there either, due to another error: "zh:controller:greenpower: Received undefined command from '0'"

And that setup is not a test setup :-(

fir3drag0n commented 7 months ago

> @alainsch I also had a discussion with @Nerivec on Discord, because I also keep getting the same error message.
>
> I'm pretty new to Discord; I'll try to find the channel (?) so I can follow the discussion.

In the development-branch channel. What we have in common is the same coordinator (I am on the dev firmware right now). But maybe you can more easily rule out the cause, since you only have 12 devices in your setup.

Ricc68 commented 7 months ago

Very very simple configuration here.

HAOS on a QEMU VM on a low-end x86-64 QNAP NAS, with 2 CPUs + 2 GB RAM as suggested by the HAOS setup guide. I have seen a lot of people using VMs or ARM devices: one common point may be low resources in terms of CPU power and/or RAM.

Back to the setup, I can report two setups:

  1. ZBDongle-E with fw 7.4.2, Z2M 1.37.0, ember driver. Only the ZBDongle-E is in the ZigBee network, so it is only the coordinator. The broadcast errors happen. This may rule out the devices and put the spotlight on the coordinator.
  2. ZBDongle-E as in setup 1, but with 2 Sonoff TRVZB valves added to the ZigBee network: the same error continues to happen. But since it was already happening with the coordinator alone in setup 1, I would rule out the 2 added devices as the cause.

Anyway, I see from other posts that the error is happening with a variety of devices, and if I look for another common factor, all the networks showing the error have -> a coordinator <-, which again puts the spotlight on the coordinator.

I see that @Nerivec is not able to reproduce the issue and, needless to say, Nerivec is also working with a coordinator, which should rule out the coordinator itself (unless there is some elusive common hardware factor). Maybe a good starting point for you would be to constrain the system to a low-resource/slow host or a VM with limited resources, to see how Z2M's handling of the coordinator behaves.

Maybe another hint can be found in the first post from @julien-billaud: "I've tried the exact same configuration on a regular x86 computer running debian (using the same zigbee dongle) and didn't face any issue which seems to be a linked with the Raspberry pi 4".

alainsch commented 7 months ago

OK, because my setup is small and mainly for testing, I did the following steps:

```logs
[12:01:03] INFO: Preparing to start...
[12:01:04] INFO: Socat not enabled
[12:01:10] INFO: Starting Zigbee2MQTT...
[2024-05-05 12:01:14] info: z2m: Logging to console, file (filename: log.log)
[2024-05-05 12:01:20] info: z2m: Starting Zigbee2MQTT version 1.37.0 (commit #unknown)
[2024-05-05 12:01:20] info: z2m: Starting zigbee-herdsman (0.45.0)
[2024-05-05 12:01:20] warning: zh:ezsp: Deprecated driver 'ezsp' currently in use, 'ember' will become the officially supported EmberZNet driver in next release. If using Zigbee2MQTT see https://github.com/Koenkk/zigbee2mqtt/discussions/21462
[2024-05-05 12:01:24] info: zh:ezsp:driv: Leaving current network and forming new network
[2024-05-05 12:01:25] info: zh:ezsp:driv: Form network
[2024-05-05 12:01:26] info: zh:controller: Wrote coordinator backup to '/config/zigbee2mqtt/level_0/coordinator_backup.json'
[2024-05-05 12:01:26] info: z2m: zigbee-herdsman started (reset)
[2024-05-05 12:01:26] info: z2m: Coordinator firmware version: '{"meta":{"maintrel":"1 ","majorrel":"7","minorrel":"4","product":13,"revision":"7.4.1.0 build 0"},"type":"EZSP v13"}'
[2024-05-05 12:01:26] info: z2m: Currently 0 devices are joined:
[2024-05-05 12:01:26] info: z2m: Zigbee: disabling joining new devices.
[2024-05-05 12:01:27] info: z2m: Connecting to MQTT server at mqtt://core-mosquitto:1883
[2024-05-05 12:01:27] info: z2m: Connected to MQTT server
[2024-05-05 12:01:28] info: z2m: Started frontend on port 8099
[2024-05-05 12:01:28] info: z2m: Zigbee2MQTT started!
```

```logs
[2024-05-05 12:01:40] info: z2m: Zigbee: allowing new devices to join.
[2024-05-05 12:01:41] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 12:02:00] info: zh:controller: Interview for '0x00158d0008083d2a' started
[2024-05-05 12:02:00] info: z2m: Device '0x00158d0008083d2a' joined
[2024-05-05 12:02:00] info: z2m: Starting interview of '0x00158d0008083d2a'
[2024-05-05 12:02:11] info: zh:controller: Succesfully interviewed '0x00158d0008083d2a'
[2024-05-05 12:02:11] info: z2m: Successfully interviewed '0x00158d0008083d2a', device has successfully been paired
[2024-05-05 12:02:11] info: z2m: Device '0x00158d0008083d2a' is supported, identified as: Aqara Motion sensor (RTCGQ11LM)
[2024-05-05 12:02:11] info: z2m: Configuring '0x00158d0008083d2a'
[2024-05-05 12:02:11] info: z2m: Successfully configured '0x00158d0008083d2a'
```

```logs
[2024-05-05 12:02:19] info: z2m: Removing device '0x00158d0008083d2a' (block: false, force: true)
[2024-05-05 12:02:19] info: z2m: Successfully removed device '0x00158d0008083d2a' (block: false, force: true)
```

```logs
[12:06:41] INFO: Preparing to start...
[12:06:42] INFO: Socat not enabled
[12:06:48] INFO: Starting Zigbee2MQTT...
[2024-05-05 12:06:53] info: z2m: Logging to console, file (filename: log.log)
[2024-05-05 12:06:58] info: z2m: Starting Zigbee2MQTT version 1.37.0 (commit #unknown)
[2024-05-05 12:06:58] info: z2m: Starting zigbee-herdsman (0.45.0)
[2024-05-05 12:06:59] info: zh:ember: ======== Ember Adapter Starting ========
[2024-05-05 12:06:59] info: zh:ember:ezsp: ======== EZSP starting ========
[2024-05-05 12:06:59] info: zh:ember:uart:ash: ======== ASH NCP reset ========
[2024-05-05 12:06:59] info: zh:ember:uart:ash: Socket ready
[2024-05-05 12:06:59] info: zh:ember:uart:ash: ======== ASH starting ========
[2024-05-05 12:07:00] info: zh:ember:uart:ash: ======== ASH connected ========
[2024-05-05 12:07:00] info: zh:ember:uart:ash: ======== ASH started ========
[2024-05-05 12:07:00] info: zh:ember:ezsp: ======== EZSP started ========
[2024-05-05 12:07:00] warning: zh:ember: [EzspConfigId] Failed to SET "ADDRESS_TABLE_SIZE" TO "16" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 12:07:00] warning: zh:ember: [EzspConfigId] Failed to SET "APS_UNICAST_MESSAGE_COUNT" TO "32" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 12:07:00] warning: zh:ember: [EzspConfigId] Failed to SET "NEIGHBOR_TABLE_SIZE" TO "26" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 12:07:00] warning: zh:ember: [EzspConfigId] Failed to SET "SOURCE_ROUTE_TABLE_SIZE" TO "200" with status=ERROR_INVALID_VALUE. Firmware value will be used instead.
[2024-05-05 12:07:00] warning: zh:ember: [EzspConfigId] Failed to SET "MULTICAST_TABLE_SIZE" TO "16" with status=ERROR_OUT_OF_MEMORY. Firmware value will be used instead.
[2024-05-05 12:07:00] info: zh:ember: [STACK STATUS] Network up.
[2024-05-05 12:07:00] info: zh:ember: [INIT TC] NCP network matches config.
[2024-05-05 12:07:00] info: zh:ember: [CONCENTRATOR] Started source route discovery. 1248ms until next broadcast.
[2024-05-05 12:07:01] info: z2m: zigbee-herdsman started (resumed)
[2024-05-05 12:07:01] info: z2m: Coordinator firmware version: '{"meta":{"build":0,"ezsp":13,"major":7,"minor":4,"patch":1,"revision":"7.4.1 [GA]","special":0,"type":170},"type":"EmberZNet"}'
[2024-05-05 12:07:01] info: z2m: Currently 0 devices are joined:
[2024-05-05 12:07:01] info: z2m: Zigbee: disabling joining new devices.
[2024-05-05 12:07:01] info: z2m: Connecting to MQTT server at mqtt://core-mosquitto:1883
[2024-05-05 12:07:01] info: z2m: Connected to MQTT server
[2024-05-05 12:07:02] info: z2m: Started frontend on port 8099
[2024-05-05 12:07:02] info: z2m: Zigbee2MQTT started!
```

```logs
[2024-05-05 12:07:40] info: z2m: Zigbee: allowing new devices to join.
[2024-05-05 12:07:40] info: zh:ember: [STACK STATUS] Network opened.
[2024-05-05 12:08:08] info: zh:controller: Interview for '0x00158d0008083d2a' started
[2024-05-05 12:08:08] info: z2m: Device '0x00158d0008083d2a' joined
[2024-05-05 12:08:09] info: z2m: Starting interview of '0x00158d0008083d2a'
[2024-05-05 12:08:11] warning: zh:ember: [ZDO] Node descriptor for "7769" reports device is only compliant to revision "pre-21" of the ZigBee specification (current revision: 23).
[2024-05-05 12:08:47] info: zh:controller: Succesfully interviewed '0x00158d0008083d2a'
[2024-05-05 12:08:47] info: z2m: Successfully interviewed '0x00158d0008083d2a', device has successfully been paired
[2024-05-05 12:08:47] info: z2m: Device '0x00158d0008083d2a' is supported, identified as: Aqara Motion sensor (RTCGQ11LM)
[2024-05-05 12:08:47] info: z2m: Configuring '0x00158d0008083d2a'
[2024-05-05 12:08:47] info: z2m: Successfully configured '0x00158d0008083d2a'
```

So pairing is working, and I didn't get the broadcast error this time, neither while starting up nor while pairing.

So starting over with zigbee2mqtt solved it for me, but that is not possible for everyone, I think :-)

alainsch commented 7 months ago

> So pairing is working, and I didn't get the broadcast error this time, neither while starting up nor while pairing.
>
> So starting over with zigbee2mqtt solved it for me, but that is not possible for everyone, I think :-)

No, not completely... after approx. 5 minutes, pairing was again not possible. No errors, but the connection/interview didn't start. I tried restarting z2m and rebooting the coordinator; nothing helps.

I downgraded the coordinator to the 20231030 firmware (EZSP12) and switched back to "adapter: ezsp". I still get the "error: zh:controller:greenpower: Received undefined command from '0'" messages, but pairing is possible again.

Will see in about 10 minutes...

fir3drag0n commented 7 months ago

> Very very simple configuration here.
>
> HAOS on a QEMU VM on a low-end x86-64 QNAP NAS, with 2 CPUs + 2 GB RAM as suggested by the HAOS setup guide. I have seen a lot of people using VMs or ARM devices: one common point may be low resources in terms of CPU power and/or RAM.
>
> Back to the setup, I can report two setups:
>
>   1. ZBDongle-E with fw 7.4.2, Z2M 1.37.0, ember driver. Only the ZBDongle-E is in the ZigBee network, so it is only the coordinator. The broadcast errors happen. This may rule out the devices and put the spotlight on the coordinator.
>   2. ZBDongle-E as in setup 1, but with 2 Sonoff TRVZB valves added to the ZigBee network: the same error continues to happen. But since it was already happening with the coordinator alone in setup 1, I would rule out the 2 added devices as the cause.
>
> Anyway, I see from other posts that the error is happening with a variety of devices, and if I look for another common factor, all the networks showing the error have -> a coordinator <-, which again puts the spotlight on the coordinator.
>
> I see that @Nerivec is not able to reproduce the issue and, needless to say, Nerivec is also working with a coordinator, which should rule out the coordinator itself (unless there is some elusive common hardware factor). Maybe a good starting point for you would be to constrain the system to a low-resource/slow host or a VM with limited resources, to see how Z2M's handling of the coordinator behaves.
>
> Maybe another hint can be found in the first post from @julien-billaud: "I've tried the exact same configuration on a regular x86 computer running debian (using the same zigbee dongle) and didn't face any issue which seems to be a linked with the Raspberry pi 4".

I do also have one Sonoff TRVZB.

And I also started fresh with a new zigbee2mqtt config and just the coordinator, and the pairing/broadcast issue appeared immediately at startup. I don't think it is an issue with the Raspberry Pi, as I am using an x86 machine running a zigbee2mqtt container (Docker).

I also observed that a coordinator reset sometimes helped. @Nerivec recommended doing a hard reset of my device (which includes pushing the physical reset button). This helped once, and it started without any issues, but after restarting again I was hit by those errors again.

Ricc68 commented 7 months ago

> HAOS on a QEMU VM on a low-end x86-64 QNAP NAS, with 2 CPUs + 2 GB RAM as suggested by the HAOS setup guide. I have seen a lot of people using VMs or ARM devices: one common point may be low resources in terms of CPU power and/or RAM.

> I don't think it is an issue with the Raspberry Pi, as I am using an x86 machine running a zigbee2mqtt container (Docker).

Just to have a better understanding: what CPU/RAM does your x86 machine have? What OS is it running? Is it on bare metal or in a virtualization environment like Proxmox, or another VM of any sort? I agree Docker containers are less demanding, but performance is then limited by the host, so it would be useful to know what kind of host is running your Docker and how loaded your x86 system is.

fir3drag0n commented 7 months ago

It is an Intel® Core™ i3-9100 system with 64 GB of ECC RAM, running Unraid (a NAS system with virtualization options: Docker or VMs).

Nerivec commented 7 months ago

I have a low-resource VM that mimics the specs of an average Pi 4 to run tests on things that I know affect performance. No issue there either: no failed broadcasts, with or without devices, and I successfully paired & re-paired a dozen devices during the couple of hours it has been running.

But just in case, you can try giving it some breathing room with the adapter_delay setting:

```yaml
advanced:
  adapter_delay: 20
```

Default/min is 5, max is 60 (milliseconds). Note that at 60, you are likely to experience some delays when triggering devices rapidly.
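To make that trade-off concrete, here is a minimal Python sketch. It assumes `adapter_delay` acts as a minimum gap between consecutive requests handed to the adapter; that is our simplified model, and the actual zigbee-herdsman queue may work differently. `ThrottledSender` and `clamp_adapter_delay` are hypothetical names, and the 5 to 60 ms range is the one stated above.

```python
import time
from typing import Callable, Optional

# Range stated in the thread: default/min 5 ms, max 60 ms.
ADAPTER_DELAY_MIN_MS = 5
ADAPTER_DELAY_MAX_MS = 60


def clamp_adapter_delay(delay_ms: int) -> int:
    """Keep a configured delay inside the stated 5..60 ms range."""
    return max(ADAPTER_DELAY_MIN_MS, min(ADAPTER_DELAY_MAX_MS, delay_ms))


class ThrottledSender:
    """Hypothetical model: enforce a minimum gap between frames sent to the adapter."""

    def __init__(self, delay_ms: int, send: Callable[[bytes], None],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay_s = clamp_adapter_delay(delay_ms) / 1000
        self._send = send
        self._clock = clock
        self._sleep = sleep
        self._last_sent: Optional[float] = None  # time of the previous frame, if any

    def send(self, frame: bytes) -> None:
        now = self._clock()
        if self._last_sent is not None:
            wait = self._last_sent + self.delay_s - now
            if wait > 0:
                self._sleep(wait)  # hold the frame until the gap has elapsed
                now += wait
        self._send(frame)
        self._last_sent = now
```

Under this model, a burst of 10 commands at 60 ms needs at least 9 × 60 = 540 ms before the last one leaves the host, which is where the delays when triggering devices rapidly would come from.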


PS: I created an issue in the firmware repo for the SLZB-06M and the failing config IDs. May or may not be related to the ensuing troubles, but we need to get to the bottom of it nonetheless. https://github.com/darkxst/silabs-firmware-builder/issues/90

Ricc68 commented 7 months ago
> adapter_delay: 20

Added the adapter_delay option, no joy:

```logs
[2024-05-05 14:42:54] error: zh:ember: Delivery of BROADCAST failed for "65532" [apsFrame={"profileId":0,"clusterId":54,"sourceEndpoint":0,"destinationEndpoint":0,"options":256,"groupId":0,"sequence":170} messageTag=255]
[2024-05-05 14:42:55] error: zh:ember: Delivery of BROADCAST failed for "65533" [apsFrame={"profileId":41440,"clusterId":33,"sourceEndpoint":242,"destinationEndpoint":242,"options":256,"groupId":0,"sequence":171} messageTag=1]
[2024-05-05 14:42:57] error: zh:ember: Delivery of BROADCAST failed for "65533" [apsFrame={"profileId":0,"clusterId":19,"sourceEndpoint":0,"destinationEndpoint":0,"options":1024,"groupId":0,"sequence":53} messageTag=255]
[2024-05-05 14:44:07] error: zh:ember: Delivery of BROADCAST failed for "65533" [apsFrame={"profileId":0,"clusterId":19,"sourceEndpoint":0,"destinationEndpoint":0,"options":1024,"groupId":0,"sequence":59} messageTag=255]
```

at startup of z2m.

julien-billaud commented 7 months ago

I've been doing a little more testing and figured out "what was wrong". I did the following tests: starting from a fresh install on the Pi 4 with the latest version of Docker, all from an SD card (with the SSD removed from the USB3 port) and only the Dongle-E plugged into the second USB3 port, everything ran perfectly fine with the ember driver. From that fresh install, I then plugged the SSD into the USB3 port and the system became much less responsive, so I rebooted it and got the exact same "BROADCAST" errors, with nothing working. Then I moved the Dongle-E to one of the USB 2.0 ports and kept the SSD on one of the USB3 ports: no more errors. Last test: booting the Pi 4 from the SSD plugged into USB 3.0, with the Dongle-E on USB 2.0, and now everything works fine with the ember driver.

To conclude, it seems the ember driver is for some reason a little more sensitive (I know that using the dongle without an extension cord isn't ideal). I hope this helps those who see the same "BROADCAST" error after switching from the ezsp to the ember driver, and/or helps find what in that driver leads to this strange behavior.

supaeasy commented 7 months ago

Can't be my problem: USB2 port with a 2 m extension cable.

fir3drag0n commented 7 months ago

I also have a remote (network-attached) device, so USB has no influence.

Ricc68 commented 7 months ago

> To conclude, it seems the ember driver is for some reason a little more sensitive (I know that using the dongle without an extension cord isn't ideal). I hope this helps those who see the same "BROADCAST" error after switching from the ezsp to the ember driver, and/or helps find what in that driver leads to this strange behavior.

Can't be my problem either: here it's a USB3 port (no USB2 ports available on the NAS), but a 1 m USB2 extension cable makes that irrelevant.

But look, your case might highlight an interaction or a common factor: internal USB hub activity (are the USB3 ports all on the same hub? You can have multiple USB ports, but if they all lead to a single hub the bandwidth is shared, and I guess the SSD is using a lot of it) or disk activity, and again maybe this points back to low resources.

Just to be clear: I don't strongly believe in the low-resources hypothesis, or at least not in it alone; it's only that it is a fairly clear common factor here, and I don't want to send the investigation down a possibly false route.

In addition, I was reasoning about the broadcast error itself. A broadcast is a message that the coordinator sends out to the ZigBee network. In general terms, and if I understand what a broadcast is in ZigBee terms, the broadcast message is initiated by the driver, sent over the wire to the coordinator firmware, and finally sent by the firmware to the radio. It's not something coming in, it's something going out, and it should not necessarily expect an answer (think of a ZigBee network composed only of the coordinator). I tried setting adapter_delay to 60 milliseconds and the error still happens: this makes me think it's not a matter of the timing of commands sent to the coordinator firmware, but that sending this specific broadcast command raises the error. If the firmware is able to send broadcasts over the radio (and the fact that the ezsp driver doesn't show the issue suggests it is), then it must be something in the ember driver: either in packaging the command or in sending it over the serial wire. Packaging the command should not be the issue, because @Nerivec is not able to reproduce the issue, so should we suspect something related to sending the command over the wire?

Another interesting question: are all the broadcast commands failing, or only some of them? Answering this may lead to another question: if only some of them fail, what's the difference between a successful broadcast command and a failed one?
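One way to start answering this from existing logs is to count failed broadcasts per destination/profile/cluster and, at debug level, the status of the broadcasts the firmware confirms as sent. A minimal Python sketch; the regular expressions are written against the log lines posted in this thread (including the debug `[SENT type=BROADCAST ... status=SUCCESS]` line further down) and may need adjusting for other versions, and `tally_broadcasts` is a hypothetical helper:

```python
import json
import re
from collections import Counter

# Error line format as it appears in the logs posted in this thread.
FAILED_RE = re.compile(
    r'Delivery of BROADCAST failed for "(?P<dest>\d+)" '
    r'\[apsFrame=(?P<frame>\{[^}]*\}) messageTag=(?P<tag>\d+)\]'
)
# Debug-level confirmation line, e.g. "[SENT type=BROADCAST ... status=SUCCESS]".
SENT_RE = re.compile(r'\[SENT type=BROADCAST [^\]]*?status=(?P<status>\w+)\]')


def tally_broadcasts(log_text: str) -> tuple[Counter, Counter]:
    """Count failures per (destination, profileId, clusterId) and sent broadcasts per status."""
    failed: Counter = Counter()
    for m in FAILED_RE.finditer(log_text):
        frame = json.loads(m["frame"])  # the apsFrame JSON has no nested objects
        failed[(int(m["dest"]), frame["profileId"], frame["clusterId"])] += 1
    sent = Counter(m["status"] for m in SENT_RE.finditer(log_text))
    return failed, sent
```

Fed the startup excerpt Ricc68 posted above, this gives two failures for (65533, 0, 19), the Device_annce broadcast, and one each for the permit-joining and Green Power broadcasts. Comparing that against the totals of successfully sent broadcasts in a debug log would show whether only certain kinds fail.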

Would sniffing the serial port help understanding something about these errors?
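On sniffing: between the host and the dongle, EZSP frames travel inside ASH framing (the `zh:ember:uart:ash` lines in the logs). Based on our reading of Silicon Labs' ASH description (UG101): each frame ends with a 0x7E flag byte, reserved bytes are escaped as 0x7D followed by the byte XOR 0x20, a 0x1A Cancel byte discards the frame in progress, a CRC-CCITT (polynomial 0x1021, initial value 0xFFFF) trails each frame, and the first byte is a control byte giving the frame type. A minimal Python sketch of splitting a raw capture into frames under those assumptions; it does not de-randomize DATA payloads or handle every edge case (e.g. Substitute bytes), so it is a starting point, not a decoder:

```python
FLAG, ESCAPE, XON, XOFF, CANCEL = 0x7E, 0x7D, 0x11, 0x13, 0x1A


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-CCITT (polynomial 0x1021, initial value 0xFFFF), MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def frame_type(control: int) -> str:
    """Classify an ASH frame from its control byte."""
    if (control & 0x80) == 0:
        return "DATA"
    if (control & 0xE0) == 0x80:
        return "ACK"
    if (control & 0xE0) == 0xA0:
        return "NAK"
    return {0xC0: "RST", 0xC1: "RSTACK", 0xC2: "ERROR"}.get(control, "UNKNOWN")


def split_frames(stream: bytes):
    """Yield (type, control, payload, crc_ok) for every complete frame in a raw UART capture.

    DATA payloads stay randomized (ASH XORs them with a pseudo-random sequence);
    de-randomizing them is left out of this sketch.
    """
    current = bytearray()
    escaped = False
    for byte in stream:
        if byte == CANCEL:  # drop everything received since the last flag
            current.clear()
            escaped = False
        elif byte in (XON, XOFF):  # software flow control, not frame data
            pass
        elif byte == FLAG:  # end of frame: control byte + payload + 2-byte CRC
            if len(current) >= 3:
                body = bytes(current[:-2])
                crc = int.from_bytes(current[-2:], "big")
                yield frame_type(body[0]), body[0], body[1:], crc16_ccitt(body) == crc
            current.clear()
            escaped = False
        elif byte == ESCAPE:  # the next byte was XORed with 0x20
            escaped = True
        else:
            current.append(byte ^ 0x20 if escaped else byte)
            escaped = False
```

For example, the reset the host sends at startup (the `ASH NCP reset` step in the logs) should, per UG101, appear on the wire as `1A C0 38 BC 7E`, which this sketch reports as an RST frame with a valid CRC. Lining up such a capture against the debug log would show whether a failing SEND_BROADCAST ever reaches the dongle intact.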

E-DESVIGNE commented 7 months ago

Hello,

I had the same issue (refer to post #22469). I resolved it by downgrading the dongle to firmware version 7.4.1. Hoping that it can help you.

E. D.

alainsch commented 7 months ago

My system has an ethernet connected coordinator so no USB related problems here.

After a few hours of testing, starting all over again and recreating my Zigbee network multiple times, my conclusion is that the combination of the SMLight FW 20240408 (EZSP13), Zigbee2MQTT 1.37 and the "ember" adapter is not stable enough for me, even though it is a semi-test environment.

The best result I get from this combo is a working Zigbee network that starts reporting broadcast errors after 5 to 15 minutes, at which point it is uncertain whether I can pair new devices. Twice, my Dongle-E router also disconnected.

For the moment I've switched back to the previous SLZB firmware (20231030, EZSP12) and I'm using the 'ezsp' adapter again.

Since I last restarted Zigbee2MQTT (16:00), I have had no problems with pairing or disconnected devices. The only thing I've seen a few times is the following error:

[2024-05-05 16:16:58] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 16:20:18] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 16:21:08] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 16:21:23] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 16:24:43] error: zh:controller:greenpower: Received undefined command from '0'
[2024-05-05 16:25:38] error: zh:controller:greenpower: Received undefined command from '0'

The first 4 messages appeared when I added 4 extra devices to test Alarmo; the last 2 appeared when I changed a "settings (specific)" parameter on the 2 motion sensors I had just added.

This must be something related to zigbee2mqtt, as I see it in my second network too, where I only upgraded zigbee2mqtt from 1.36 to 1.37.

I won't have any spare time to test anything over the next few days, and I had to make it stable again, otherwise my HVAC doesn't work properly and I would probably have a problem keeping the WAF in balance :-)

Nerivec commented 7 months ago

@Ricc68 Can you run a test with debug-level logs and send the file?

A few things:

FYI, proper startup with `ember` looks like this (the bit related to broadcasts):

```logs
zh:controller: Disable joining
zh:ember:queue: Status queue=0 priorityQueue=0.
zh:ember: ~~~> [ZCL BROADCAST apsFrame={"profileId":41440,"clusterId":33,"sourceEndpoint":242,"destinationEndpoint":242,"options":4416,"groupId":65533,"sequence":0} header={"frameControl":{"reservedBits":0,"frameType":1,"direction":1,"disableDefaultResponse":true,"manufacturerSpecific":false},"manufacturerCode":null,"transactionSequenceNumber":2,"commandIdentifier":2}]
zh:ember:ezsp: ===> [FRAME: ID=54:"SEND_BROADCAST" Seq=57 Len=27]
zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=1 frmRx=4]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2 frmNum=4] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=5]
zh:ember:ezsp: <=== [FRAME: ID=54:"SEND_BROADCAST" Seq=57 Len=7]
zh:ember:ezsp: ~~~> [SENT type=BROADCAST apsSequence=209 messageTag=1 status=SUCCESS]
zh:ember:queue: Status queue=0 priorityQueue=0.
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2 frmNum=5] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=6]
zh:ember:ezsp: <=== [FRAME: ID=69:"INCOMING_MESSAGE_HANDLER" Seq=57 Len=30]
zh:ember:ezsp: ezspIncomingMessageHandler(): callback called with: [type=BROADCAST_LOOPBACK], [apsFrame={"profileId":41440,"clusterId":33,"sourceEndpoint":242,"destinationEndpoint":242,"options":256,"groupId":0,"sequence":209}], [lastHopLqi=255], [lastHopRssi=0], [sender=0], [bindingIndex=255], [addressIndex=255], [messageContents=1902020a0000]
zh:ember:ezsp: ===> [FRAME: ID=107:"CLEAR_TRANSIENT_LINK_KEYS" Seq=58 Len=5]
zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=2 frmRx=6]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=3]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=3 frmNum=6] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7]
zh:ember:ezsp: <=== [FRAME: ID=107:"CLEAR_TRANSIENT_LINK_KEYS" Seq=58 Len=5]
zh:ember:ezsp: ===> [FRAME: ID=85:"SET_POLICY" Seq=59 Len=7]
zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=3 frmRx=7]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=4]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=4 frmNum=7] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=0]
zh:ember:ezsp: <=== [FRAME: ID=85:"SET_POLICY" Seq=59 Len=6]
zh:ember: [EzspPolicyId] SET "TRUST_CENTER_POLICY" TO "2" with status=SUCCESS.
zh:ember:ezsp: ===> [FRAME: ID=34:"PERMIT_JOINING" Seq=60 Len=6]
zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=4 frmRx=0]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5 frmNum=0] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=1]
zh:ember:ezsp: <=== [FRAME: ID=34:"PERMIT_JOINING" Seq=60 Len=6]
zh:ember: Permit joining for 0 sec. status=0
zh:ember: ~~~> [ZDO PERMIT_JOINING_REQUEST target=65532 duration=0 authentication=1]
zh:ember: ~~~> [ZDO BROADCAST apsFrame={"profileId":0,"clusterId":54,"sourceEndpoint":0,"destinationEndpoint":0,"options":4416,"groupId":0,"sequence":0} messageTag=1]
zh:ember:ezsp: ===> [FRAME: ID=54:"SEND_BROADCAST" Seq=61 Len=24]
zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=5 frmRx=1]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6 frmNum=1] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=2]
zh:ember:ezsp: <=== [FRAME: ID=54:"SEND_BROADCAST" Seq=61 Len=7]
zh:ember: ~~~> [SENT ZDO type=BROADCAST apsFrame={"profileId":0,"clusterId":54,"sourceEndpoint":0,"destinationEndpoint":0,"options":4416,"groupId":0,"sequence":210} messageTag=1 status=SUCCESS]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6 frmNum=2] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=3]
zh:ember:ezsp: <=== [FRAME: ID=63:"MESSAGE_SENT_HANDLER" Seq=61 Len=28]
zh:ember:ezsp: ezspMessageSentHandler(): callback called with: [type=BROADCAST], [indexOrDestination=65533], [apsFrame={"profileId":41440,"clusterId":33,"sourceEndpoint":242,"destinationEndpoint":242,"options":256,"groupId":0,"sequence":209}], [messageTag=255], [status=SUCCESS], [messageContents=1902020a0000]
zh:ember:uart:ash: <--- [FRAME type=DATA]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6]
zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6 frmNum=3] Added to rxQueue
zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=4]
zh:ember:ezsp: <=== [FRAME: ID=63:"MESSAGE_SENT_HANDLER" Seq=61 Len=25]
zh:ember:ezsp: ezspMessageSentHandler(): callback called with: [type=BROADCAST], [indexOrDestination=65532], [apsFrame={"profileId":0,"clusterId":54,"sourceEndpoint":0,"destinationEndpoint":0,"options":256,"groupId":0,"sequence":210}], [messageTag=1], [status=SUCCESS], [messageContents=010001]
```

If someone with this trouble can spare some time and is able to run custom compiled code (by replacing the existing compiled code in your zigbee2mqtt installation in node_modules/zigbee-herdsman/dist) to test a few things in more detail, find me on the zigbee2mqtt Discord (same username as here).
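The broadcast destinations in Nerivec's startup log, and in several reports in this thread, are reserved Zigbee broadcast addresses printed in decimal: 65532 is 0xFFFC and 65533 is 0xFFFD. A small lookup sketch, based on the NWK broadcast addresses in the Zigbee specification (0xFFF8 to 0xFFFF are reserved for broadcasts); `describe_destination` is a hypothetical helper, not a zigbee-herdsman API.

```python
# Well-known Zigbee NWK broadcast addresses; the whole range 0xFFF8-0xFFFF is
# reserved for broadcasts, and only these three are labeled here.
BROADCAST_ADDRESSES = {
    0xFFFF: "all devices in the PAN",
    0xFFFD: "all devices with receiver on when idle (non-sleepy)",
    0xFFFC: "all routers and the coordinator",
}


def describe_destination(addr: int) -> str:
    """Label a 16-bit network address as printed (in decimal) in the ember debug logs."""
    if addr in BROADCAST_ADDRESSES:
        return f"0x{addr:04X} broadcast: {BROADCAST_ADDRESSES[addr]}"
    if addr >= 0xFFF8:
        return f"0x{addr:04X} broadcast: other/reserved"
    return f"0x{addr:04X} unicast"
```

In the log above, the permit-join request targets 65532 (all routers and the coordinator), while the sent-handler for the first ZCL broadcast reports 65533 (all non-sleepy devices).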

fir3drag0n commented 7 months ago

> @Ricc68 Can you run a test with debug-level logs and send the file?
>
> A few things:
>
>   • If you have `permit_join: true` in your config, set it to `false`. This is known to cause trouble sometimes (and the broadcasts on startup come from enabling/disabling permit join, depending on the setting).
>   • Delivery failed means the command was successfully sent to and processed by the adapter, but the adapter failed to deliver it, so it sends back this status. Since the failure seems to happen with or without devices on the network, this is a bit strange...
>
> FYI, proper startup with `ember` looks like this (the bit related to broadcasts): [...]
>
> If someone with this trouble can spare some time and is able to run custom compiled code (by replacing the existing compiled code in your zigbee2mqtt installation in node_modules/zigbee-herdsman/dist) to test a few things in more detail, find me on the zigbee2mqtt Discord (same username as here).

So this might be a coordinator firmware issue, especially given the "delivery failed" status?
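Nerivec's explanation implies two separate outcomes per broadcast: the immediate reply to the SEND_BROADCAST command (accepted by the adapter or not), and the later `ezspMessageSentHandler()` callback (delivered or not). A minimal sketch that pairs the two by APS sequence number, assuming the `[SENT ...]` and `ezspMessageSentHandler()` line formats from the startup log in this thread. In that log, one broadcast carries `messageTag=1` in its `[SENT ...]` line but `messageTag=255` in its callback, which is why the sketch keys on the sequence instead. `correlate` is a hypothetical helper.

```python
import json
import re

# Immediate reply to SEND_BROADCAST: did the adapter accept the request?
SENT_RE = re.compile(
    r"\[SENT (?:ZDO )?type=BROADCAST "
    r"(?:apsSequence=(?P<seq>\d+)|apsFrame=(?P<aps>\{.*?\}))"
    r" messageTag=\d+ status=(?P<status>\w+)\]"
)
# Later callback: outcome of the actual over-the-air delivery.
HANDLER_RE = re.compile(
    r"ezspMessageSentHandler\(\): callback called with: \[type=BROADCAST\], "
    r"\[indexOrDestination=\d+\], \[apsFrame=(?P<aps>\{.*?\})\], "
    r"\[messageTag=\d+\], \[status=(?P<status>\w+)\]"
)


def correlate(log_text: str) -> dict:
    """Map APS sequence -> (accept status, delivery status or None if no callback was seen).

    APS sequence numbers wrap at 255, so on long logs later entries overwrite
    earlier ones; this is meant for a short capture.
    """
    accepted = {}
    for m in SENT_RE.finditer(log_text):
        seq = int(m.group("seq")) if m.group("seq") else json.loads(m.group("aps"))["sequence"]
        accepted[seq] = m.group("status")
    delivered = {}
    for m in HANDLER_RE.finditer(log_text):
        delivered[json.loads(m.group("aps"))["sequence"]] = m.group("status")
    return {seq: (status, delivered.get(seq)) for seq, status in accepted.items()}
```

An entry like `("SUCCESS", <some failure status>)` would be the "accepted but not delivered" case described above, while `("SUCCESS", None)` only means no callback was captured for that broadcast.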

joinwind commented 7 months ago

Hello, I have the 65532 and 65533 broadcast problem. I've tried every firmware and reinstalled z2m (even edge), but it doesn't work. Debug log attached: log.log

Ricc68 commented 7 months ago

@Nerivec I am attaching the startup log plus 15 minutes of running; it should have captured some broadcasts. log.log

Nerivec commented 7 months ago

Thanks to @joinwind for the help in testing this. We ran all sorts of tests to make ember behave like ezsp, both in configuration and in how broadcasts are sent; we also reset the adapter to factory defaults, created brand-new networks, etc. Nothing made any difference.

In the end, it seems to be somewhat unrelated to the code itself, and more related to the hardware the adapter is plugged into. A possible side effect of the ember driver's "completeness" (the protocols are implemented in full, so a lot more is going on behind the scenes) is that it makes adapters draw a bit more power. That's the only explanation I can come up with, and all in all it makes more sense that power trouble is making the adapter unstable than that it fails to broadcast even on an empty network.

So for everyone still having this issue, please check your USB hubs, USB ports, PoE adapters (or whatever means you use to power network adapters), etc., and make sure they get enough juice. Since I can't trigger this anywhere here, I can't give more details on the specs required; just try "more" than what you currently have and see if that works better.

Please report any findings.

Ricc68 commented 7 months ago

@Nerivec It's strange, because the ITEAD ZBDongle-E specs say:

Additional information: DC 5V (100mA max), Zigbee 3.0, aluminium alloy, 75×25.5×13.5mm

So if we believe ITEAD's specs, the dongle draws at most 100 mA at 5 V, which should be easy for any USB port. I have it plugged into a USB3 port (via a USB2 extension cable), so 100 mA should not be an issue at all. It is the only USB device plugged into my NAS, which has 5 USB3 ports (4 of them are free). Moreover, I would rule out general issues with my NAS, because it has 4 mechanical WD Red Plus drives and a WD Red SSD and runs perfectly stably. If that USB3 port couldn't supply 100 mA, I would expect much worse problems with the rest of the system.

I can try another port to see whether this one is faulty, but I seriously doubt it.
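The arithmetic behind Ricc68's argument can be made explicit. A minimal sketch, assuming the advertised 100 mA at 5 V, the per-port budgets from the USB specifications (500 mA for USB 2.0, 900 mA for USB 3.x), and a hypothetical cable resistance of about 0.213 Ω per metre per conductor, roughly that of thin 28 AWG copper; connector contact resistance and short current peaks during transmission are not modeled.

```python
# Per-port current budgets from the USB specifications (configured, bus-powered port).
USB2_PORT_MA = 500  # USB 2.0: 5 unit loads of 100 mA
USB3_PORT_MA = 900  # USB 3.x: 6 unit loads of 150 mA


def power_mw(volts: float, milliamps: float) -> float:
    """Electrical power P = V * I, in milliwatts when the current is given in mA."""
    return volts * milliamps


def share_of_budget(draw_ma: float, port_ma: float) -> float:
    """Fraction of the port's current budget used by the device."""
    return draw_ma / port_ma


def voltage_at_device(supply_v: float, current_a: float, cable_m: float, ohms_per_m: float) -> float:
    """Supply voltage minus the I*R drop over the VBUS and GND wires (current goes out and back).

    ohms_per_m is an assumed per-conductor resistance; contact resistance is ignored.
    """
    loop_resistance = 2 * cable_m * ohms_per_m
    return supply_v - current_a * loop_resistance
```

On these assumptions, `power_mw(5, 100)` gives 500 mW, the dongle uses 20% of a USB 2.0 port's budget and about 11% of a USB3 port's, and `voltage_at_device(5.0, 0.1, 3.0, 0.213)` leaves roughly 4.87 V at the end of 3 m of cable. The sketch only covers average draw, so it cannot confirm or rule out the power theory on its own.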

Nerivec commented 7 months ago

We're definitely in strange territories...

fir3drag0n commented 7 months ago

I am also using a separate USB hub with its own power supply for my LAN/USB Zigbee device, the SLZB-06M, so this seems odd.

joinwind commented 7 months ago

@Ricc68 try removing the extension cable. Yesterday the Sonoff was working as expected (ezsp OK, flashing firmware, reset, etc.) except for broadcasts; when I plugged it directly into a USB3 port, everything worked well.

supaeasy commented 7 months ago

Isn't using an extension cable, especially with a USB3 port, the one dogma that should always be respected?

fir3drag0n commented 7 months ago

And I don't think this is the cause for LAN-connected devices, which are not plugged directly into Raspberry Pis or other machines; they just need power over PoE or USB.

itwtds commented 6 months ago

Same story with the Skyconnect, so it looks as though this is not necessarily related to specific coordinator models.

Nerivec commented 6 months ago

Some updates on this for those still having the issue. darkxst ran some tests too:

This one is proving very annoying to track down. Can I get an update from those that still have the issue (all broadcasts fail) after going through the usual suspects? With debug logs attached please, so I can filter out issues that look the same but may not be.

alainsch commented 6 months ago

That might be a good point... I switched from an ODroid M1 8GB (not the M1s) to a VM on my Synology NAS, and I've had no more broadcast issues. I'm running the latest Zigbee firmware from darkxst on my SLZB-06M.

I might have some time in the next few days to switch back to the M1 and see if they come back.

fir3drag0n commented 6 months ago

What solution is there now to mitigate the issue?

Nerivec commented 6 months ago

@fir3drag0n None that I know of at the moment (I haven't been able to replicate this even once on my end...), if none of the clues here help (except the ezsp driver, if that works in your scenario for now...). Did you update the SLZB-06M to the latest firmware (core + EmberZNet)? They released a bunch of fixes in the last few days.

The only theory we can come up with for now is that, because ember implements much more of the protocol, we could be hitting an edge-case bug in the firmware that has remained undetected until now because the code path was never triggered before (or too rarely to get noticed). That, of course, only covers the case where ezsp works and nothing else fixes it (like getting rid of possible interference), since those other cases are likely unrelated, despite the same symptoms.

alainsch commented 6 months ago

> I might have some time in the next few days to switch back to the M1 and see if they come back.

Switched back to ODroid M1 and have no broadcast errors or any other problem.

I think for me the latest zigbee firmware from darkxst solved my problem.

fir3drag0n commented 6 months ago

> @fir3drag0n None that I know of at the moment (I haven't been able to replicate this even once on my end...), if none of the clues here help (except the ezsp driver, if that works in your scenario for now...). Did you update the SLZB-06M to the latest firmware (core + EmberZNet)? They released a bunch of fixes in the last few days.
>
> The only theory we can come up with for now is that, because ember implements much more of the protocol, we could be hitting an edge-case bug in the firmware that has remained undetected until now because the code path was never triggered before (or too rarely to get noticed). That, of course, only covers the case where ezsp works and nothing else fixes it (like getting rid of possible interference), since those other cases are likely unrelated, despite the same symptoms.

Yes, I updated both core and the Zigbee radio, but unfortunately nothing changed for me; I still have the same issue. For now, I'll stick with ezsp until someone has a working solution.