Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.75k stars 1.64k forks source link

Migration to Ember 7.4.2 on Sonoff Dongle-E - Devices No Longer Reachable and Container Restarting #22984

Open robdoug89 opened 3 months ago

robdoug89 commented 3 months ago

What happened?

After updating the FW for my Sonoff Dongle-E to 7.4.2 and the zigbee2mqtt container to 1.38.0 I'm no longer able to control any devices and the container will restart after a few minutes

I was previously using the ezsp driver and can still see all of my previously paired devices

The container exits with: "zigbee2mqtt exited with code 137"

I've tried flashing FW values 7.4.0-7.4.3 and have had no luck. I also attempted to restore from a backup and use the ezsp driver again, but that just times out when I try to interact with any devices

What did you expect to happen?

I expected to be able to continue to use my network after upgrading to use the ember driver

How to reproduce it (minimal and precise)

Restart container (docker-compose up) and after about two minutes it will exit and restart itself

The container will go from a health state of "starting" to "unhealthy" before the restart

Zigbee2MQTT version

1.38.0

Adapter firmware version

7.4.2

Adapter

Sonoff Dongle-E

Setup

Running in a docker container on an x86-64 machine

Debug log

Attaching to zigbee2mqtt zigbee2mqtt | Using '/app/data' as data directory zigbee2mqtt | [2024-06-08 15:12:13] warning: z2m: permit_join set to true in configuration.yaml. zigbee2mqtt | [2024-06-08 15:12:13] warning: z2m: Allowing new devices to join. zigbee2mqtt | [2024-06-08 15:12:13] warning: z2m: Set permit_join to false once you joined all devices. zigbee2mqtt | [2024-06-08 15:12:13] error: z2m: Cannot find '0x0017880103bc08ed/11' of group 'office_lights' zigbee2mqtt | [2024-06-08 15:12:13] error: z2m: Cannot find endpoint '11' of device '0x0017880103bc08ed' zigbee2mqtt | [2024-06-08 15:12:13] error: z2m: defined ssl_cert '/home/robdoug/docker/zigbee2mqtt/zigbee2mqtt-data/fullchain.pem' or ssl_key '/home/robdoug/docker/zigbee2mqtt/zigbee2mqtt-data/privkey.pem' file path does not exists, server won't be secured. zigbee2mqtt | [2024-06-08 15:12:24] info: z2m: Successfully changed options zigbee2mqtt | [2024-06-08 15:12:24] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/response/options', payload '{"data":{"restart_required":false},"status":"ok","transaction":"nl35r-3"}' zigbee2mqtt | [2024-06-08 15:12:29] debug: z2m:mqtt: Received MQTT message on 'zigbee2mqtt/Office Closet Light/set' with data '{"state":"OFF"}' zigbee2mqtt | [2024-06-08 15:12:29] debug: z2m: Publishing 'set' 'state' to 'Office Closet Light' zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:controller:endpoint: ZCL command 0x001788010ce82d29/11 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:queue: Status queue=0 priorityQueue=0. zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember: ~~~> [ZCL to=54739 apsFrame={"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":11,"options":4416,"groupId":0,"sequence":0} header={"frameControl":{"reservedBits":0,"frameType":1,"direction":0,"disableDefaultResponse":false,"manufacturerSpecific":false},"manufacturerCode":null,"transactionSequenceNumber":3,"commandIdentifier":0}] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:ezsp: ===> [FRAME: ID=52:"SEND_UNICAST" Seq=52 Len=24] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=4 frmRx=1] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:uart:ash: <--- [FRAME type=DATA] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5 frmNum=1] Added to rxQueue zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=2] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:ezsp: <=== [FRAME: ID=52:"SEND_UNICAST" Seq=52 Len=7] zigbee2mqtt | [2024-06-08 15:12:29] debug: zh:ember:ezsp: ~~~> [SENT type=DIRECT apsSequence=226 messageTag=2 status=SUCCESS] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:uart:ash: <--- [FRAME type=DATA] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=5 frmNum=2] Added to rxQueue zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=3] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:ezsp: <=== [FRAME: ID=63:"MESSAGE_SENT_HANDLER" Seq=52 Len=25] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:ember:ezsp: ezspMessageSentHandler(): callback called with: [type=DIRECT], [indexOrDestination=54739], [apsFrame={"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":11,"options":4160,"groupId":0,"sequence":226}], [messageTag=2], [status=DELIVERY_FAILED], [messageContents=010300] zigbee2mqtt | [2024-06-08 15:12:33] debug: zh:controller:endpoint: ZCL command 0x001788010ce82d29/11 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Delivery failed for {"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":11,"options":4160,"groupId":0,"sequence":226}) zigbee2mqtt | [2024-06-08 15:12:33] error: z2m: Publish 'set' 'state' to 'Office Closet Light' failed: 'Error: ZCL command 0x001788010ce82d29/11 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Delivery failed for {"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":11,"options":4160,"groupId":0,"sequence":226})' zigbee2mqtt | [2024-06-08 15:12:33] debug: z2m: Error: ZCL command 0x001788010ce82d29/11 genOnOff.off({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Delivery failed for {"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":11,"options":4160,"groupId":0,"sequence":226}) zigbee2mqtt | at EmberOneWaitress.deliveryFailedFor (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/oneWaitress.ts:96:31) zigbee2mqtt | at EmberAdapter.onMessageSentDeliveryFailed (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:610:30) zigbee2mqtt | at Ezsp.emit (node:events:517:28) zigbee2mqtt | at Ezsp.ezspMessageSentHandler (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:3960:18) zigbee2mqtt | at Ezsp.callbackDispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:797:18) zigbee2mqtt | at Ezsp.tick (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:451:22) zigbee2mqtt | at listOnTimeout (node:internal/timers:569:17) zigbee2mqtt | at processTimers (node:internal/timers:512:7) zigbee2mqtt | [2024-06-08 15:13:07] debug: z2m:mqtt: Received MQTT message on 'zigbee2mqtt/stairway_lights/set' with data '{"state":"OFF"}' zigbee2mqtt | [2024-06-08 15:13:07] debug: z2m: Publishing 'set' 'state' to 'stairway_lights' zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:controller:group: Command 2 genOnOff.off({}) zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:queue: Status queue=0 priorityQueue=0. zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember: ~~~> [ZCL GROUP apsFrame={"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":1,"options":4416,"groupId":2,"sequence":0} header={"frameControl":{"reservedBits":0,"frameType":1,"direction":0,"disableDefaultResponse":true,"manufacturerSpecific":false},"manufacturerCode":null,"transactionSequenceNumber":4,"commandIdentifier":0}] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:ezsp: ===> [FRAME: ID=56:"SEND_MULTICAST" Seq=53 Len=23] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=5 frmRx=3] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:uart:ash: <--- [FRAME type=DATA] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6 frmNum=3] Added to rxQueue zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=4] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:ezsp: <=== [FRAME: ID=56:"SEND_MULTICAST" Seq=53 Len=7] zigbee2mqtt | [2024-06-08 15:13:07] debug: zh:ember:ezsp: ~~~> [SENT type=MULTICAST apsSequence=227 messageTag=3 status=SUCCESS] zigbee2mqtt | [2024-06-08 15:13:07] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Stairway Light 1', payload '{"brightness":1,"color":{"x":0.5267,"y":0.4133},"color_mode":"color_temp","color_temp":500,"linkquality":180,"power_on_behavior":"on","state":"OFF","update":{"installed_version":537029168,"latest_version":537029168,"state":"idle"},"update_available":null}' zigbee2mqtt | [2024-06-08 15:13:07] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/stairway_lights', payload '{"brightness":1,"color":{"x":0.5267,"y":0.4133},"color_mode":"color_temp","color_temp":500,"state":"OFF"}' zigbee2mqtt | [2024-06-08 15:13:07] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Stairway Light 2', payload '{"brightness":1,"color":{"x":0.5267,"y":0.4133},"color_mode":"color_temp","color_temp":500,"level_config":{"on_level":"previous"},"linkquality":188,"power_on_behavior":"on","state":"OFF","update":{"installed_version":537029168,"latest_version":537029168,"state":"idle"},"update_available":null}' zigbee2mqtt | [2024-06-08 15:13:07] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/Stairway Light 3', payload '{"brightness":1,"color":{"x":0.5267,"y":0.4133},"color_mode":"color_temp","color_temp":500,"linkquality":176,"power_on_behavior":"on","state":"OFF","update":{"installed_version":537029168,"latest_version":537029168,"state":"idle"},"update_available":null}' zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:uart:ash: <--- [FRAME type=DATA] zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6] zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=6 frmNum=4] Added to rxQueue zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=5] zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:ezsp: <=== [FRAME: ID=63:"MESSAGE_SENT_HANDLER" Seq=53 Len=25] zigbee2mqtt | [2024-06-08 15:13:08] debug: zh:ember:ezsp: ezspMessageSentHandler(): callback called with: [type=MULTICAST], [indexOrDestination=65533], [apsFrame={"profileId":260,"clusterId":6,"sourceEndpoint":1,"destinationEndpoint":255,"options":256,"groupId":2,"sequence":227}], [messageTag=3], [status=SUCCESS], [messageContents=110400] zigbee2mqtt exited with code 137

Nerivec commented 3 months ago

Docker 137 is when the container runs out of memory. Did you adjust anything related to that?

robdoug89 commented 3 months ago

I haven't adjusted anything related to the containers set up since I've done the upgrades

I also tested running the container in swarm mode to use the resources tag to up the memory limit to 4G (the container never consumed more than 150 MB, though) and it still crashed with the same 137 error code

What's interesting is that the state of the container is "starting" for about 50-60 seconds then it transitions immediately to "unhealthy" before it restarts itself

robdoug89 commented 3 months ago

I'll attempt to go back to a pre-7.* FW version on the dongle and back to 1.36 for z2m to see if that gets around the issues for now (while using the ezsp driver)

robdoug89 commented 3 months ago

Went back to 6.10.3 using the ezsp driver and same issue

The container will run for ~1 minute before becoming unhealthy and restart itself. Other than that error 137 on exit, there are no other hints I could glean from the debug logs

I also tried completely rolling back my zigbee2mqtt-data/ folder with the old configuration, coordinator_backup, and database but the issue persists

Nerivec commented 3 months ago

Was anything else updated on your host around the same time as the Z2M update?

@Koenkk any idea on this?

Koenkk commented 3 months ago

@robdoug89 could you provide the debug log from starting z2m until the crash?

See this on how to enable debug logging.

robdoug89 commented 3 months ago

Was anything else updated on your host around the same time as the Z2M update?

Nothing else changed in my set up other than moving to 1.38.0 and the 7.* FW on my dongle

robdoug89 commented 3 months ago

debug_log.txt

I enabled debug logging across the board and attempted to restart the container

Thank you for all the help so far @Nerivec and @Koenkk!

Nerivec commented 3 months ago

Please provide the same type of log but with ember (with 7.4.3). To eliminate other unrelated things that can go wrong with ezsp / old firmware.

robdoug89 commented 3 months ago

Sure thing, changed back to using:

Driver: Ember Z2M: 1.38.0 FW: 7.4.3

debug_log_7_4_3.txt

Nerivec commented 3 months ago

Try setting permit_join to false in configuration.yaml though, it's been known to sometimes cause weird issues when true. Not sure why the container is restarting, I don't see a reason for it in the logs. The stack appears to be starting fine. Then everything just stops abruptly, and z2m isn't throwing any error that would cause that (or it's not being logged).


I do see these errors on start though:

[2024-06-10 18:58:46] error:    z2m: Cannot find '0x0017880103bc08ed/11' of group 'office_lights'
[2024-06-10 18:58:46] error:    z2m: Cannot find endpoint '11' of device '0x0017880103bc08ed'
[2024-06-10 18:58:46] error:    z2m: defined ssl_cert '/home/robdoug/docker/zigbee2mqtt/zigbee2mqtt-data/fullchain.pem' or ssl_key '/home/robdoug/docker/zigbee2mqtt/zigbee2mqtt-data/privkey.pem' file path does not exists, server won't be secured.
robdoug89 commented 3 months ago

I set permit_join to false and fixed all three of the errors mentioned in the log file

Unfortunately, same outcome with delivery failing for any ZCL command

debug_log_resolve_errors.txt

Any other thoughts on ways to debug this? Would prefer to avoid having to re-pair all devices if possible :)

Nerivec commented 2 months ago

Are you familiar with the procedure to clear the NVM3? You can backup the adapter first, then restore it after. It should prevent you from having to re-pair all devices. See https://github.com/darkxst/silabs-firmware-builder/issues/84#issuecomment-2094078288 for the procedure. You can use bellows or https://github.com/Nerivec/ember-zli (still a work in progress, but I've tested all this on Dongle-E so it should work fine for you). If you use ember-zli, use these commands and follow these menus (using arrow keys):

  1. ember-zli stack => Backup tokens (NVM3)
  2. ember-zli bootloader -r => Clear NVM3
  3. ember-zli bootloader -r => Update firmware => Use pre-defined firmware => Latest (7.4.3)
  4. ember-zli stack => Restore tokens (NVM3)

ember-zli is interactive, it will ask questions when you are required to give it some information (like serial path, file paths, etc). Make sure to pick Sonoff ZBDongle-E as adapter when asked.

robdoug89 commented 2 months ago

Unfortunately even after clearing the NVM3, I was getting the same errors

I decided to clear the NVM3 again and remove my database/backup files and just use my configuration file going forward with a clean install

I was able to pair 7 of my devices before the process stopped working. Now I'm no longer able to get any interview to show on screen and there are no logs (debug or otherwise) that are hinting at why

I've shut my container down for 30 min, re-plugged my adapter, and waited but the behavior of not being able to pair persists

The good news is that after I the devices are re-paired, all the naming and the like flows the same through Home Assistant and Node Red so that makes getting this operational again a lot easier

Any thoughts on why the pairing would suddenly stop after a few devices?

Nerivec commented 2 months ago

When you say no logs, do you mean no log at all (frozen), or just no log that helped on the matter? Any chance you can try bare Z2M (not containerized), even temporarily, to eliminate container issues? You can also try to remove the devices you added, starting from last, and see if one is a troublemaker (some devices are so bad, they're the equivalent of a black hole in your network 😅).

robdoug89 commented 2 months ago

Ah, Just no logs that were helpful. The devices that did pair are actually working great!

I will try non containerized next and work backwards on the devices I have added to find any troublemakers

Very much appreciate the continued suggestions!

On Sat, Jun 15, 2024 at 11:13 AM Nerivec @.***> wrote:

When you say no logs, do you mean no log at all (frozen), or just no log that helped on the matter? Any chance you can try bare Z2M (not containerized), even temporarily, to eliminate container issues? You can also try to remove the devices you added, starting from last, and see if one is a troublemaker (some devices are so bad, they're the equivalent of a black hole in your network 😅).

— Reply to this email directly, view it on GitHub https://github.com/Koenkk/zigbee2mqtt/issues/22984#issuecomment-2170464264, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYXMDO66ZDJCDXJSOUM3KYTZHR73NAVCNFSM6AAAAABJAKEFEGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZQGQ3DIMRWGQ . You are receiving this because you were mentioned.Message ID: @.***>

robdoug89 commented 2 months ago

Unfortunately I haven't been able to make any progress

I tried the non-containerized method and even after resetting the firmware again to try and start completely from scratch I still see nothing showing up when I try and join new devices

I deleted all the previously paired devices as well but no change

Additionally, I did give ZHA a try to see if that would work but devices wouldn't pair there either (which makes me suspect that it might be something I've done with my dongle)