Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.84k stars 1.65k forks source link

Aqara SSM-U02 crashes Zigbee2MQTT #24058

Open Skeletorjus opened 6 days ago

Skeletorjus commented 6 days ago

What happened?

I have three [Aqara SSM-U02](Aqara SSM-U02) in my network. One of them is slow to respond and will often take down the zigbee network when used. This happens both when toggling the switch from Home Assistant/Zigbee2MQTT or when using the physical switch. I often have to repair the switch after a such occurence

Attached is a debug-log capturing the startup of Zigbee2MQTT followed by two toggles of the switch which leads to the crash. The switch in question is "Room1 Lysbryter"

I think this is the most interesting part, but the whole log is attached.

[2024-09-22 13:44:28] debug:    z2m: Publishing 'set' 'state' to 'Room1 Lysbryter'
[2024-09-22 13:44:28] debug:    zh:controller:endpoint: ZCL command 0x54ef441000666c34/1 genOnOff.on({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false})
[2024-09-22 13:44:28] debug:    zh:zstack: sendZclFrameToEndpointInternal 0x54ef441000666c34:30578/1 (0,0,2)
[2024-09-22 13:44:28] debug:    zh:zstack:znp: SREQ: --> AF - dataRequest - {"dstaddr":30578,"destendpoint":1,"srcendpoint":1,"clusterid":6,"transid":19,"options":0,"radius":30,"len":3,"data":{"type":"Buffer","data":[1,10,1]}}
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:writer: --> frame [254,13,36,1,114,119,1,1,6,0,19,0,30,3,1,10,1,47]
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: <-- [254,1,100,1,0,100]
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --- parseNext [254,1,100,1,0,100]
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --> parsed 1 - 3 - 4 - 1 - [0] - 100
[2024-09-22 13:44:28] debug:    zh:zstack:znp: SRSP: <-- AF - dataRequest - {"status":0}
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --- parseNext []
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: <-- [254,3,68,128,0,1,19,213]
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --- parseNext [254,3,68,128,0,1,19,213]
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --> parsed 3 - 2 - 4 - 128 - [0,1,19] - 213
[2024-09-22 13:44:28] debug:    zh:zstack:znp: AREQ: <-- AF - dataConfirm - {"status":0,"endpoint":1,"transid":19}
[2024-09-22 13:44:28] debug:    zh:zstack:unpi:parser: --- parseNext []
[2024-09-22 13:44:29] debug:    zh:zstack:unpi:parser: <-- [254,3,68,128,0,1,18,212]
[2024-09-22 13:44:29] debug:    zh:zstack:unpi:parser: --- parseNext [254,3,68,128,0,1,18,212]
[2024-09-22 13:44:29] debug:    zh:zstack:unpi:parser: --> parsed 3 - 2 - 4 - 128 - [0,1,18] - 212
[2024-09-22 13:44:29] debug:    zh:zstack:znp: AREQ: <-- AF - dataConfirm - {"status":0,"endpoint":1,"transid":18}
[2024-09-22 13:44:29] debug:    zh:zstack:unpi:parser: --- parseNext []
[2024-09-22 13:44:38] debug:    zh:zstack: Response timeout (0x54ef441000666c34:30578,0)
[2024-09-22 13:44:38] debug:    zh:zstack:znp: SREQ: --> UTIL - assocGetWithAddress - {"extaddr":"0x54ef441000666c34","nwkaddr":30578}
[2024-09-22 13:44:38] debug:    zh:zstack:unpi:writer: --> frame [254,10,39,74,52,108,102,0,16,68,239,84,114,119,179]
[2024-09-22 13:44:44] debug:    zh:controller:endpoint: Error: ZCL command 0x54ef441000666c34/1 genOnOff.on({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (SRSP - UTIL - assocGetWithAddress after 6000ms)
    at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
    at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
    at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
    at Znp.requestWithReply (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:269:34)
    at ZStackAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:606:50)
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:20)
    at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:82:20)
    at Endpoint.zclCommand (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:911:28)
    at Endpoint.command (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:683:23)
[2024-09-22 13:44:44] error:    z2m: Publish 'set' 'state' to 'Room1 Lysbryter' failed: 'Error: ZCL command 0x54ef441000666c34/1 genOnOff.on({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (SRSP - UTIL - assocGetWithAddress after 6000ms)'
[2024-09-22 13:44:44] debug:    z2m: Error: ZCL command 0x54ef441000666c34/1 genOnOff.on({}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (SRSP - UTIL - assocGetWithAddress after 6000ms)
    at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
    at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
    at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
    at Znp.requestWithReply (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:269:34)
    at ZStackAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:606:50)
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:20)
    at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:82:20)
    at Endpoint.zclCommand (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:911:28)
    at Endpoint.command (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:683:23)

Please note that this could very well be the underlying reason leading to my contributions in https://github.com/Koenkk/Z-Stack-firmware/issues/518 and https://github.com/Koenkk/zigbee2mqtt/issues/23869.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.40.1 commit: 403d3c0

Adapter firmware version

20240710

Adapter

SONOFF Zigbee 3.0 USB Dongle-P Plus (zStack3x0)

Setup

Z2M in docker on Debian 12

Debug log

log_sanitized.log

Skeletorjus commented 5 days ago

Updated to fw 99240914 from https://github.com/Koenkk/zigbee2mqtt/issues/23571#issuecomment-2351106980. Hopefully this firmware takes care of the full buffer error so Zigbee2MQTT is able to restart by itself after the crash, but the original issue with the switch unfortunately persists.

Koenkk commented 4 days ago

Does it still crash with 99240914? If not, try this also: https://github.com/Koenkk/zigbee2mqtt/issues/23869#issuecomment-2366942980

Skeletorjus commented 4 days ago

Unfortunately still crashes with 99240914. I had to remove the device from the network for the time being since one press on the switch was enough to bring everything down. Happy to add it again if needed.

Koenkk commented 3 days ago

Does it also crash with https://github.com/Koenkk/Z-Stack-firmware/releases/tag/Z-Stack_3.x.0_coordinator_20221226 ?

Skeletorjus commented 2 days ago

OK, before flashing 20221226, I wanted to try pairing the switch with 99240914 once more to see if I could get a better understanding of what was happening. After repairing, it didn't take many uses of the switch to take the network down again. It is quite hard to get the network up and running again after sunch an occurence, but after unpairing the device and some restarts of the container, it came back up. The strange thing is that after physically toggling the switch it actually started pairing itself to the network again - while permit join was off! Back into the network the switch went, and a couple of interactions with the switch the network went down again. (Edit: I found in another issue that the rejoining is due to the device not being factory reset, so this aspect is probably not a mystery after all)

During this I made another discovery; the power outage counter increases once in a while after operating the switch. This coencides with the slow updates in Zigbee2MQTT. Could we actually be looking at a hardware issue here?

Anyhow, I flashed 20221226 to see if anything was different, and by and large it was mostly the same on the hardware side. The switch is slow to react in software and the power outage counter increases. ~~The big difference between the verisons seems to be that 20221226 manages to recover better than 99240914. I can easilly see when the network goes down because some of my bulbs is starting to flash. With 99240914 it won't recover until I at least restart the container, and sometimes even have to replug the USB. 20221226 seems to get everything in order by itself. The bulbs flash for a few seconds and everything is up and running again.~~

Thank you for your help with this issue, by the way, I'm absolutely not able to make sense of any of this.

edit: spoke too soon, 99240914 also crashes proper.

Koenkk commented 9 hours ago

@Skeletorjus it looks like a hardware issue indeed, especially if only one of the SSM-U02 is affected.