Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Unstable network, delayed actions, with sometimes no errors. #18196

Closed aurimasniekis closed 1 year ago

aurimasniekis commented 1 year ago

What happened?

I am running quite a big ZigBee network right now, around 150 devices. All of them are powered: mostly IKEA lights, presence sensors, and a few Sonoff usb3 E routers. At the moment I am running a CC2652P2 Based Zigbee to PoE Coordinator 2023 with the latest firmware. I was previously getting some NWK_TABLE_FULL errors, but those appeared rarely.

This morning, I started getting a lot of different errors:

The strangest thing is that the devices nearest the coordinator (lamp1 - lamp7) are now barely responding. I tried restarting both the coordinator and z2m, but it made no difference.

I don't even know where to begin looking for the main issue; maybe someone is more knowledgeable about this. I am also attaching my map gist and info about the devices:

Total: 146

By device type:

Map

https://gist.github.com/aurimasniekis/75f74a7c82414271e639da1268cebc0f

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.32.0 commit: unknown

Adapter firmware version

20230507

Adapter

CC2652P2 Based Zigbee to PoE Coordinator 2023

Debug log

log.txt log (1).txt log (2).txt log (3).txt log (4).txt

aurimasniekis commented 1 year ago

More new errors I am seeing right now:

It seems the issue is the coordinator's memory, right?

PetrMa commented 1 year ago

I have the same problem and errors last 2-3 days on my CC2531 with latest version of HA

Zigbee2MQTT:error 2023-07-03 14:38:07: Publish 'set' 'system_mode' to 'Topení Tomáš' failed: 'Error: Command 0x003c84fffec7f78d/1 manuSpecificTuya.dataRequest({"seq":5,"dpValues":[{"dp":106,"datatype":4,"data":[2]}]}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205))'
Zigbee2MQTT:info 2023-07-03 14:38:07: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"Publish 'set' 'system_mode' to 'Topení Tomáš' failed: 'Error: Command 0x003c84fffec7f78d/1 manuSpecificTuya.dataRequest({\"seq\":5,\"dpValues\":[{\"dp\":106,\"datatype\":4,\"data\":[2]}]}, {\"sendWhen\":\"immediate\",\"timeout\":10000,\"disableResponse\":false,\"disableRecovery\":false,\"disableDefaultResponse\":true,\"direction\":0,\"srcEndpoint\":null,\"reservedBits\":0,\"manufacturerCode\":null,\"transactionSequenceNumber\":null,\"writeUndiv\":false}) failed (Data request failed with error: 'No network route' (205))'","meta":{"friendly_name":"Topení Tomáš"},"type":"zigbee_publish_error"}'

aurimasniekis commented 1 year ago

I have the same problem and errors last 2-3 days on my CC2531 with latest version of HA

Those are not the same errors. Your device is not found on your network; try restarting it.

PetrMa commented 1 year ago

I have the same issue on all devices. For a few minutes after every reboot of the HA machine it works fine, but after some time I get these errors for all devices in the network.

lux73 commented 1 year ago

CC2531 Coordinators are old, weak and unstable

Pls upgrade to CC2652x Device to get a reliable Zigbee Network

thargy commented 1 year ago

Pls upgrade to CC2652x Device to get a reliable Zigbee Network

I see exactly the same thing on my CC2652P: after a few hours the whole network dies. And the OP (aurimasniekis) clearly states his is also a CC2652P2 coordinator...

I too am close to ~150 devices (see this comment for full details).

Here's an example log:

error 2023-07-07 12:01:41: Error: CommandResponse 0x000d6f00174a7ddb/1 genOta.queryNextImageResponse({"status":152}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'Timeout' (9999))
error 2023-07-07 12:02:49: Error: CommandResponse 0x9035eafffed485c6/1 genOta.queryNextImageResponse({"status":152}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SREQ '--> AF - dataRequest - {"dstaddr":51197,"destendpoint":1,"srcendpoint":1,"clusterid":25,"transid":93,"options":0,"radius":30,"len":4,"data":{"type":"Buffer","data":[25,198,2,152]}}' failed with status '(0x10: MEM_ERROR)' (expected '(0x00: SUCCESS)'))
error 2023-07-07 12:02:56: Error: CommandResponse 0x040d84fffe2390fe/1 genOta.queryNextImageResponse({"status":152}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
...

Following this, the coordinator needs power cycling.

thargy commented 1 year ago

And here's a clean reboot showing the infamous NWK_TABLE_FULL:

warn  2023-07-07 20:34:20: Failed to ping 'Dining Room Alcoves' (attempt 1/1, Read 0x8cf681fffe538e40/1 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205)))
warn  2023-07-07 20:34:35: Failed to ping 'Dining Room Table' (attempt 1/1, Read 0x8cf681fffe538daf/1 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 13769 - 1 - 9 - 0 - 1 after 10000ms))
warn  2023-07-07 20:34:45: Failed to ping 'Under Balcony Switch' (attempt 1/1, Read 0x540f57fffe218ea5/1 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205)))
warn  2023-07-07 20:35:02: Failed to ping 'Bedroom 1 Spot 2' (attempt 1/1, Read 0x001788010cf2df14/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205)))
warn  2023-07-07 20:35:26: Failed to ping 'Bedroom 1 Spot 9' (attempt 1/1, Read 0x001788010cf2da0e/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205)))
error 2023-07-07 20:35:26: Failed to read state of 'Bedroom 1 Spot 5' after reconnect (Read 0x001788010cf2d2db/11 genOnOff(["onOff"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":18174,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)')))
warn  2023-07-07 20:35:49: Failed to ping 'Bedroom 2 Spot 2' (attempt 1/1, Read 0x001788010cf356f6/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 61113 - 11 - 39 - 0 - 1 after 10000ms))
warn  2023-07-07 20:36:04: Failed to ping 'Bedroom 2 Spot 3' (attempt 1/1, Read 0x001788010ceeeb7b/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 9139 - 11 - 40 - 0 - 1 after 10000ms))
warn  2023-07-07 20:36:15: Failed to ping 'Bedroom 2 Spot 4' (attempt 1/1, Read 0x001788010cf3366c/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms))
warn  2023-07-07 20:36:26: Failed to ping 'Bedroom 2 Spot 5' (attempt 1/1, Read 0x001788010ceeefd7/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms))
warn  2023-07-07 20:36:38: Failed to ping 'Bedroom 2 Spot 6' (attempt 1/1, Read 0x001788010cf34c59/11 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms))
warn  2023-07-07 20:36:49: Failed to ping 'Bedroom 1 Switch' (attempt 1/1, Read 0x000d6f00174aa36c/1 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms))
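The logs in this thread mix several distinct failure reasons ('No network route' (205), 'Timeout' (9999), MEM_ERROR, NWK_TABLE_FULL, SRSP timeouts), and it is hard to see by eye which one dominates. The following is a minimal sketch, not part of Zigbee2MQTT, that tallies them from log lines. The label names and the `tally_failures` helper are made up for illustration, and the patterns only cover the messages quoted in this thread.

```python
import re
from collections import Counter

# Failure reasons that appear in the logs quoted in this thread.
# The labels on the left are descriptive names chosen for this sketch only.
FAILURE_PATTERNS = {
    "no_network_route": re.compile(r"'No network route' \(205\)"),
    "mac_channel_access_failure": re.compile(r"'MAC channel access failure' \(225\)"),
    "data_request_timeout": re.compile(r"'Timeout' \(9999\)"),
    "mem_error": re.compile(r"MEM_ERROR"),
    "nwk_table_full": re.compile(r"NWK_TABLE_FULL"),
    "srsp_timeout": re.compile(r"SRSP - AF - dataRequest after \d+ms"),
    "response_timeout": re.compile(r"Timeout - \d+ - \d+ - \d+ - \d+ - \d+ after \d+ms"),
}


def tally_failures(lines):
    """Count log lines per failure reason; each line is counted at most once."""
    counts = Counter()
    for line in lines:
        # Skip the MQTT echo of an error so the same failure is not counted twice.
        if "MQTT publish:" in line:
            continue
        for label, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


# Shortened copies of lines from the logs above ("..." marks omitted text).
sample = [
    "warn  2023-07-07 20:34:20: Failed to ping 'Dining Room Alcoves' (...) failed (Data request failed with error: 'No network route' (205)))",
    "error 2023-07-07 20:35:26: Failed to read state of 'Bedroom 1 Spot 5' ... failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)')))",
    "warn  2023-07-07 20:36:15: Failed to ping 'Bedroom 2 Spot 4' (...) failed (SRSP - AF - dataRequest after 6000ms))",
]

for label, count in tally_failures(sample).most_common():
    print(f"{label}: {count}")
```

To run it against a real log, pass an open file object instead of `sample`; iterating over a file yields one line at a time, which is all the function needs.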

aurimasniekis commented 1 year ago

At the site where I am working with this network, I requested a few extra LAN cables to be routed so I can try splitting the network into 3, because I still need to add another ~50-100 devices...
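When one network is split into several Zigbee2MQTT instances, each instance needs its own adapter and network identity, and instances sharing an MQTT broker or host also need separate topics and ports. Below is a rough sketch that compares two configurations for settings that should not be identical. The key names follow the layout of Zigbee2MQTT's configuration.yaml, but every value, the `SHOULD_DIFFER` list, and the `find_conflicts` helper are assumptions made for illustration. Keys left unset fall back to Zigbee2MQTT defaults, which this sketch does not check.

```python
# Settings that should differ between two Zigbee2MQTT instances.
# base_topic matters when both use the same MQTT broker, frontend.port when
# both run on the same host; a different channel is advisable, not mandatory.
SHOULD_DIFFER = [
    ("mqtt", "base_topic"),
    ("serial", "port"),
    ("advanced", "pan_id"),
    ("advanced", "ext_pan_id"),
    ("advanced", "channel"),
    ("frontend", "port"),
]


def find_conflicts(config_a, config_b):
    """Return 'section.key' names that are explicitly set to the same value."""
    conflicts = []
    for section, key in SHOULD_DIFFER:
        value_a = config_a.get(section, {}).get(key)
        value_b = config_b.get(section, {}).get(key)
        if value_a is not None and value_a == value_b:
            conflicts.append(f"{section}.{key}")
    return conflicts


# Hypothetical configurations; addresses, IDs and topics are made up.
lights = {
    "mqtt": {"base_topic": "zigbee2mqtt_lights"},
    "serial": {"port": "tcp://192.0.2.10:6638"},
    "advanced": {"pan_id": 0x1A01, "channel": 15},
    "frontend": {"port": 8080},
}
sensors = {
    "mqtt": {"base_topic": "zigbee2mqtt_sensors"},
    "serial": {"port": "tcp://192.0.2.11:6638"},
    "advanced": {"pan_id": 0x1A02, "channel": 15},
    "frontend": {"port": 8081},
}

print(find_conflicts(lights, sensors))
```

In this made-up example only the channel is shared, so that is the single setting the check reports.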

thargy commented 1 year ago

At the site where I am working with this network, I requested a few extra LAN cables to be routed so I can try splitting the network into 3, because I still need to add another ~50-100 devices...

Removing about half of my over-chatty presence sensors (TUYA ZY-M100-L) really stabilised things, so I've bought a second Sonoff Zigbee 3.0 USB Dongle Plus to connect to my spare Raspberry Pi 4. I plan to set up Z2M on there and migrate all the sensors to it. That way, if the sensor network goes down, my switches and lights won't be affected.

It does feel like an unnecessary workaround though :(
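Before moving devices to a second network, it can help to measure which ones are actually chatty. The sketch below counts "MQTT publish" lines per topic, using the log format shown earlier in this thread. It assumes logging at info level and the `zigbee2mqtt` base topic seen in those logs; the `count_publishes` helper and the device names in the sample are hypothetical.

```python
import re
from collections import Counter

# Matches the topic in lines such as:
#   info  2023-07-03 14:38:07: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '...'
# Friendly names containing an apostrophe would be cut short by this pattern.
PUBLISH_RE = re.compile(r"MQTT publish: topic '([^']+)'")


def count_publishes(lines, base_topic="zigbee2mqtt"):
    """Count published messages per device topic, ignoring bridge topics."""
    counts = Counter()
    bridge_prefix = f"{base_topic}/bridge"
    for line in lines:
        match = PUBLISH_RE.search(line)
        if match is None:
            continue
        topic = match.group(1)
        if topic.startswith(bridge_prefix):
            continue  # bridge/log, bridge/state etc. are not device traffic
        counts[topic] += 1
    return counts


# Made-up sample lines in the same shape as the logs in this thread.
sample = [
    "info  2023-07-07 20:34:20: MQTT publish: topic 'zigbee2mqtt/Presence Hall', payload '{}'",
    "info  2023-07-07 20:34:21: MQTT publish: topic 'zigbee2mqtt/Presence Hall', payload '{}'",
    "info  2023-07-07 20:34:22: MQTT publish: topic 'zigbee2mqtt/Lamp 1', payload '{}'",
    "info  2023-07-07 20:34:23: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{}'",
]

for topic, count in count_publishes(sample).most_common(10):
    print(f"{count:6d}  {topic}")
```

Dividing each count by the time span the log covers gives a messages-per-minute figure, which makes it easier to compare devices across logs of different length.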

haaslukas commented 1 year ago

Any update on this? I also have a CC2652RB USB stick (with zStack3x0 and rev. 20221226) and never had problems over the last two years. Now suddenly my three Gledopto GL-D-003P lights are not working anymore and I'm getting the 'MAC channel access failure' (225) message:

Error 2023-08-03 22:57:17 Publish 'set' 'state' to '0x84fd27fffe931468' failed: 'Error: Command 0x84fd27fffe931468/11 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'MAC channel access failure' (225))'

All other 22 Zigbee devices seem to work.

Any idea how I can debug this further?
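One angle worth checking for 'MAC channel access failure' (225): the status means the radio could not find the channel clear before transmitting, and 2.4 GHz Wi-Fi on an overlapping channel is one possible cause, though not the only one. The sketch below works out which Wi-Fi channels overlap a given Zigbee channel from the standard centre frequencies. The 22 MHz Wi-Fi width and 2 MHz Zigbee width are simplifying assumptions, the helper names are made up, and the thread does not say which channel this network uses.

```python
def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz IEEE 802.15.4 channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers Wi-Fi channels 1-13 only")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel, wifi_width_mhz=22, zigbee_width_mhz=2):
    """Wi-Fi channels whose band overlaps the given Zigbee channel.

    Two bands overlap when their centres are closer than half the sum of
    their widths. Real signals have skirts, so treat this as a rough guide.
    """
    limit = (wifi_width_mhz + zigbee_width_mhz) / 2
    zigbee = zigbee_center_mhz(zigbee_channel)
    return [
        wifi
        for wifi in range(1, 14)
        if abs(wifi_center_mhz(wifi) - zigbee) < limit
    ]


for zigbee_channel in (11, 15, 20, 25):
    print(zigbee_channel, overlapping_wifi_channels(zigbee_channel))
```

If the access point nearest the affected lights turns out to sit on an overlapping channel, moving the Wi-Fi channel is usually less disruptive than changing the Zigbee channel, since the latter may require re-pairing devices.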

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days