Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Devices keep going offline. Error: 'No network route' (205) #21369

Open martinsheldon opened 9 months ago

martinsheldon commented 9 months ago

What happened?

Devices keep going offline "randomly". Most frequently seen on Develco SPLZB-131 devices, but also on Hue bulbs/smart plugs and Aqara door and temperature sensors.

I currently have 7 Develco SPLZB-131 (after removing a couple) that all seem to go offline and online randomly. Sometimes I can turn the device on and off in Z2M to wake it up; other times this does not work. Powering the device itself off and on, or unplugging and replugging it, may also bring it back online.

When it's not working I see this error in the logs: 'No network route' (205)

Other devices, such as Philips Hue bulbs and smart plugs, show similar behaviour, but not as often. These devices also always (as far as I've noticed) respond to recalling scenes, even when they are offline.

Aqara door and temperature sensors also suddenly stop reporting. Sometimes I can wake them by clicking the pairing button once or a few times. Other times I need to re-pair the device. I've tried pairing directly to the coordinator and to the nearest router but I don't see any difference in behaviour.

Currently I'm running Z2M as an add-on in Home Assistant but I've also tested running it in Docker on an Unraid system. No difference as far as I can tell.

I'm using Slaesh's CC2652RB stick and have approx. 120 devices: 75 router devices and 45 end devices, all spread around the house, so coverage should be more than fine.

Two weeks ago I also updated the FW on the coordinator but I see no difference.

Any idea what might be causing these issues? It's been pretty much stable for a long time but lately I've seen this behaviour all the time.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.35.3-1

Adapter firmware version

20230507

Adapter

Slaesh's stick CC2652RB

Setup

Add on Home Assistant OS. HAOS running in a VM in Proxmox

Debug log

No response

stefsims commented 9 months ago

I have the same issue, but only with IKEA bulbs in a mixed environment (Aqara, Sonoff, Hue, IKEA, ...). It was stable before 1.35.x. Zigbee2MQTT runs in Proxmox with a Sonoff 3.0 P. Controller and Z2M are up to date.

fschaal commented 9 months ago

Same here with a Sonoff 3.0 P. I also have issues with devices leaving the network at random and becoming unknown, resulting in the message: Entity 'xxxx' is unknown.

Controller and Z2M are up to date.

cromelex commented 9 months ago

I have the same issue with a Sonoff 3.0 P, on 1.35.0, 1.35.1 and 1.35.3. It happens with firmware 20230507 and 20221226. I get this error, or the (25).

The same thing happened both with the add-on in HASS on a Raspberry Pi 4 and with Docker on a QNAP NAS.

I also have a mix of IKEA, Aqara and Sonoff devices (among others). The majority are Aqara and IKEA.

I was replacing a Conbee2, and after a while this just started happening, on either machine. Reverting to the Conbee2 (and re-pairing every goddam piece) fixes it, which seems to imply the issue is specific to the Sonoff. Not sure if it could be an actual defective Sonoff? I ended up running 2 instances of zigbee2mqtt, as I have new Bosch thermostats which are not compatible with the Conbee2.

dubtec commented 9 months ago

I too have this. Started a couple of months back and doesn’t seem to correlate to anything. Tried downgrading firmware on Slaesh stick, moving it around, introducing IKEA repeater to lower burden on other devices. The issues remain (Pings fail, no network routes, no MAC ACK (rarely), devices leaving the network). Before it was rock solid.

Approx. 90 Devices in use are:

Definitely planning to get rid of the Aqara relays, because they seem to lose their settings every 2-3 months for some reason, resulting in weird behavior for the family members, making them not trustworthy and endangering my hobby… 😉

brommetje commented 9 months ago

Stable for weeks and now the same problem over here. HA version 2024.2.1; Zigbee2MQTT 1.35.1-1 or 1.35.3-1 makes no difference. Using Slaesh's CC2652RB stick with firmware 20230507. Where can I download an older firmware?

Errors in the log:

Failed to read state of '0xd4aeee_Schuur_Smartplug' after reconnect (Read 0x588e81fffed4aeee/1 genOnOff(["onOff"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205)))

or

Publish 'set' 'state' to '0xa3b03f_Kast_Achterdeur' failed: 'Error: Command 0x60a423fffea3b03f/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'MAC channel access failure' (225))'

Re-pairing also fails.
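For anyone trying to find out which devices fail most often and with which error, a small script can tally the failures from a saved log. The following is a minimal sketch in Python, assuming log lines that end in the same "Data request failed with error: '...' (NNN)" form as the two messages quoted above. The function name `tally_errors` and the shortened sample lines (option objects replaced by `{...}`) are illustrative only and not part of Zigbee2MQTT.

```python
import re
from collections import Counter

# Matches the tail of the quoted messages, e.g.
#   Data request failed with error: 'No network route' (205)
ERROR_RE = re.compile(r"Data request failed with error: '([^']+)' \((\d+)\)")
# Matches the IEEE address before the endpoint, e.g. 0x588e81fffed4aeee/1
ADDRESS_RE = re.compile(r"\b(0x[0-9a-f]{16})/\d+")


def tally_errors(lines):
    """Count (ieee_address, error_name, error_code) triples in log lines."""
    counts = Counter()
    for line in lines:
        error = ERROR_RE.search(line)
        if error is None:
            continue  # not a data-request failure
        address = ADDRESS_RE.search(line)
        ieee = address.group(1) if address else "unknown"
        counts[(ieee, error.group(1), int(error.group(2)))] += 1
    return counts


# Shortened copies of the two messages from this comment (hypothetical input;
# in practice the lines would be read from a saved log file).
sample = [
    "Failed to read state of '0xd4aeee_Schuur_Smartplug' after reconnect "
    "(Read 0x588e81fffed4aeee/1 genOnOff([\"onOff\"], {...}) failed "
    "(Data request failed with error: 'No network route' (205)))",
    "Publish 'set' 'state' to '0xa3b03f_Kast_Achterdeur' failed: "
    "'Error: Command 0x60a423fffea3b03f/1 genOnOff.on({}, {...}) failed "
    "(Data request failed with error: 'MAC channel access failure' (225))'",
]

for (ieee, name, code), count in tally_errors(sample).most_common():
    print(f"{ieee}  {name} ({code})  x{count}")
```

Sorting by count makes it easier to see whether failures cluster on a few devices or a single error code, which is the kind of correlation several people in this thread are looking for.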

martinsheldon commented 9 months ago

To follow up with some additional information after another 2 weeks of troubleshooting: I ordered a new coordinator (same model) but saw no change after replacing the old one. I even tried several different FW versions, swapped the extension cable and the USB port on the PC, disabled the USB autosuspend feature according to the Z2M FAQ, and reduced interference close to the coordinator as much as possible.

Last week I started rebuilding the entire Zigbee network using the new coordinator. I started with the smart plugs, as they are placed around the house, making a good starting point for building the mesh. Then I added the nearby end devices like temp sensors etc. All seemed fine, so I continued with all other mains-powered devices except Philips Hue, as they are all members of groups and scenes and required more work.

Still everything was good, with no errors and pairing with close to 100% success rate. No trouble at all. After a day or so with no errors I started migrating the Philips Hue devices like bulbs and a couple of smart plugs.

I started with a couple of rooms and around 6-7 devices. Still OK. The day after I migrated around 25-30 devices and then the same issues/errors reappeared:

SRSP - AF - dataRequest after 6000ms
MAC no ack
NWK_TABLE_FULL
BUFFER_FULL

Devices are going offline (mainly smart plugs and Hue bulbs). Some can be woken up by toggling the device on/off in Z2M. Others require re-pairing.

At the moment I have about 10% of the devices offline.

The next step is to move all Philips Hue devices back to the original network, running the two in parallel to see what happens. I'm really running out of ideas about what might be wrong here...

@Koenkk Any idea how to further troubleshoot on this issue?

Koenkk commented 9 months ago

@martinsheldon I would suggest switching to the 20221226 firmware: https://github.com/Koenkk/Z-Stack-firmware/tree/Z-Stack_3.x.0_coordinator_20221226/coordinator/Z-Stack_3.x.0/bin

martinsheldon commented 9 months ago

Thanks @Koenkk, I'll try that. Any reason that version should work better than the latest?

I've tested with a few different ones but lost track of exactly which. (I know I tried the latest.) I did not see any difference with any of the FWs.

Also, as I've mentioned, I started migrating the Hues to a separate coordinator/network. Not all are done, but it does indeed seem a lot more stable, though not yet perfect.

The network with only Hues shows no errors at all; the other, with a mix of device types, still shows a few, but errors are reduced by maybe 90-95%.

Koenkk commented 9 months ago

@martinsheldon 20230507 is not stable in some setups, working on a new fw already

RStadlmair commented 9 months ago

And I thought I am alone with this problem ..

Same here, pretty similar, with a mini test setup: Hue motion sensor and light strips/bulbs.

Will check FW version of my Sonos adapter. Hope that this gets fixed, as it makes my HA unusable.

Happy to provide logs and do tests if required by development.

martinsheldon commented 9 months ago

> @martinsheldon 20230507 is not stable in some setups, working on a new fw already

Okay, I was not aware of that. I might not be able to try this FW until Thursday, but I'll update once it's done and it's been running for some hours, to see if there are still failing devices.

stefsims commented 9 months ago

20221226 does not fix the issue for me. Some lights still go offline, mostly all the IKEA Floalt and some IKEA GU10 and E27. Hue bulbs are stable...

dubtec commented 9 months ago

I had downgraded to 20211217 and spent approx. 3-4 weeks on it but that didn’t improve things for me. Still the same behavior.

fschaal commented 9 months ago

I've tried lots of things: upgrading to 20230507 and downgrading to 20211217, but what fixed it in the end was simply downgrading z2m to version 1.34.0-1. Now all is back to normal for me.

martinsheldon commented 9 months ago

> I've tried lots of things: upgrading to 20230507 and downgrading to 20211217 but what fixed it in the end was simply downgrading z2m to version 1.34.0-1 now all is back to normal for me.

That's interesting, as the oldest version I have a backup of is the same one, but for me it made no difference at all.

fschaal commented 8 months ago

I have to come back on this. In recent days some devices have become unstable again. Some cannot be reached at all; others work again after reconfiguring them.

RStadlmair commented 8 months ago

I restart zigbee2mqtt daily at 23:30 in Home Assistant for diagnosis. Afterwards, automations work for a few hours and then suddenly stop (I know that because I have a nightlight in the bedroom triggered by a motion sensor).

Anything I can help with (logs, tests I shall do)?

srett commented 8 months ago

I've been testing a lot of things over the past days/weeks as I'm affected by this too. A lot of devices either go offline completely or can still send data to the coordinator (e.g. temp updates on a TRV), while sending any command to the TRV results in the dreaded 'No network route' (205) error. Re-pairing solves the problem temporarily. This sounds like it could be similar to the problem described in #1408, i.e. the coordinator not knowing the current route to the end device. It's just that that issue is 4 years old and for a different adapter... I'm using a CC2652P.

The network is more stable with 1.33.x for me, but still not perfect, it's just that it takes days instead of hours for the network to deteriorate. I don't really get why though, AFAIU the coordinator firmware is doing all the bookkeeping and route discovery, so it's weird that the z2m version appears to have an impact.

Looking forward to any new firmware to test.

samuele2723 commented 4 months ago

Hello there, following the topic. I am having the same issue with updated Z-Stack firmware and Z2M version 1.39. Tomorrow I'll try downgrades and report the results here. Meanwhile, has anyone found a clearer correlation?

cromelex commented 4 months ago

Which coordinator?

I am still convinced it could be an actual hardware issue with the coordinator. Once I replaced the coordinator, the issue was gone and it has never shown up again.