Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

QS-Zigbee-CP03: intermittent error: 'No network route' (205) #12774

Closed (cunode closed this issue 1 year ago)

cunode commented 2 years ago

What happened?

When communicating with a QS-Zigbee-CP03 curtain module, I frequently get the following error message:

error 2022-06-10 07:01:34: Publish 'set' 'state' to 'Store SZ Sued' failed: 'Error: Command 0xa4c1385b260f0939/1 closuresWindowCovering.upOpen({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205))'

Usually, when I resubmit the same command several times, the device eventually executes it as expected. But each command has roughly a 50% chance of failing, so my automation is unreliable.
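The "resubmit until it works" behavior described above can be sketched as a small retry helper with exponential backoff. This is only an illustration, not part of Zigbee2MQTT: `send_with_retry`, its parameters, and the idea that a failed publish surfaces as a `RuntimeError` are all assumptions made for the sketch.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds and return how many attempts were used.

    `send` is a hypothetical callable that raises RuntimeError when the
    command fails (for example with 'No network route' (205)).
    """
    for attempt in range(1, attempts + 1):
        try:
            send()
            return attempt
        except RuntimeError:
            if attempt == attempts:
                raise  # give up and surface the last failure
            # Wait 1x, 2x, 4x, ... the base delay before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A wrapper like this only hides the symptom; it does not explain why the route is lost in the first place.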

Based on the findings in #11539 (QS-Zigbee-CP03), I switched to the Zigbee2MQTT dev branch for Linux on May 13, 2022. I also use the latest firmware for my SLAESH coordinator (CC2652RB_coordinator_20220219.hex) with a 3 dB antenna, connected to a Raspberry Pi 4 running Debian GNU/Linux 11 (bullseye).

What did you expect to happen?

Communication should be reliable, as the distances are short: about 6 meters to the coordinator and about 3 meters to the nearest router. The link quality in the Zigbee2MQTT map never drops to zero, and the device's Availability is shown as "Online". Any help is highly appreciated.

How to reproduce it (minimal and precise)

The problem is intermittent, so I cannot reproduce it with clear steps.

Zigbee2MQTT version

Zigbee2MQTT version 1.25.1-dev (commit #bc5fd1a5)

Adapter firmware version

CC2652RB_coordinator_20220219.hex

Adapter

CC2652RB development stick

Debug log

No response

Logima commented 2 years ago

I might be affected by this issue. Currently I have one lamp that doesn't respond to commands (Paulmann 371000002, 0xe29c). Reconfiguring seems to succeed but changes nothing. The network map shows connections to three other routers (0x8256, 0x173e and 0xaa46) with decent LQI, but no direct connection to the coordinator. This is a bit odd, as this lamp is the device nearest to the coordinator.

Earlier I had a similar case with another lamp, which resolved itself when the lamp regained a direct connection to the coordinator.

Here is a quick sniff of sending a command to 0xe29c until Z2M shows:

Publish 'set' 'state' to 'tyohuoneen_valo' failed: 'Error: Command 0x00158d000581bad3/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205))'

Attachment: no-network-route.zip
Network key: 01030307010406020008090100030501
Coordinator: Sonoff ZBDongle-P, running CC1352P2_CC2652P_launchpad_coordinator_20221102. Z2M version 1.28.1.
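The error lines quoted in this thread all have the same shape, so the device address, ZCL command and error code can be pulled out mechanically when comparing reports. The following is a minimal sketch based only on the two log lines shown above; the function name and the returned field names are illustrative, and other Zigbee2MQTT versions may format the message differently.

```python
import re
from typing import Optional

# Pattern derived from the log lines quoted in this thread (an assumption,
# not an official log format specification).
PATTERN = re.compile(
    r"Publish '(?P<type>[^']+)' '(?P<key>[^']+)' to '(?P<name>[^']+)' failed: "
    r"'Error: Command (?P<ieee>0x[0-9a-fA-F]+)/(?P<endpoint>\d+) "
    r"(?P<cluster>\w+)\.(?P<command>\w+)\(.*\) failed "
    r"\(Data request failed with error: '(?P<reason>[^']+)' \((?P<code>\d+)\)\)"
)


def parse_publish_error(line: str) -> Optional[dict]:
    """Return the fields of a failed-publish log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["endpoint"] = int(fields["endpoint"])
    fields["code"] = int(fields["code"])
    return fields
```

Grouping the parsed results by `ieee` or `code` makes it easier to see whether failures cluster on particular devices.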

slashbv commented 2 years ago

I have the same issue with QS-Zigbee-CP03. I haven't managed to find a solution so far.

sk8er000 commented 2 years ago

After quite a long period I got this error again, this time on a TRÅDFRI outlet. This is the first time I have noticed this error on an IKEA device. The device is literally behind the wall next to the Zigbee antenna.

bogd commented 2 years ago

[ Editing my comment, because it seems to be unrelated. Even though the symptoms seemed similar, in my case it apparently was due to issues with the Zigbee2MQTT database disk - latency and failed writes. After moving the disk to different hardware (read: SD card :) ), the network seems to have stabilized. I must admit I never expected Zigbee2MQTT to be sensitive to disk performance....

Leaving the rest of the text here for reference, and because I do not wish to confuse anyone else reading the thread ]

Possibly similar issue - after months of things working properly, I keep seeing devices going offline and/or responding to commands with huge delays (on the order of seconds). In the web interface, I get "failed to ping / no network route" errors.

Setup:

  • Z2M: docker image, image ID bf4576703bf7
  • Coordinator: Sonoff ZigBee 3.0 USB Dongle (Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20210708,"transportrev":2},"type":"zStack3x0"}')
  • Host: Raspberry Pi 4 Model B 8GB
  • MQTT server and Home Assistant on separate machines; the Raspberry Pi is dedicated to Z2M

So far, I have seen the issue with:

  • Ikea Tradfri bulbs (multiple types)
  • Ikea Tradfri switched outlets
  • Ikea Tradfri switches (E1743)

Some of the devices did get a firmware update recently (firmware update 2.3.095), but I am also seeing the issue on devices that have not been updated.

There are multiple other devices (mainly tuya/sonoff) in the network, but so far the problem seems to be limited to the Ikea devices.

tvdbroeck commented 2 years ago

I had the same problem after connecting my 1 gang dimmer from Lonsonho/Girier (TS110E) to my zigbee network. Z2M received updates FROM the dimmer, but I could not send commands TO the dimmer. After creating a dummy group with 1 device, my dimmer, I can send commands to the group like I would to the dimmer. This seems to work fine now.

cunode commented 2 years ago

The workaround with the dummy group seems to work. I set the groups up 2 months ago, and since then my QS-Zigbee-CP03 curtain module has worked without problems. This proves that my setup does not have a hardware problem, a device firmware problem, or a weak network. Instead, Zigbee2MQTT (version 1.25.1-dev) seems to send commands to individual devices differently than to groups.
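For anyone who wants to try the dummy-group workaround over MQTT rather than through the frontend, the messages involved can be sketched as follows. The topics and payload fields are taken from the Zigbee2MQTT MQTT API as I understand it and should be checked against the documentation for your version; the function name and the group name `store_sz_sued_group` are made up for this example, and actually publishing the messages is left to whatever MQTT client you use.

```python
import json


def dummy_group_messages(device: str, group: str, base_topic: str = "zigbee2mqtt"):
    """Return (topic, payload) pairs that create a one-member group and command it.

    `device` and `group` are friendly names; the group name is hypothetical.
    """
    return [
        # 1. Create the group.
        (f"{base_topic}/bridge/request/group/add",
         json.dumps({"friendly_name": group})),
        # 2. Add the single device to it.
        (f"{base_topic}/bridge/request/group/members/add",
         json.dumps({"group": group, "device": device})),
        # 3. Send commands to the group instead of the device.
        (f"{base_topic}/{group}/set",
         json.dumps({"state": "OPEN"})),
    ]
```

For example, `dummy_group_messages("Store SZ Sued", "store_sz_sued_group")` yields the three messages to publish in order.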

toxic0berliner commented 2 years ago

I have the same issue with a TS130F QS-Zigbee-C03, using Z2mqtt v1.28.2 and a Sonoff USB stick. The group workaround also works for me, so I also suspect the cause lies in Z2mqtt rather than in the USB dongle or the device. Whenever the device is unresponsive with the no-route errors, it still responds to the wired switch, and Hass still receives the current motor state and the actual curtain position. Sending commands to the group does not "revive" the device for me: the device responds properly to commands sent to the group, but the only way to regain direct control without the group workaround is to switch the curtains' breaker off and on again, which reboots the modules. After a few hours the same behavior returns.

drknexus666 commented 2 years ago

Possibly the same issue, using zigbee2mqtt-git 1.28.2 and a CC2652P (Zigstar Stick v4) running CC1352P2_CC2652P_launchpad_coordinator_20220219. I have lots of routers on the network. Devices seemingly randomly go offline or stop accepting commands with 'No network route' (205) errors. Sometimes repeatedly sending a command brings the device back. Group commands usually still work, and they sometimes, but not always, bring the device back to normal. Resetting power to all devices makes the network work without problems for about a day.

Wireshark capture, with everything working at first and then things starting to fail: WorkingThenEventuallyNoRoute.zip. Some examples: 0xB595, 0xA4AD, 0x0421C.

No network key is set in configuration.yaml, so it should be the default. Network key: 01:03:05:07:09:0B:0D:0F:00:02:04:06:08:0A:0C:0D
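Network keys appear in this thread in two notations: a plain 32-digit hex string and colon-separated bytes. In configuration.yaml the key is written as a list of 16 byte values. A small sketch for converting between these forms is shown below; the function names are illustrative and not part of any tool.

```python
def parse_network_key(text: str) -> list[int]:
    """Turn a hex network key (with or without colons) into 16 byte values."""
    hex_digits = text.replace(":", "").replace(" ", "")
    if len(hex_digits) != 32:
        raise ValueError("a Zigbee network key is 16 bytes (32 hex digits)")
    return list(bytes.fromhex(hex_digits))


def format_network_key(key: list[int]) -> str:
    """Format 16 byte values as colon-separated upper-case hex."""
    return ":".join(f"{byte:02X}" for byte in key)
```

Applied to the colon-separated key above, `parse_network_key` gives `[1, 3, 5, 7, 9, 11, 13, 15, 0, 2, 4, 6, 8, 10, 12, 13]`, which is the list form used in configuration.yaml.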

Koenkk commented 2 years ago

@drknexus666 can you provide me your network key so that I can decrypt the sniff?

toxic0berliner commented 2 years ago

It seems drknexus666 already provided it. I'm not savvy enough to produce such a sniff yet, but with a bit of help I can probably learn. From what I've read about Z2M, I believe the group workaround works because the message is broadcast and each group member is responsible for acting on it, while a message to the device itself needs to be "routed". I haven't tested with software other than Z2M. Did someone confirm it is not happening with ZHA, for instance, or any other bridge?

galambert75 commented 2 years ago

did someone confirm it is not happening with ZHA for instance or any other bridge?

Yep: https://github.com/Koenkk/zigbee2mqtt/issues/12774#issuecomment-1244504418

Still running without a glitch.

toxic0berliner commented 2 years ago

Thanks for the report @galambert75! It gives me hope, as it shows that neither my Sonoff USB dongle nor my curtain modules are at fault. I purchased 10 of them and wouldn't like to have to replace them ;) I will purchase another Sonoff USB dongle to move them to ZHA and verify that it also runs smoothly there, which would leave only Z2M at fault, even if it's a cool software product ;)

Koenkk commented 2 years ago

@drknexus666

@galambert75 Since it works for you with ZHA, is this with the same coordinator and firmware?

toxic0berliner commented 2 years ago

I'm running the latest firmware, CC1352P2_CC2652P_launchpad_coordinator_20221102.zip, on my Sonoff USB Dongle P and I'm having the issue (but I fear I have no way to give you a sniff, as that requires dedicated hardware). I needed to use my curtains, so I moved to ZHA, and I can also confirm they work just fine now. But I feel like I'm going against the flow here: I'm fairly technical and liked the more detailed actions of Z2M vs ZHA... It'll be a week before the new dongle I just purchased arrives and I can test further with Z2M, as I can't switch between ZHA and Z2M constantly. I can certainly bring Z2M back up in about a week and set up one of my curtains on it to continue testing/helping.

And for me it works with ZHA with the same coordinator & firmware ;)

sk8er000 commented 2 years ago

@drknexus666

@galambert75 Since it works for you with ZHA, is this with the same coordinator and firmware?

Is 20221102 the latest stable firmware? I thought the latest stable one was 20220219, so I ran, and am still running, all my tests with that one. If there's a new stable FW, I'll gladly update and run all the tests again.

toxic0berliner commented 2 years ago

My mistake, sorry guys: a wrong copy-paste. I just double-checked and I'm actually running CC1352P2_CC2652P_launchpad_coordinator_20220219.zip, which is the only one on the master branch. I didn't realize 20221102 is the one on the dev branch.

Koenkk commented 2 years ago

@sk8er000 20221102 is not in stable yet but it is worth trying it out

lucianojss commented 2 years ago

@sk8er000 20221102 is not in stable yet but it is worth trying it out

I tried the develop-branch firmware 20221102, but unfortunately it didn't solve the issue for me.

broekd commented 2 years ago

@Koenkk I am running into the same issue: certain switches lose their network connection. Re-pairing gets them running for a short time. I am very interested in getting this analyzed, because I moved into a new house and am automating things but keep running into this stability issue.

The good thing is that I live very close by and have time to set up sniffing etc. Can I be of any help?

toxic0berliner commented 2 years ago

If anyone is interested, I think I have found my final workaround: I switched my main Zigbee coordinator to the Sonoff USB Dongle-E (ezsp radio). I know it's only experimental in Z2M, but it works fine for all my devices (27 of them). It has been working flawlessly for about a full day, which never happened with the Sonoff Dongle-P using Z2M. The Dongle-P was working with ZHA, as others also confirmed.

Another hint: I converted my Dongle-P into a router, and that improved my network quite a lot (easier to pair devices that were far from the coordinator, reliable state reporting for these devices, former issues like duplicated Zigbee commands disappeared...). So I suspect this issue is also caused, or at least worsened, by lower network quality.

slashbv commented 2 years ago

I have over 80 devices and all work well except those QS-Zigbee-CP03. There are 2 routers about 2 meters away so for me it is not a network problem. I have also a few QS-Zigbee-CP01 which are zigbee 2.0 and they work well too.

broekd commented 2 years ago

If anyone is interested, I think I have found my final workaround: I switched my main Zigbee coordinator to the Sonoff USB Dongle-E (ezsp radio). I know it's only experimental in Z2M, but it works fine for all my devices (27 of them). It has been working flawlessly for about a full day, which never happened with the Sonoff Dongle-P using Z2M. The Dongle-P was working with ZHA, as others also confirmed.

Another hint: I converted my Dongle-P into a router, and that improved my network quite a lot (easier to pair devices that were far from the coordinator, reliable state reporting for these devices, former issues like duplicated Zigbee commands disappeared...). So I suspect this issue is also caused, or at least worsened, by lower network quality.

I thought the dongle had been ruled out as the cause, because the same stick does not cause issues in ZHA. So I don't understand how switching to the E version solves the problem.

toxic0berliner commented 2 years ago

Neither do I @broekd . But so far it holds up... I'll report back if it changes after a few more days

clemencov commented 1 year ago

I'm not sure if this is related, but for the last couple days on 1.28.4 it works without an issue.

alex-bristol commented 1 year ago

I also have the same issue: Zigbee2MQTT stopped working after I moved HA from an SD card to an SSD on a USB port of my Raspberry Pi 4. I needed to get the system working again, so I put the old SD card back into the Raspberry Pi and booted, and Zigbee2MQTT worked fine again. For my system, this proves the hardware is not the problem. For the migration from SD to SSD I followed the guide from the nice chap on the Everything Smart Home YouTube channel: https://youtu.be/QxtDyMbDOh4 The versions of HA and the add-ons on my SD card are from May 2022, so HA for instance is "core-2022.5.2". I did this migration in the last few days and went through the process twice; the result was the same both times: Zigbee2MQTT stopped working after the migration.

slashbv commented 1 year ago

@alex-bristol do you have QS-Zigbee-CP03 devices and they stopped working or your entire network is down after migration ? This issue is about QS-Zigbee-CP03 devices.

alex-bristol commented 1 year ago

@slashbv, I don't have curtain relays, QS-Zigbee-CP03, thank you for pointing that out, my oversight, but my results I believe are still relevant to this issue, unless you know a better post I should be using. I have 15 Zigbee devices mainly things for heating so Tuya radiator valves, Sonoff temperature sensors, some Tuya UK plugs to control solar things. I didn't check all 15 devices but the 5 devices I did check failed after upgrade, and then worked again after disconnecting SSD and re-booting via SD card. With my brief tests it looks to me like the entire Zigbee network is down, receiving and sending, but my testing is not exhaustive.

toxic0berliner commented 1 year ago

I think you should open a new issue, in this one we seem to have narrowed down to this specific CP03 device and Sonoff dongle P running z2m.

I can also confirm that now, with the Sonoff Dongle E and Z2M 1.28.4, apart from a few unavailabilities here and there, my 10 CP03 modules seem to work fine, with no more 'No network route' (205) errors.

alex-bristol commented 1 year ago

@toxic0berliner, yes I see your point, thanks.

seth12 commented 1 year ago

I have the same problem: I cannot send commands to my QS-Zigbee-CP03 modules, but they still report data, as I can see the moving state when I use the wall switch. I'm using Z2M and a Sonoff ZigBee 3.0 USB Dongle P.

Any idea what the solution could be? I have already noticed that when I pair devices through a QS-Zigbee-CP03, they fail at the interview stage.

toxic0berliner commented 1 year ago

I had the same issue with my Sonoff Dongle P. I switched to the Sonoff Dongle E, which has experimental support in Z2M, and it has now been running like a charm for weeks. I turned my Dongle P into a simple router, which also stabilized other devices that seemed too far away.

If that gives any hints: I suspect the router function of these CP03 devices isn't working properly. I can't join a device to my network by putting a CP03 in accept mode, even when the devices are close. Cheap curtain modules 😁 but now with my Dongle E they work just fine for controlling the curtains, so I'll live with them even if they don't fulfill the router function well.

marcocunha commented 1 year ago

Following this thread because I am having the same issue. Any updates?

seth12 commented 1 year ago

@toxic0berliner are your CP03 modules still running with no issues? One thing I noticed: if I set an automation to close all covers at sunset, they all work, even when I can't control them manually.

slashbv commented 1 year ago

@mpuff all your router devices are QS-Zigbee-CP03?

slashbv commented 1 year ago

@mpuff this issue is about QS-Zigbee-CP03 curtain switch.

timderspieler commented 1 year ago

@Koenkk is there any news on this issue?

Koenkk commented 1 year ago

I need a sniff of this problem to debug it further: https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html

magno-santos commented 1 year ago

Also facing same issue. I don't have the means to implement the sniffer.

Not sure if network quality is the issue, as it sometimes occurs with devices that are closer to the coordinator than others.

galambert75 commented 1 year ago

I doubt it's a network issue. Everything was fixed on my end after I migrated to ZHA (same firmware and hardware).

https://github.com/Koenkk/zigbee2mqtt/issues/12774#issuecomment-1324046591

Also facing same issue. I don't have the means to implement the sniffer.

Not sure if network quality is the issue, as it sometimes occurs with devices that are closer to the coordinator than others.

timderspieler commented 1 year ago

I doubt it's a network issue. Everything was fixed on my end after I migrated to ZHA (same firmware and hardware).

#12774 (comment)

Also facing the same issue. I don't have the means to implement the sniffer. Not sure if network quality is the issue, as it sometimes occurs with devices that are closer to the coordinator than others.

The situation is the same for me. I have multiple routers on each floor with a good link quality between 100 and 140. The issue is that the CP03 only connects to the coordinator. I have one CP03 that sits around 3 meters away from the coordinator. It has worked flawlessly since day 1. The other one, which is on the 2nd floor, reports its current state perfectly but cannot be controlled.

I will buy another zigbee stick to create a second zigbee network which is getting used with zha. @Koenkk are there any "premium" CC2531 sticks out there? I am planning on getting a second Sonoff Dongle P which I can then use as a router after I have sniffed the network.

magno-santos commented 1 year ago

I doubt it's a network issue. Everything was fixed on my end after I migrated to ZHA (same firmware and hardware).

#12774 (comment)

Also facing the same issue. I don't have the means to implement the sniffer. Not sure if network quality is the issue, as it sometimes occurs with devices that are closer to the coordinator than others.

I was on ZHA and moved to Zigbee2MQTT to see if that would solve the issue. It does not, but things now seem more reliable, and devices not recognized by ZHA (like smoke detectors and a garage opener) now work OK.

seth12 commented 1 year ago

I need a sniff of this problem to debug it further: https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html

Can I do it with the coordinator? I only have one stick.

timderspieler commented 1 year ago

I need a sniff of this problem to debug it further: https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html

Can I do it with the coordinator? I only have one stick.

No. You need another stick (with the CC2531 chipset) that has special sniffer firmware installed.

IlGiock commented 1 year ago

Same issue here. I have noticed that this problem occurs mostly on the end devices (I have 13 Moes HY368). It seems that they fall into a sort of sleep mode, and only pressing the soft keys makes the devices go back to transmitting their state. In addition, the automations involving the thermostatic valves have some strange problems: I have set an automation that changes the preset mode and the temperature at certain times, but some devices receive the instructions, some don't, and others seem to receive a malformed message, because they change the preset mode but not the temperature, or vice versa... all completely at random. I run HassOS on a generic mini PC and use the Sonoff USB stick-P. I've been having this problem for a couple of months and was going crazy; luckily I found this thread. If you confirm that everything works reliably with the Sonoff USB stick-E, I will buy it. Thank you.

timderspieler commented 1 year ago

Same issue here. I have noticed that this problem occurs mostly on the end devices (I have 13 Moes HY368). It seems that they fall into a sort of sleep mode, and only pressing the soft keys makes the devices go back to transmitting their state. In addition, the automations involving the thermostatic valves have some strange problems: I have set an automation that changes the preset mode and the temperature at certain times, but some devices receive the instructions, some don't, and others seem to receive a malformed message, because they change the preset mode but not the temperature, or vice versa... all completely at random. I run HassOS on a generic mini PC and use the Sonoff USB stick-P. I've been having this problem for a couple of months and was going crazy; luckily I found this thread. If you confirm that everything works reliably with the Sonoff USB stick-E, I will buy it. Thank you.

I've already bought and installed the Dongle E. I cannot confirm what @toxic0berliner wrote. I tested the Sonoff Dongle E standalone with the original Zigbee HA integration. The issue is that the Z2A network got so bad after I installed it that every device in that network had issues reporting or receiving commands. The roller shutter also didn't work at all (or only for a few minutes, just like in the Z2A integration). What I don't like about ZHA is that you cannot set an opening state of, say, 50 or 80%. It's just open or closed. Furthermore, I can't calibrate the moving times.

That being said, I've removed the ZHA integration again and installed the router firmware on the Dongle E. Now it's a simple repeater, plugged into a phone charger in my basement. I haven't tested the CP03 yet. I will try it tomorrow.

But I have to say that I don't put much hope in it. There's a general issue with these Tuya devices, and we definitely need someone with a network sniffer. @Koenkk already asked, but we can't find anybody who is willing (or, better said, able) to do it.

Entepotenz commented 1 year ago

Hi @Koenkk ,

I sniffed the Zigbee network and created two samples of a failed attempt to turn on the light switch. What is the next step for the two pcap files I created?

I collected the traffic for those two devices:

Koenkk commented 1 year ago

@Entepotenz I need the pcap file, data/database.db entry of that device + your network key. The pcap file should start at the moment where the switch is still working fine (controllable via Zigbee2MQTT) until it stops working.
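Pulling the requested database entry for a single device can be sketched as below. This assumes that data/database.db is a text file with one JSON object per line and that device entries carry an `ieeeAddr` field; check this against your own file before relying on it. The function name is illustrative.

```python
import json
from typing import Iterable, Optional


def find_device_entry(lines: Iterable[str], ieee_addr: str) -> Optional[dict]:
    """Return the first entry whose ieeeAddr matches, or None if there is none.

    Assumes one JSON object per line (an assumption about database.db).
    """
    wanted = ieee_addr.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if str(entry.get("ieeeAddr", "")).lower() == wanted:
            return entry
    return None
```

Usage would look like `with open("data/database.db") as f: entry = find_device_entry(f, "0xa4c1385b260f0939")`, after which `json.dumps(entry, indent=2)` gives something that can be pasted into a comment.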

Entepotenz commented 1 year ago

@Entepotenz I need the pcap file, data/database.db entry of that device + your network key. The pcap file should start at the moment where the switch is still working fine (controllable via Zigbee2MQTT) until it stops working.

This will take some time to get right, because the problem reappears and disappears without any pattern. The pcap file might contain a lot of data (e.g. multiple days). If that is not a problem for your diagnosis, I will start sniffing my Zigbee traffic.

Entepotenz commented 1 year ago

@Koenkk Are two separate pcap files sufficient? One of the switch working when controlled through Zigbee2MQTT, and one of the same switch not working.

This would be faster and easier for me to create and the amount of data to analyse is smaller.

Koenkk commented 1 year ago

@Entepotenz no, I need to see what happens in between as it probably goes wrong there.