Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.5k stars 1.63k forks

why do some devices randomly disconnect? #2705

Closed prankousky closed 4 years ago

prankousky commented 4 years ago

Hi everybody,

I get random disconnects from devices that had previously been paired and working just fine with zigbee2mqtt. How can I avoid these in the future? I use a single Texas Instruments CC1352P-2, but had issues like this back when using a CC2531 coordinator with multiple CC2530 routers as well.

Debug Info

zigbee2mqtt version: 1.8.0
CC253X firmware version: {'type': 'zStack3x0', 'meta': {'transportrev': 2, 'product': 1, 'majorrel': 2, 'minorrel': 7, 'maintrel': 1, 'revision': 20191106}}

Bug Report

What happened

Devices suddenly stop updating their status. I am using Home Assistant and zigbee2mqttassistant; an example device is a Xiaomi door/window sensor.

It works at the moment; however, it stopped working last night. I remotely restarted the server running zigbee2mqtt as well as Home Assistant etc., without success. This morning, I took the sensor, pressed the button for 5 seconds, and it seems to have paired successfully, as it works again.

Here is another sensor that currently does not work. It had worked before, and it is the same Xiaomi door/window sensor as the others, so zigbee2mqtt should know it (and it usually does; when I first paired it, it was displayed without any errors and as the kind of sensor it actually is).

This is my configuration.yaml. I am using the npm version now, but I had used Docker as well, which seemed to perform slightly worse than npm:

homeassistant: true
permit_join: true
mqtt:
  base_topic: zigbee2mqtt
  server: 'mqtt://url:1883'
  user: very_mqtt
  password: much_secret
  client_id: z2mbeelink
  include_device_information: true
serial:
  port: >-
    /dev/serial/by-id/usb-Texas_Instruments_XDS110__02.03.00.18__Embed_with_CMSIS-DAP_L430019B-if00
  disable_led: false
advanced:
  pan_id: my pan id
  network_key:
   - my network key
  homeassistant_discovery_topic: homeassistant
  homeassistant_status_topic: hass/status
  last_seen: ISO_8601_local
  elapsed: true
  report: true
  log_level: debug
  availability_timeout: 30
  cache_state: true
queue:
  delay: 250
  simultaneously: 15
ban:
  - '0x14b457fffe779019'
devices:

My current "fix" is to manually re-pair each device that randomly disconnects. While this works, it also means I have to keep permit_join: true set at all times, because otherwise re-pairing will (or might) not work. I could always set it manually via MQTT, but unless I am wrong, leaving it true seems to give better connectivity overall.
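Setting permit_join via MQTT instead of leaving it on means publishing to the bridge's config topic. A minimal sketch, assuming the legacy 1.x bridge topic `<base_topic>/bridge/config/permit_join` with a `true`/`false` payload (newer zigbee2mqtt releases use a different bridge API, so check the docs for your version). It only builds the topic and payload; any MQTT client can send them.

```python
from __future__ import annotations


def permit_join_message(base_topic: str, enabled: bool) -> tuple[str, str]:
    """Return (topic, payload) to enable or disable joining.

    Assumes the legacy zigbee2mqtt 1.x topic layout
    <base_topic>/bridge/config/permit_join with "true"/"false" payloads.
    """
    topic = f"{base_topic}/bridge/config/permit_join"
    payload = "true" if enabled else "false"
    return topic, payload
```

For example, `permit_join_message("zigbee2mqtt", True)` yields the topic and payload to pass to `mosquitto_pub -t <topic> -m <payload>` before re-pairing a device, and the same call with `False` turns joining off again afterwards.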

I have experimented with the queue parameters, but could not figure out which values give better or worse performance, or whether they make any difference at all in my setup.

This is a screenshot of my current network map status.

Near the top there is Flur_Tuer, which is a door sensor. It is quite close to the coordinator and successfully paired (and sending updates), so I am not sure why that device (and many others) just "float" there without a displayed connection; the connection is there. Then again, the Haustuer_Button button (also Xiaomi) is connected according to the network map, but it did not work this morning when it was pressed (we use it as a doorbell). So I don't know how accurate the map is.

I rely on my sensors (well, most of them ;)) for automation, so I cannot have them disconnecting all the time. What did I do wrong in my setup? Thank you for your help :)

kisseler commented 4 years ago

I can confirm the problems @prankousky describes. I have a bunch of sensors and actuators (smart sockets, Hue bulbs etc.) and connect them via rules in my openHAB setup.

Devices regularly disconnect and do not (immediately) rejoin, especially when my wife and kids try to turn the lights on and off or my parents-in-law want to control their heating system.

Family acceptance drops with every issue. Is there any guideline to improve stability and reliability? I already switched from a CC2531 with an external antenna to a Texas Instruments CC1352P-2, and I added another 6 CC2531/CC2530 routers across 350 square meters in total.

I'm loving the zigbee2mqtt project, but I did not face nearly as many connectivity issues with the Xiaomi Gateway and my Homematic devices, which I was using before. And back then there was only one central gateway, not a coordinator/router network.

Thanks for sharing your experience and useful guidance for improving stability, reliability and acceptance/fun factor.

jonathanmusto commented 4 years ago

Same problem here, I'm running a CC2531: Coordinator firmware version: '{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}'

I guess issues like this will always be hard to diagnose, as there can be many contributing factors, e.g. nearby Wi-Fi networks/channels.

I have 36 devices, many of which are routers (4 Tradfri outlets, 10 Tradfri bulbs), so I'm debating switching to the source routing firmware to see if that improves things. If I switch from default to source routing, will I have to re-pair all my devices? If things don't improve or get worse, can I switch back without re-pairing?

Is there any advantage to adding CC2531 devices as routers vs. router devices (Tradfri bulbs/sockets)?

prankousky commented 4 years ago

but I did not face so many connectivity issues with the Xiaomi Gateway

I am considering buying one of those just for the Xiaomi sensors. Though that might cause range problems, as the Xiaomi gateway will not use my Hue and Innr light bulbs as repeaters as zigbee2mqtt does, correct?

Is there any advantage to adding CC2531 devices as routers

I have tried using multiple routers with the CC1352P-2, but while the routers connected fine, it seemed to me like the end devices did not connect well. I should mention, though, that when I attempted this, I had been fighting range issues all day and might not have been patient enough for all devices to connect properly. However, that would usually take a few minutes at most after a restart, and this period was exceeded when I tried using additional routers.

Venice-89 commented 4 years ago

Hi,

Maybe your Xiaomi sensors are connected to a device which isn't online all the time? For example, a bulb which is used with a regular switch?

Aqara devices don't change the device they are paired with (their parent). If you turn off that router or move it out of range, the paired Aqara device stops working!

I'm facing that problem with my Aqara motion sensor.

kisseler commented 4 years ago

maybe your xiaomi sensors are connected to a device which isn't online all the time? For example a bulb which is used with a regular switch?

That's a good hint, but it doesn't apply to me, because there are no bulbs or other router devices that I physically disconnect.

The aqara devices don't change the device with which they are paired.

I've learned that a ZigBee network is self-repairing over time. So are some devices excluded from that functionality?

@prankousky Repeaters only work within the network they are connected to, so the Xiaomi gateway won't use routers from other networks. But as I said, its range (and stability and reliability) was far higher than that of my current ZigBee network, which consists of the CC1352P-2, CC2530 routers and a few other devices that work as repeaters.

This morning, once again a Eurotronic Spirit thermostat left the network, so the room stayed cold. There are three other devices of the same type in the room working just fine, so range cannot be the problem. I'm even considering switching back to my old Homematic setup.

Are there any hints on how the system might be stabilized? Maybe regular restarts or automatic re-pairing?

Another rather frequent issue is response time: light switches in particular are inconvenient when the bulb reacts a few seconds too late. Finding the problem is rather complicated, because the delay might be caused by any link of the ZigBee<->ZigBee2MQTT<->Broker<->openHAB chain. There was the idea of implementing basic rules in this post. It could nevertheless be useful to mark and connect standard binding functionality without using the direct binding of devices. That would be very helpful and avoid detours that can cause unwanted delays.

I really would like to increase the acceptance factor of unconcerned family members...

prankousky commented 4 years ago

maybe your xiaomi sensors are connected to a device which isn't online all the time? For example a bulb which is used with a regular switch?

In fact, I have one light bulb upstairs (close to that window sensor) that sometimes disconnects on its own as well. It is always powered, though. Sometimes when I try to control that bulb, I get "unavailable", and two minutes later I can control it again. But that should not happen either, right? At least not as long as I don't switch off the lamp it is in (which keeps it powered).

This morning, once again a Eurotronic Spirit thermostat left the network, so the room stayed cold.

I use those, too, and they behave even more strangely. Some of them will at times not let me change the temperature; switching them from heat to off works, and when I manually change the temperature on the device, it reports the state via Zigbee (I can see the change in Home Assistant). Yet when I change the temperature in Home Assistant, the device does not accept the change and just stays at whatever it was set to before. The only fix seems to be removing the battery and inserting it again; if that doesn't work, resetting and re-pairing will, but if I have to do that every other day, I might as well not use it at all. Is there a dedicated hub for these Spirit thermostats? As long as it works locally (no cloud/registration required), I wouldn't mind buying a hub per brand (one Xiaomi, one Eurotronic, etc.) if it would let me work reliably with all these devices.

I have considered automatically restarting zigbee2mqtt at, for example, 4:00 every morning. I believe this works with the CC1352P and npm, but at least when I ran Docker with the CC2530 and CC2531, restarting the container would result in nothing working at all until I manually stopped it, disconnected the coordinator, reconnected it, and then manually started it again. I'll have to see whether an automated restart works over time, or whether rooms would be cold in the morning as well because the radiators wouldn't be controlled until I manually re-paired them.
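A nightly restart like this can be scheduled with cron (e.g. `0 4 * * * systemctl restart zigbee2mqtt`) or with a small loop. A minimal Python sketch of the loop, assuming zigbee2mqtt runs as a systemd unit named `zigbee2mqtt` (a hypothetical name; adjust to your install) and the script is allowed to call `systemctl`. It uses naive local time, so daylight-saving transitions are ignored. It only restarts the software, so it will not help in the case described above where the coordinator itself had to be unplugged.

```python
from __future__ import annotations

import datetime as dt
import subprocess
import time


def next_run(now: dt.datetime, hour: int = 4, minute: int = 0) -> dt.datetime:
    """Return the first hour:minute that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has passed (or is right now), so use tomorrow's.
        candidate += dt.timedelta(days=1)
    return candidate


def restart_daily(service: str = "zigbee2mqtt", hour: int = 4) -> None:
    """Sleep until the next `hour`:00, restart `service`, and repeat forever.

    The unit name is a hypothetical default; this needs permission to
    run systemctl and is meant for a long-running process.
    """
    while True:
        now = dt.datetime.now()
        time.sleep((next_run(now, hour) - now).total_seconds())
        subprocess.run(["systemctl", "restart", service], check=False)
```

Calling `restart_daily()` from a long-running process restarts the service every day at 4:00; `next_run` is kept separate so the scheduling logic can be checked without touching the service.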

Oh and I don't have any direct bindings. Everything goes device => zigbee2mqtt => mqtt broker => Home Assistant (and then perhaps => mqtt broker => zigbee2mqtt => back to device).

Venice-89 commented 4 years ago

I've learned that the ZigBee network is self-repairing over time. So some devices are excluded from that functionality?

Yes, normally a device will change to another parent if the connection is lost. I am not sure why some Aqara devices don't do that, but it's a known issue with Aqara... I don't know if it's a firmware issue or caused by Z-Stack/z2m. I don't have an Aqara bridge, so I can't test it in the original ecosystem.

Is there any hint, how the system might be stabilized? Maybe regularly restarting or automatically repairing?

prankousky commented 4 years ago

Try connecting the stick to your PC via a USB cable. Some devices can cause interference (for example the Raspberry Pi 4).

The CC1352P is already connected via a USB cable (it is not a stick but a board); its antenna is about 2 meters away from the board, where I expect better coverage (higher up and a bit further away from the computer it is connected to).

Just now I got this:

zigbee2mqtt/bridge/log {"type":"zigbee_publish_error","message":"Publish 'set' 'system_mode' to 'Kueche_Heizung' failed: 'Error: Timeout - 6584 - 1 - 89 - 513 - 4 after 10000ms'","meta":{"friendly_name":"Kueche_Heizung"}}

When I manually control that device (Eurotronic Spirit radiator thermostat), it still publishes the actual value, but when I try to set it, I get this error. So it is connected, but not working right (rather than being disconnected altogether; in that case it wouldn't be able to report its new status either).
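Errors like the one above arrive on `zigbee2mqtt/bridge/log` as JSON, so devices that still report but no longer accept commands can be spotted automatically. A minimal Python sketch, with field names taken from the log line above; the regular expressions assume this exact message wording, which may differ between zigbee2mqtt versions.

```python
from __future__ import annotations

import json
import re

# "Publish '<command>' '<attribute>' to '<device>' failed: ..."
_PUBLISH_RE = re.compile(r"Publish '([^']+)' '([^']+)' to '([^']+)'")
# "... Timeout - ... after <n>ms"
_TIMEOUT_RE = re.compile(r"Timeout\b.*?after (\d+)ms")


def parse_publish_error(payload: str) -> dict | None:
    """Return details of a zigbee_publish_error entry, or None otherwise."""
    entry = json.loads(payload)
    if entry.get("type") != "zigbee_publish_error":
        return None
    message = entry.get("message", "")
    publish = _PUBLISH_RE.search(message)
    timeout = _TIMEOUT_RE.search(message)
    return {
        "device": entry.get("meta", {}).get("friendly_name"),
        "command": publish.group(1) if publish else None,
        "attribute": publish.group(2) if publish else None,
        "timeout_ms": int(timeout.group(1)) if timeout else None,
    }
```

Fed with every payload received on `zigbee2mqtt/bridge/log`, this returns `None` for ordinary log entries and a small dict (device, command, attribute, timeout) for failed publishes, which could then drive a notification or a per-device failure counter.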

WeeBull commented 4 years ago

I am also having this problem with permanently powered Ikea bulbs. Currently one is in a set of three in the living room, and one is in a set of five in the hall. So both are within a metre or two of multiple other repeaters. Both still appear on the network map.

I'm using a CC2531 running stack 1.x (about a month old).

kisseler commented 4 years ago

@prankousky The Eurotronic Spirit thermostats seem to come with different hardware and software. Some were really unreliable, so I sent them back to Voelkner (the seller) and exchanged them for a different (newer?) model.

I also noticed a big difference in noise. Some were so loud they could not be used in a living room or even a bedroom. And I still have one in the living room that permanently "announces itself" in the Zigbee network. I suspect this drains the battery, so I will exchange that one soon as well.

Shred99 commented 4 years ago

I have a similar problem with Xiaomi sensors disconnecting. The network consists of a CC2531 coordinator, a CC2530 router and 18 battery powered Xiaomi end point devices.

There are about 5 devices that intermittently disconnect. All of these devices are outside the range of the coordinator, but should have a reasonable signal from the CC2530 router.

I've found that simply bringing the failing sensor close to the CC2531 coordinator and doing a quick press on the sensor's button usually fixes the problem. Since upgrading to v1.8.0, I've had this happen once - and simply putting the unresponsive sensor near the coordinator and leaving it there for a few hours brought it back to life.

Installing a decent quality 9 dB aerial on the coordinator seems to have helped, but it has still not eliminated the problem. Beware that the aerial pushes more signal out horizontally, so it will make matters worse if your problem is getting signal upstairs in a two-story house. https://www.rfsolutions.co.uk/antennas-c8/rugged-high-gain-antenna-for-2-4ghz-wlan-applications-p179

prankousky commented 4 years ago

@prankousky The Eurotronic Spirit thermostats seem to come with different hardware and software. Some were really unreliable, so I sent them back to Voelkner (the seller) and exchanged them for a different (newer?) model.

I had done this before. One of these thermostats kept stopping working, so I exchanged it for a new one. It worked for a while, but now has the same issue, which I really can't wrap my head around. It is always in the same place (obviously), and there are no firmware updates I can install manually, so why would it suddenly stop working again?

Installing a decent quality 9dB aerial on the coordinator seems to have helped, but has still not eliminated the problem.

I have a similar antenna hooked up to my device, and have had since the beginning. Another weird thing is that when I switched from the CC2531/CC2530s, the new coordinator worked perfectly. Nothing went wrong. Then after a few days there were minor issues, and now I have the same problem as before: devices more or less constantly disconnecting, with no automatic way to fix them.

Two things work:

  1. stop zigbee2mqtt
    • unplug coordinator
    • plug coordinator back in
    • start zigbee2mqtt
    • wait for quite a while

This helps if, for some reason I cannot trace, the entire zigbee2mqtt network is down.

  2. allow joining new devices (if not already on)
    • power off the device (in this case, mostly a thermostat) by removing the battery
    • re-insert the battery
    • turn the device back on
    • in a few cases, factory-reset the thermostat and re-pair it

Neither is a "real" solution IMHO, as I cannot automate them and therefore have to be on site to fix things.

I have tried remotely rebooting my home server; this does not work! I assume the coordinator (which is connected via USB) does not power off completely during a reboot, so whatever caused the problem remains.

dzungpv commented 4 years ago

@prankousky see my answer here: https://github.com/Koenkk/zigbee2mqtt/issues/2813#issuecomment-578269750 For a large network, the best solution for you is to use CC2652/CC1352 devices for both coordinator and routers, running the latest Z-Stack firmware. I have had problems with CC2530/CC2531 routers, even when running Z-Stack 3.0.x.

prankousky commented 4 years ago

@dzungpv I have a Texas Instruments CC1352P-2 already, but ordered a Texas Instruments CC26X2R1 last night to see whether it makes a difference. I initially ordered the CC1352 because I thought being able to add an external antenna would be a benefit.

So I could use the cc2652 as coordinator and cc1352 as router? I thought I could only use one of those... if I use the cc2652 downstairs (coordinator) and cc1352 upstairs (router), everything must be covered, right?

I had the cc1352 as coordinator and multiple cc2530 and cc2531 as routers in a test run. That did not work at all! Devices close to the routers would always give me troubles.

About the settings, though: I wrote yesterday that my lights now respond within 15 - 25 seconds. While that was the case, it seems unrelated to the changes in my configuration; instead, it was always due to restarting zigbee2mqtt. Last night (a few hours after the last restart) and this morning, the lag was incredibly high again. It seems like the longer I run zigbee2mqtt, the worse the lag gets, and even a restart only reduces it instead of getting rid of it altogether.

Could you please tell me how to flash either the CC2652 or the CC1352 as a router? On the supported devices page there are only links to coordinator firmware.

dzungpv commented 4 years ago

@prankousky I have built router firmware for the CC1352P-2 and CC2652 and will publish it very soon. Previously I used CC2531, CC2530 and CC2530+CC2592 routers, but they were never stable. Now I am running a CC2652 router with a CC1352P-2 coordinator, and it is very stable. Your problem may be caused by packets being routed through many hops (routers). If you have a CC2531, flash the sniffer firmware, use Wireshark to capture packets and post them here; that helps with debugging.

kisseler commented 4 years ago

That's interesting. I have also ordered both a CC1352P-2 and a CC2652 device. I use the CC1352 as coordinator right now, and I have around two CC2530 routers on each floor. So there is a chance to replace most of them by placing one CC2652 router in a central spot?

Can anyone confirm that using ZB 1.2 instead of 3.0 might cause devices not to reconnect by themselves?

And in one linked post it's written that ZB 3.0 was disabled by @Koenkk to make Xiaomi devices work. So if I only use 3.0 firmware, I would not be able to connect my Xiaomi devices anymore, is that correct?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

to4ko commented 3 years ago

@kisseler @prankousky guys, were you able to find a solution? I have almost the same situation, plus my Xiaomi plugs self-announce from time to time with the relay clicking... I am currently using a CC2538+CC2592 and have a CC1352P-2 and a CC2652R as well...

kashesandr commented 3 years ago

Same for me. Sometimes some devices lose the connection for no reason, e.g. a Lonsonho Tuya light switch.

prankousky commented 3 years ago

@dzungpv is your firmware for the LAUNCHXL-CC26X2R1 ready for production and compatible with Koenkk's firmware (which I already run on a LAUNCHXL-CC1352P-2)?

@to4ko my solution for this is always just temporary. Most things work fine, but some devices (especially the Eurotronic thermostats, but recently also Xiaomi sensors) still occasionally cause issues.

kennymc-c commented 3 years ago

I noticed that battery quality also has an influence. Until recently, I still had the supplied batteries in the sensors that caused the problems. I replaced them with Varta industrial batteries and have not seen any problems since.

kisseler commented 3 years ago

I just reset the whole system respecting the following article: https://www.metageek.com/training/resources/zigbee-wifi-coexistence.html

My Zigbee channel is now 25 and my Wi-Fi runs on 1 or 6. That also seems to have helped stabilize the network a bit.
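Whether a Zigbee/Wi-Fi channel pair like this overlaps can be checked by comparing center frequencies. A minimal sketch, assuming the standard 2.4 GHz channel plans (802.15.4 channel k centered at 2405 + 5(k - 11) MHz and about 2 MHz wide; Wi-Fi channel n centered at 2407 + 5n MHz, treated here as 22 MHz wide). Real Wi-Fi emissions spill beyond that mask and 40 MHz Wi-Fi channels cover more spectrum, so "no overlap" here is a necessary rather than a sufficient condition for a quiet channel.

```python
from __future__ import annotations

ZIGBEE_WIDTH_MHZ = 2.0   # IEEE 802.15.4 2.4 GHz channel width
WIFI_WIDTH_MHZ = 22.0    # simplified 802.11b/g channel width


def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of Zigbee (802.15.4) channel 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError(f"Zigbee channel out of range: {channel}")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of 2.4 GHz Wi-Fi channel 1-14."""
    if channel == 14:
        return 2484
    if not 1 <= channel <= 13:
        raise ValueError(f"Wi-Fi channel out of range: {channel}")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the two (simplified) frequency bands intersect."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels: list[int]) -> list[int]:
    """Zigbee channels that overlap none of the given Wi-Fi channels."""
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]
```

With Wi-Fi on channels 1, 6 and 11, `clear_zigbee_channels([1, 6, 11])` leaves Zigbee channels 15, 20, 25 and 26, which is consistent with choosing 25 here. Zigbee 25 does collide with Wi-Fi channel 13, which is allowed in Europe.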

tabnul commented 3 years ago

I have comparable issues, see https://github.com/Koenkk/zigbee2mqtt/issues/4439 The Zigbee network is on channel 21, Wi-Fi on channel 1. Most outlets are within 1 meter, or even a few centimeters, of each other (in the same room). I don't think Wi-Fi interference is the issue here.

tabnul commented 3 years ago

After reading your posts I started looking at my link quality, and it looks like some routers have very poor range directly to the coordinator, especially some of the routers I saw dropping off the network. However, those routers are in the same room as other routers and have 200+ link quality with those neighbour routers, which in turn have respectable link quality with the coordinator.

So basically, the mesh itself is healthy, but the router/outlet > coordinator link is not always healthy. Could this cause them to drop off sometimes? It doesn't make sense to me...

The search continues.
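The pattern described above (weak direct link to the coordinator, strong links to neighbouring routers) can be checked mechanically across the whole network. A minimal sketch, assuming the links have already been extracted from the network map into `(device_a, device_b, lqi)` tuples; that tuple format, the device names and the 50/200 thresholds are hypothetical choices, not zigbee2mqtt's own schema. In a mesh, routers are expected to reach the coordinator through neighbours, so a flagged device is not necessarily a problem by itself.

```python
from __future__ import annotations

from typing import Dict, Iterable, List, Tuple

# (device_a, device_b, lqi), with LQI on zigbee2mqtt's 0-255 scale.
Link = Tuple[str, str, int]


def weak_uplinks(links: Iterable[Link], coordinator: str = "Coordinator",
                 weak: int = 50, strong: int = 200) -> List[str]:
    """Devices whose direct link to the coordinator is below `weak`
    although they reach some other device with LQI >= `strong`."""
    to_coordinator: Dict[str, int] = {}
    best_neighbour: Dict[str, int] = {}
    for a, b, lqi in links:
        # Treat each link as symmetric (a simplification: LQI is per direction).
        for device, peer in ((a, b), (b, a)):
            if device == coordinator:
                continue
            table = to_coordinator if peer == coordinator else best_neighbour
            table[device] = max(lqi, table.get(device, 0))
    return sorted(device for device, lqi in to_coordinator.items()
                  if lqi < weak and best_neighbour.get(device, 0) >= strong)
```

Comparing the flagged list with the devices that actually drop off would show whether the weak direct links matter, or whether the drop-offs have another cause.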

kisseler commented 3 years ago

Out of curiosity: Did anyone find a way to stabilize a network containing battery powered devices? I'm currently having issues especially with the Aqara Cube remote and the Heiman remote. This is extra annoying when using a remote requires re-pairing it first.

Could it be a problem of changing repeaters and a lack of self-healing in the remotes? And could it be a solution to set up several separate Zigbee networks instead of one big one with repeaters all over the house?

Thanks for your support!

tabnul commented 3 years ago

My network is stable now.

Regarding the battery powered devices:

jcbagtas commented 2 years ago

Just for curiosity: Did anyone find a way to stabilize the network containing battery powered devices? I'm currently having issues especially with Aqara Cube remote, Heiman remote. This is extra annoying when using a remote requires re-pairing beforehand. May that be a problem of changing repeater and not self-healing ability of the remotes? Any could it be a solution to establish several different Zigbee networks instead of one big one using repeaters all over the house? Thanks for your support!

My network is stable now.

  • In the end, Wi-Fi interference was one of the issues, even with the correct channel. I moved the Wi-Fi router a few meters > fixed
  • Last weekend, all of a sudden, I had big issues again; the network was unusable! It turned out that plugging in a USB3 hard drive (2 meters from the Zigbee adapter, on another server!!) had a major impact! USB3 emits noise in the same 2.4 GHz band as Zigbee/Wi-Fi

Regarding the battery powered devices:

  • swapping the default brandless/Panasonic batteries that come with Xiaomi devices fixed the re-pairing / falling-off-the-network issue. I replaced them with decent Duracell batteries...

Interesting how Zigbee can be affected by "common" electrical appliances that we have.

I have my CC2531 dongle hanging from the ceiling in the middle of the living room with nothing close by, but the network is still unstable. I replaced the coin batteries in all my Sonoff sensors with new ones (cheap, but new), and they still disconnect.

One possibility I want to check is that the power from the RPi plug is insufficient for the dongle, which is currently connected via a 3 m USB3 data cable.

I gave up on the issue a while back; I stopped debugging but left everything on. The next day, everything worked well, until today, when I had to shut down my server and restart it. The issue is back. I will check tomorrow whether all automations just magically fix themselves again.

If this is the case, the Zigbee Dongle is really unstable.

I might buy a Zigbee hub from Sonoff just to test whether the problems go away.

tabnul commented 2 years ago

@jcbagtas I see you use the CC2531. To be honest, my experience with that adapter was not great. It's also no longer recommended.

After I switched to the CC1352P-2 it went WAY better. I also found that decent network coverage (a large mesh) helps.