Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

ERROR: Failed to execute LQI for 'Coordinator' #5301

Closed: joshuakoh1 closed this issue 3 years ago

joshuakoh1 commented 3 years ago

What happened

It works for about half a day, then stops receiving updates and devices start dropping off. I'm using IKEA TRADFRI devices to extend range and all of them drop off, so only the devices directly connected to the coordinator send updates.

Eventually even the coordinator drops off. A network map scan shows that the coordinator is down as well.

ERROR: Failed to execute LQI for 'Coordinator'
... as expected, everything else fails to execute LQI too

I restarted zigbee2mqtt and the connection timed out:

Dec 14 21:32:30 OpenHAB3 npm[57170]: Zigbee2MQTT:info  2020-12-14 21:32:30: Starting Zigbee2MQTT version 1.16.2 (commit #0514204)
Dec 14 21:32:30 OpenHAB3 npm[57170]: Zigbee2MQTT:info  2020-12-14 21:32:30: Starting zigbee-herdsman (0.13.11)
Dec 14 21:32:50 OpenHAB3 npm[57170]: Zigbee2MQTT:error 2020-12-14 21:32:50: Error while starting zigbee-herdsman
Dec 14 21:32:50 OpenHAB3 npm[57170]: Zigbee2MQTT:error 2020-12-14 21:32:50: Failed to start zigbee
Dec 14 21:32:50 OpenHAB3 npm[57170]: Zigbee2MQTT:error 2020-12-14 21:32:50: Exiting...
Dec 14 21:32:50 OpenHAB3 npm[57170]: Zigbee2MQTT:error 2020-12-14 21:32:50: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
Dec 14 21:32:50 OpenHAB3 npm[57170]:     at ZStackAdapter.<anonymous> (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:92:31)
Dec 14 21:32:50 OpenHAB3 npm[57170]:     at Generator.throw (<anonymous>)
Dec 14 21:32:50 OpenHAB3 npm[57170]:     at rejected (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:25:65)

What did you expect to happen

How to reproduce it (minimal and precise)

Debug info

Zigbee2MQTT version: 1.16.2, herdsman 0.13.11
Adapter hardware: CC1352P-2
Adapter firmware version: ZStack3 1026 firmware

kevincaradant commented 3 years ago

Hi

Did you see the FAQ? https://www.zigbee2mqtt.io/information/FAQ.html

CC1352P-2/CC26X2R1 launchpad coordinators only: press the reset button on the device
If Zigbee2MQTT fails to start with a CC1352P-2 with Error: SRSP - SYS - version after 6000ms, you most probably have connected your device to a system that requires pressing the reset button (the one next to the USB connector) momentarily/shortly after connecting the USB cable. This issue has primarily been observed on x86 architectures only (e.g., Intel NUC, HPE Microserver, i7 laptop), see also #2162. The procedure has to be repeated every time the CC1352P-2 is re-connected and it’s not clear yet, whether this can be fixed at all. It does not seem to occur on ARM based boards (Raspberry Pi, ODROID XU4).

Something that can also solve the issue is to replug the USB cable.

And if you're running on a Linux distribution, did you remove ModemManager?

ModemManager, which is default installed on e.g. Ubuntu, is known to cause problems. It can easily be fixed by removing ModemManager through sudo apt-get purge modemmanager.
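Before purging anything, it can help to confirm whether ModemManager is present at all. The following is a minimal sketch, not part of Zigbee2MQTT: the function name `modemmanager_status` is hypothetical, and it assumes a systemd-based distribution where the unit is called `ModemManager`. On some systems the binary lives in `/usr/sbin`, which may not be on a regular user's `PATH`, so a missing binary here is not conclusive.

```python
import shutil
import subprocess


def modemmanager_status(which=shutil.which, run=subprocess.run):
    """Report whether ModemManager looks installed and/or active.

    `which` and `run` are injectable so the check can be exercised
    without touching the real system (assumption: systemd is in use).
    """
    # Path of the ModemManager binary if it is on PATH, else None.
    binary = which("ModemManager")

    # None means "could not determine" (no systemctl available).
    active = None
    if which("systemctl"):
        # `systemctl is-active --quiet UNIT` exits with 0 only when the unit is active.
        result = run(["systemctl", "is-active", "--quiet", "ModemManager"], check=False)
        active = result.returncode == 0

    return {"binary": binary, "active": active}
```

Calling `modemmanager_status()` with no arguments queries the local machine. If `binary` is `None` and `active` is not `True`, ModemManager is probably not what is disturbing the adapter.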

joshuakoh1 commented 3 years ago

@kevincaradant I have, and I had already reset it multiple times before this by pressing the button beside the USB port.

I did not remove modemmanager. I will give it a try.

kevincaradant commented 3 years ago

@joshuakoh1 If you have modemmanager installed, it's 100% sure that it will be bad for the state of the coordinator.

Mine didn't work for more than 5 minutes with it :)

Otherwise, wait for better advice ;)

Edit: try unplugging it from the USB port and then plugging it back in.

lucipacurar commented 3 years ago

I have the same problem on:

Zigbee2MQTT:info  2020-12-14 22:53:47: Starting Zigbee2MQTT version 1.16.2 (commit #04c15f7)
Zigbee2MQTT:info  2020-12-14 22:53:47: Starting zigbee-herdsman (0.13.37)
Zigbee2MQTT:info  2020-12-14 22:53:49: zigbee-herdsman started
Zigbee2MQTT:info  2020-12-14 22:53:49: Coordinator firmware version: '{"meta":{"maintrel":3,"majorrel":2,"minorrel":6,"product":0,"revision":20190608,"transportrev":2},"type":"zStack12"}'

I have the same IKEA hardware:

Zigbee2MQTT:info  2020-12-14 22:53:49: 0x680ae2fffef2f903 (0x680ae2fffef2f903): E1603/E1702 - IKEA TRADFRI control outlet (Router)
Zigbee2MQTT:info  2020-12-14 22:53:49: 0x680ae2fffe9c84b0 (0x680ae2fffe9c84b0): E1746 - IKEA TRADFRI signal repeater (Router)

The problems appeared yesterday when I moved my setup from an SD card to an SSD and Docker installed more recent images of zigbee2mqtt, mosquitto, and HA.

lucipacurar commented 3 years ago

I found the version of zigbee2mqtt that works on the SD card:

Zigbee2MQTT:info  2020-12-15 10:58:51: Starting Zigbee2MQTT version 1.14.3 (commit #f8066e8)
Zigbee2MQTT:info  2020-12-15 10:58:51: Starting zigbee-herdsman...
Zigbee2MQTT:info  2020-12-15 10:58:53: zigbee-herdsman started
Zigbee2MQTT:info  2020-12-15 10:58:53: Coordinator firmware version: '{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}'

I'm going to install it on the SSD as well

lucipacurar commented 3 years ago

On 1.14.3 the Ikea routers seem to be unresponsive but I get messages from the devices connected to them. Now I'm trying 1.15.0 but the new web frontend doesn't load or show anything. I'm using zigbee2mqtt-assistant to get info and the logs.

lucipacurar commented 3 years ago

I've been struggling for the past 2 days to make it work, and I simply can't. What changed in between was: moving the OS from the SD card to a USB SSD drive, and updating Z2M to the latest version. All the hardware is in the same position. I'm kinda giving up.

kevincaradant commented 3 years ago

@lucipacurar are you using a Raspberry Pi with Z2M and the coordinator?

Did you connect the coordinator to a USB 2.0 port, or at least through a USB extension cable? (Some users reported issues on the Raspberry Pi when using an SSD with the coordinator plugged directly into a USB port. I believe the issue was related to interference.)

Otherwise, did you also try reverting to your previous version of Z2M?

joshuakoh1 commented 3 years ago

@kevincaradant There was no modemmanager on my machine.

I'm currently running Z2M on a VM with USB passthrough, so unplugging/reconnecting the stick is not really an option without forcing a reboot of the VM.

Anyway, I reverted to 1.15 and did a reboot and it's been stable for a full day now. Fingers crossed.

lucipacurar commented 3 years ago

@kevincaradant yes, on the same RPi 4, 4GB, I have the CC2531 on USB 2.0 (no cable, but I will buy a 1.8m one in the morning), a low power SSD on USB 3.0, and everything is powered by the official RPi power adapter. Even with 1.14.3 I can't get it to work on the SSD. On the SD card it worked for months with 1.14.3, but somehow I can't get it to work again :( It's weird that when the routers fail, the sensors still send data to the coordinator. After some time the sensors won't even pair again, and the pairing process doesn't even show up in the logs at this point.

I tried re-pairing all devices. It worked for a while, then stopped working again.

LE: The only thing left is radio interference. But I don't understand how it worked for so many months. I just moved one router closer to the coordinator, it connects, and I also paired one more temperature sensor. If these stay up for some time, then interference it is.

kevincaradant commented 3 years ago

@lucipacurar, I think the interference is due to your new SSD. Many people have reported weird and incomprehensible issues with the CC2531 when an SSD is attached over USB 3.0. I'm not in this scenario myself, so I didn't look into it further. I just wanted to share this possibility with you ;)

The only "fix" was to add a USB cable between the CC2531 and the USB port of your Raspberry Pi. Hoping that will fix your issue, otherwise no idea, I'm not an expert in all of this ;)

@joshuakoh1, alright, good luck :D

lucipacurar commented 3 years ago

@kevincaradant I moved the adapter away from the RPi using a 1.8m cable. Now the network has started to work. But there's one more thing: one router is behind my TV and AV receiver, and near a switch. It's about 5m from the coordinator with a straight line of sight between them, the only things in the way being an Ikea wooden TV stand and a large speaker, and the signal between them quickly goes to 0. But the router connects to the router on the floor above through a solid concrete ceiling :) I moved it to another location, out of line of sight, with a 12.5cm brick wall in between and an extra 2-3m, and the signal is better but not by much. I wonder how it all worked for like 5 months until now.

kevincaradant commented 3 years ago

You're talking about the LQI value? It can be a bit difficult to interpret in some cases. First there is the link quality of the direct link end-device => coordinator, which can be low and sometimes 0. Then there is the path end-device => router => coordinator, where the LQI of end-device => router is good, as is the LQI of router => coordinator. So in the end the automations work well because you have a router that helps propagate the signal. That could explain why everything works even if the LQI shown is low: on my side, the reported value is clearly always the direct link end-device => coordinator, which is misleading because in my real setup I have some routers to help :) I don't know if that's your case, but it looks like it.
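To make the direct-link versus routed-path distinction concrete, here is a small illustrative sketch. It is not how Zigbee actually selects routes (the stack uses its own path cost), and the device names and LQI numbers are made up. It only shows that a device whose direct link to the coordinator reads 0 can still have a healthy path through a router, by picking the path whose weakest hop is strongest.

```python
import heapq


def best_route(links, source, target):
    """Return (bottleneck_lqi, path) for the path whose weakest hop is strongest.

    `links` maps an undirected pair (a, b) to a per-link LQI in 0..255.
    Links with LQI 0 are treated as unusable. Returns (0, []) if no path exists.
    """
    graph = {}
    for (a, b), lqi in links.items():
        if lqi <= 0:
            continue  # a 0 LQI link carries no traffic in this model
        graph.setdefault(a, []).append((b, lqi))
        graph.setdefault(b, []).append((a, lqi))

    # Max-heap on the bottleneck LQI (negated, since heapq is a min-heap).
    heap = [(-255, source, [source])]
    best = {}
    while heap:
        neg_bottleneck, node, path = heapq.heappop(heap)
        bottleneck = -neg_bottleneck
        if node == target:
            return bottleneck, path
        if best.get(node, -1) >= bottleneck:
            continue  # already reached this node over an equal or better path
        best[node] = bottleneck
        for neighbour, lqi in graph.get(node, []):
            if neighbour not in path:
                heapq.heappush(heap, (-min(bottleneck, lqi), neighbour, path + [neighbour]))
    return 0, []


# Hypothetical example values, for illustration only.
links = {
    ("sensor", "coordinator"): 0,    # direct link is dead
    ("sensor", "router"): 120,
    ("router", "coordinator"): 150,
}
print(best_route(links, "sensor", "coordinator"))
# -> (120, ['sensor', 'router', 'coordinator'])
```

In this toy model the sensor reports fine through the router even though a map or dashboard that only shows the direct sensor => coordinator link would display 0.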

lucipacurar commented 3 years ago

@kevincaradant I was looking at the map generated by the web UI, and it looked like the ground floor router didn't connect directly to the coordinator but went through the router on the floor above, through a concrete ceiling. This looks odd to me, given that there is a straight line of sight between the ground floor router and the coordinator and they're only 5m apart. Maybe I'm not properly understanding the map :)

kevincaradant commented 3 years ago

If you add a screenshot of your network map, maybe we can see what you have.

This is an example of the generated map (I truncated it because otherwise it would be too difficult to see the 80 devices on the map :) )

image

As you can see, I have two routers, router_bathroom and router_pc_desktop, which decided to connect to each other (red links), and they are also connected directly to the coordinator (green links).

You seem surprised that the router closest to the coordinator is connected to the router on the 1st floor, but I think you're looking at the problem the wrong way round. I think it's rather the router on the 1st floor that is farther from the coordinator and, as a consequence, decided to connect first to your router on the main floor, which is itself connected to the coordinator :)

You often have 2 links, as you can see on my graph.

Hoping what I said is right, but it makes sense I think :D

NB: That's why I often tell people that the reported LQI value is frequently wrong: in Home Assistant, for example, it's often the 'green link' that is reported as the value ;). Not every time, but I have already noticed it for my weak links.

lucipacurar commented 3 years ago

Screenshot 2020-12-19 at 11 31 07

Here's my whole network. I'm far from your 80 devices. Although after some more testing I plan to add more devices.

kevincaradant commented 3 years ago

alright :)

It's a little bit difficult for me to make out the repeater / power_outlet_living and temp_garden because they are too close together on the graph.

To be honest, I don't understand the difference between the 138 and 183 LQI values for power_outlet_lobby and power_outlet_living ... but at least we can see that the signal is very similar between the path power_outlet_lobby => power_outlet_living => repeater => coordinator and the direct link power_outlet_lobby => coordinator :)

But, for example, as I understand it, power_outlet_living can't reach the coordinator directly, hence the 0 LQI, but through the repeater it can reach the coordinator, which allows it to report values without any difficulty :)

That's how I understand this graph, but I prefer the map in HA, which is easier to read :)

lucipacurar commented 3 years ago

The closest devices to the coordinator are power_outlet_living, which is on the same floor, temp_living, also on the same floor, and power_outlet_lobby on the upper floor. temp_living is 1-2m away from power_outlet_living and probably a little closer to the coordinator. The devices farthest from the coordinator are the repeater and temp_garden. The repeater is at the opposite corner of the house from the coordinator.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

Roaders commented 2 years ago

I had the same issue with an Ikea repeater and several Aqara devices... Turns out that the power outlet you plug the Ikea repeater into needs to be turned on! I know, who knew!! 😉