Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.36k stars 1.69k forks source link

"Adapter disconnected, stopping" (Skyconnect + Multiprotocol + EZSP) #21198

Closed merlinpimpim closed 2 months ago

merlinpimpim commented 10 months ago

What happened?

I regularly get this type of message with my adapter: "Adapter disconnected, stopping". In this case, Z2M restarts.

(Continuation of the conversation started here: https://github.com/Koenkk/zigbee2mqtt/discussions/21140#discussioncomment-8340955)

What did you expect to happen?

Correct the problem by finding out why it happens

How to reproduce it (minimal and precise)

Nothing, just wait and read the logs

Zigbee2MQTT version

1.35.2

Adapter firmware version

7.3.1.0 build 0

Adapter

Skyconnect (Multiprotocol)

Setup

HA supervised, Khadas Vim1S (Docker)

Debug log

logs.txt

merlinpimpim commented 10 months ago

To continue the conversation started here @Nerivec : https://github.com/Koenkk/zigbee2mqtt/discussions/21140#discussioncomment-8340955:

I have provided a herdsman debug log but it is very difficult to get it just before a crash which can happen at any time. In this example the crash occurred 4 or 5 minutes after the logs were extracted (and then a Z2M restarts so a new Z2M container ID...).

I have no other error in the logs (in the Z2M interface) when this happens. See the following screenshot. In this same screenshot we see examples of how often the problem occurs: chrome_RFUV0PG4bL

It is therefore not certain that the problem is due to the suspension of the USB ports.

However, the cat /sys/module/usbcore/parameters/autosuspend command gives the value 2 (not -1). And I haven't figured out how to change it permanently on my system (Khadas Vim1S, Fenix/Debian) including reading this: https://unix.stackexchange.com/questions/91027/how-to-disable-usb-autosuspend-on-kernel-3-7-10-or-above (I'm not a Linux system expert...)

In any case, the echo -1 >/sys/module/usbcore/parameters/autosuspend command is taken into account (until the next reboot), but it doesn't change the problem.

I hope I've given you enough information to help me.

Nerivec commented 10 months ago

You are using silabs multiprotocol addon, right? If so, your firmware version shouldn't be 7.x.x, multiprotocol RCP firmware are currently all 4.x.x.

https://github.com/NabuCasa/silabs-firmware/tree/main/RCPMultiPAN https://github.com/darkxst/silabs-firmware-builder/tree/main/firmware_builds/skyconnect

Also make sur the settings in silabs match whatever firmware you pick (baudrate). You'll probably want to go with a 460800 one (but older nabu casa ones are 115200).

merlinpimpim commented 10 months ago

You're right because I did some manipulations in this direction just before your message and I think there's a clue, but I don't understand it any more:

I've just used https://skyconnect.home-assistant.io/firmware-update/ to flash this firmware: https://github.com/darkxst/silabs-firmware-builder/blob/main/firmware_builds/skyconnect/rcp-uart-802154-v4.4.0-skyconnect-460800.gbl (probably the same as https://github.com/NabuCasa/silabs-firmware/blob/main/RCPMultiPAN/beta/NabuCasa_SkyConnect_RCP_v4.4.0_rcp-uart-hw-802154_460800.gbl) ?

Next: I manage to start SLM in 460800 baud but in this case Z2M does not start any more including by changing my current parameters:

port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 115200

to:

port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 460800

So I put 115200 back into SLM and Z2M and everything starts fine, but in Z2M/settings/About I read "Coordinator revision":7.3.1.0 build 0

And in this configuration I always get "Adapter disconnected, stopping" error messages (obviously...)

How can I see a version 7.3.1.0 in Z2M after flashing an 4.4.0 RCP version? I'm lost....

Nerivec commented 10 months ago

Did you make sure to disable anything that could be using the adapter before starting the flashing procedure? (ZHA, you shouldn't have at all in your case), silabs multiprotocol addon and Z2M. I assume you didn't get any particular error from the flasher? (I've never used it since I only have a Sonoff Dongle-E).

PS: If you are not using the multiprotocol (other radios than Zigbee), I'd recommend you use an NCP firmware, with just Z2M. SLM/Zigbeed adds a layer of complexity and no major benefit if you don't have a huge network of devices.

merlinpimpim commented 10 months ago

I flashed the adapter from my PC, using the website mentioned above. So it wasn't connected to my Khadas with Home Assistant. And I didn't get any errors during the operation. When I inserted the key a second time, the flasher confirmed that I had the expected version.

Yes, I was thinking of going back to the NCP firmware to test, but we agree that if I do that, I'll have to re-pair my 25 devices one by one (bearing in mind that some of my Zigbee relays are in wall switch boxes, in an outdoor electric gate post, etc...) ? :((

Could there be another explanation for the fact that Z2M "sees" a 7.3.1 version when I'm on RCP 4.4.0 (an issue with a file in the /zigbee2mqtt directory of the Z2M addon in HA, or something else...)?

Nerivec commented 10 months ago

If the flasher is telling you 4.4.0 after flashing, then you're good. As for Z2M, the version would actually be reported by the silabs addon, that itself gets it from Zigbeed, not the adapter (since with multiprotocol the Zigbee stack is offloaded to the host). _I seemed to remember the SDK version being reported when I tested the addon a while back, but maybe I was wrong and it was the EmberZNet version instead, or maybe they changed that internally (bit of a mess really)._

I looked, apparently Home Assistant reverted the version of the multiprotocol addon SDK back to 4.3.1 (again), which would report 7.3.1 EmberZNet (what you're seeing). And might explain a lot of you troubles. They created "special firmware" too... https://github.com/home-assistant/addons/tree/master/silabs-multiprotocol/rootfs/root See https://github.com/home-assistant/addons/pull/3429

On the "re-pair" question, when I tested the addon a few months back, I had to re-pair everything when I switched from NCP to RCP (which also upgraded the EZSP version), but not when I switched back to NCP at the same version level as the RCP (the equivalent, as described in first link...), Z2M kept all my devices talking without issue... But I heard mixed reports on people who also tried, so... 😢

merlinpimpim commented 10 months ago

I installed the firmware you were talking about (4.3.1), I managed to start SLM (2.4.4) at 460800 baud and Z2M at 115200 baud 😊 but unfortunately I still get the famous error message 😞 image

For info on Z2M: image

Nerivec commented 10 months ago

Can you try to catch the logs like you did before? There was a lot of "noise" in your previous one because of decryption failures (I think?). Hopefully these will now be gone.

PS: The Z2M baudrate won't matter in your current setup, only the SLM one will be used (since Z2M is connecting through socket to SLM).

merlinpimpim commented 10 months ago

Here's an updated file: logs9.txt A crash occurred 4 minutes after this extraction.

About the Z2M baudrate, I believe you but if I don't put these parameters in the Z2M add-on configuration, it won't start:

adapting: ezsp
baudrate: 115200
romarysonrier commented 10 months ago

I am facing a similar issue, more details here : https://github.com/Koenkk/zigbee-herdsman/issues/910

romarysonrier commented 10 months ago

Here is some log that lead to shutdown of zigbee2mqtt : https://github.com/Koenkk/zigbee-herdsman/files/14170047/lixee-2.log

romarysonrier commented 10 months ago

@Nerivec feel free to ask more logs.

merlinpimpim commented 10 months ago

Ah, finally a possible lead :) And I get the impression that this phenomenon is a very French issue ;) Thank you @romarysonrier

romarysonrier commented 10 months ago

@merlinpimpim indeed , a tiny lead. is Lixee the cause or the consequence? that is question. It seems that it is related to our specific Lixee device using EZSP, which does a lot a polling on many values.

@Nerivec looking at my logs, we see a kind of pattern, having the same sequence :

first , we do have this log line: zlinky config: standard, xxxx_phase, false, yyyyy from src/devices/lixee.ts , generate by "getCurrentConfig(device, options)" line 1044.

then, we have endpoint Read 0x00158d0005d25e8c/1 liXeePrivate(["currentDate","currentIndexTarif","currentPrice","currentTarif"] .....

then , few millesecondes after we have some error log : Unexpected packet sequence X | X+1

followed by Unhandled frame childJoinHandler

hope its helps.

jhbruhn commented 10 months ago

I can confirm this behaviour with my HA Yellow, Multiprotocol firmware and Z2M 1.35.2, with 1.35.1 it works. Not Lixee device in my network. But I would have to get more precise log messages to find out where exactly the error happens, the only error I have for now is "Adapter disconnected"

romarysonrier commented 10 months ago

@jhbruhn could your HW and exact FW revision? and share some log with zigbee-herdsman debugging enable ? by running DEBUG=zigbee-herdsman* npm start. Could you also list the router/endpoint device running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

romarysonrier commented 10 months ago

@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

romarysonrier commented 10 months ago

@Nerivec even shorter log : lixee-3.log

Also, among the seveal test case, most "Unexpected packet sequence" occurence seen are related to ACK

romarysonrier commented 10 months ago

Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.

tic_command_whitelist: ~

But by doing so we loose the non Reportable attribute values, as described here : https://github.com/fairecasoimeme/Zlinky_TIC

EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.

romarysonrier commented 10 months ago

Looking at this sequence : image

At first, a data is sent : image and was expecting a ACK : -?- waiting (2)

But Z2M never received this ACK (2) and later while receiving uart <-- ACK (1): we get a Unexpected packet sequence 1 | 2 +2ms

@Koenkk @Nerivec is this expected ?

Full log here : lixee-4.log

jhbruhn commented 10 months ago

The Coordinator Type is

EZSP v12

to be specific a HA Yellow running the most recent (Gecko SDK 4.3.1) Multiprotocol firmware.

Coordinator revision is

7.3.1.0 build 0

Logs are attached here: logs.txt

(Unfortunately it seems like HA truncated the logs at the start)

The devices in my network are:

info  2024-02-06 19:45:47: Starting Zigbee2MQTT version 1.35.2 (commit #unknown)
info  2024-02-06 19:45:47: Starting zigbee-herdsman (0.33.5)
info  2024-02-06 19:45:53: zigbee-herdsman started (resumed)
info  2024-02-06 19:45:53: Coordinator firmware version: '{"meta":{"maintrel":"1 ","majorrel":"7","minorrel":"3","product":12,"revision":"7.3.1.0 build 0"},"type":"EZSP v12"}'
info  2024-02-06 19:45:53: Currently 69 devices are joined:
info  2024-02-06 19:45:53: Wohnzimmer Regallampe (0xcc86ecfffe19d137): LED1733G7 - IKEA TRADFRI LED bulb E14 600 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Büroja Deckenlampe (0xf0d1b8000014287b): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Büroja Schreibtischlampe (0x001788010b22487a): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Büroja Heizung (0xcc86ecfffec1f63f): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Büroja Fenster links (0x00124b002521b10d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Büroja Fenster rechts (0x00124b00254e0b3a): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Büroja Kaffeeknopf (0x04cd15fffe4a6863): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Stehlampe (0x001788010cae07a2): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Wohnzimmer Deckenlampe (0xf0d1b8000013cb3f): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Wohnzimmer Kugellampe (0x680ae2fffe557254): LED1937T5_E27 - IKEA TRADFRI LED bulb E27 470 lumen, dimmable, opal white (Router)
info  2024-02-06 19:45:53: Wohnzimmer STYRBAR (0x50325ffffe2ac628): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info  2024-02-06 19:45:53: Sterni (0x50325ffffef5205f): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Wohnzimmer Heizung (0xcc86ecfffec87edd): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Fenster (0x00124b00234e98ef): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer SYMFONISK (0x943469fffe65989c): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info  2024-02-06 19:45:53: Lichterkette (0xcc86ecfffe89ee37): E1603/E1702/E1708 - IKEA TRADFRI control outlet (Router)
info  2024-02-06 19:45:53: Flur Deckenlampe (0xf0d1b8000013ad20): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Bewegungsmelder (0x00124b002502f639): SNZB-03 - SONOFF Motion sensor (EndDevice)
info  2024-02-06 19:45:53: Bürojo Heizung (0xcc86ecfffec39bb9): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Bürojo Lichtknopf (0x003c84fffe259547): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Bürojo Fenster (0x00124b002518a8cc): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Lichtleiste (0xb4e3f9fffea55396): ICPSHC24-10EU-IL-1 - IKEA TRADFRI driver for wireless control (10 watt) (Router)
info  2024-02-06 19:45:53: Schlafzimmer Lichtschalter (0x04cd15fffe0f9437): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Vorhangschalter (0x94deb8fffe0f3a98): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Heizung links (0xcc86ecfffef4d241): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Fenster links (0x00124b002521b14d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Fenster rechts (0x00124b00251d357c): SNZB-04 - SONOFF Contact sensor (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Heizung rechts (0x847127fffe0a5cb4): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Badezimmer Heizung (0xcc86ecfffec20a8a): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Küche Deckenlampe (0xf0d1b8000013cb94): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Küche Esstisch Hängelampe 1 (0x001788010ba70b7e): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Küche Esstisch Hängelampe 2 (0x001788010b1f7e51): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Küche Lichtleiste (0x001788010b4783f2): 8718699703424 - Philips Hue white and color ambiance LightStrip plus (Router)
info  2024-02-06 19:45:53: Küche Wandlampe (0xf0d1b8000013cc06): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Küche Heizung (0xcc86ecfffef4d631): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info  2024-02-06 19:45:53: Küche STYRBAR (0x50325ffffe2ac673): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info  2024-02-06 19:45:53: Küche SYMFONISK (0x943469fffe5fbe4a): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info  2024-02-06 19:45:53: Küche Lichtknopf (0x2c1165fffecfb8c1): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Küche Esstischschalter (0x0017880109a9ee05): 929002398602 - Philips Hue dimmer switch (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Deckenlampe (0xf0d1b8000014c5f1): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info  2024-02-06 19:45:53: Sudokuknopf (0x70ac08fffedcf453): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 1 (0x001788010cac1700): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 2 (0x001788010cad2b03): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 3 (0x001788010cae0767): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Arbeitslampe Direkt (0x2c1165fffe85ef45): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info  2024-02-06 19:45:53: Büroja Schreibtisch (0xa4c1387eb0418c52): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Wohnzimmer Subwoofer (0xa4c1382197f8815d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Schlafzimmer Lichterschalter (0x142d41fffe3b900e): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info  2024-02-06 19:45:53: Flur Trockner (0xa4c138a4bc29642d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Bürojo Schreibtisch (0xa4c1385540f20e2b): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Flur Waschmaschine (0xa4c138d87a63c833): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info  2024-02-06 19:45:53: Wohnzimmer Kuglibus (0x001788010bfbac59): 8719514301481 - Philips Hue Filament Globe Ambiance E27 (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 1 (0x001788010cac19ed): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 2 (0x001788010cad16c7): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 3 (0x001788010b9aa2c6): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info  2024-02-06 19:45:53: Bürojo Bastelleuchte Direkt (0x50325ffffee76dff): LED2004G8 - IKEA TRADFRI LED bulb E26/E27 800/806 lumen, dimmable, white spectrum, clear (Router)
info  2024-02-06 19:45:53: Bürojo Schalter (0x6c5cb1fffe2c29ac): E2201 - IKEA RODRET wireless dimmer/power switch (EndDevice)
info  2024-02-06 19:45:53: Büroja Thermometer (0xa4c13884a645bced): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Küche Thermometer (0xa4c1381a2f1411d1): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Thermometer (0xa4c1387ac22fdbea): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Bürojo Thermometer (0xa4c1380cb5161457): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Filament Thermometer (0xa4c13829ce9c4039): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Flur Thermometer (0xa4c138f649eca602): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Schlafzimmer Thermometer (0xa4c1389bd33d2b4d): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Draußen Thermometer (0xa4c138742dff2382): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Badezimmer Thermometer (0xa4c138bd9299f1b6): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Wohnzimmer Lichtleiste (0xf0d1b800001bd287): 4058075208339 - LEDVANCE Flex 3P multicolor (Router)
info  2024-02-06 19:45:53: Serverschrank Thermometer (0xa4c13866cfe09f7c): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info  2024-02-06 19:45:53: Flur Bewegungsmelder (0xd87a3bfffe015855): E2134 - IKEA VALLHORN wireless motion sensor (EndDevice)
merlinpimpim commented 10 months ago

@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.

Hello,

Here are (all) my devices:

Aqara   MCCGQ11LM
Aqara   DJT11LM
LiXee   ZLinky_TIC
Neo NAS-AB06B2
Neo NAS-AB02B2
NodOn   SIN-4-1-20
Nous    E2
Nous    A1Z
Nous    A1Z
Nous    A1Z
OSRAM   AB3257001NJ
SONOFF  ZBMINIL2
SONOFF  SNZB-04
SONOFF  SNZB-02D
SONOFF  SNZB-02D
SONOFF  SNZB-02D
SONOFF  SNZB-02D
TuYa    TS0601_soil
TuYa    TS0207_repeater
TuYa    TS0207_repeater
Xiaomi  WXKG01LM
Xiaomi  WSDCGQ01LM
Xiaomi  GZCGQ01LM

If it helps...

merlinpimpim commented 10 months ago

After checking, the common link between my devices and those of @jhbruhn (excluding Lixee) are SonOff SNZB-xxx devices, including SNZB-04s in particular.

Nerivec commented 10 months ago

I'm still looking into this, but my list of tasks is growing faster than my two eyes can follow 😅 I doubt Sonoff devices are the cause, I've got about 20 of them, I'd have caught the bug too!

I've fixed a couple of problems raised from your original log @merlinpimpim, but not particularly linked to LiXee...

@jhbruhn By any chance do you have any older logs from 1.35.1 or before? Older versions were doing internal restarts (without reaching Z2M level "Adapter disconnected"), that you might not have noticed, but they could still have been happening.

jhbruhn commented 10 months ago

Absolutely, I am running 1.35.1 right now and also noticed that devices sometimes do not react in that version. I will try to collect some logs about that behaviour the next time I notice it.

But the problems with 1.35.2 occur right after starting Z2M, making it unusable, that does not happen with 1.35.1. Maybe the problem is different than that?

Nerivec commented 10 months ago

Then likely different yes, and (without seen the error can't say for sure) there's a good chance that is fixed already. I'm just looking for a couple of other buggers before I create a PR.

merlinpimpim commented 10 months ago

Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.

tic_command_whitelist: ~

But by doing so we loose the non Reportable attribute values, as described here : https://github.com/fairecasoimeme/Zlinky_TIC

EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.

@Nerivec : I think @romarysonrier is on to something because since I did what he said, Z2M hasn't rebooted for 2 and a half hours, which is a record since the 1.35.2 update (Z2M currently reboots between 1 and 8 times an hour for me...).

Parameter tic_command_whitelist changed from "all" to empty: image

The issue with this workaround is that no more attributes are exposed. So it's as if the device doesn't exist and/or serves no purpose: image

However, this shows that the issue is indeed with Lixee for our cases.

I'm going to leave it until tomorrow morning to confirm that it didn't restart overnight. To be continued.

Nerivec commented 10 months ago

Glancing at the github for that device, tomorrow (if leaving it like that worked) you might want to try enabling only the values that interest you in that (long) list. I believe that's it there in the first column, and see if that improves stability.

romarysonrier commented 10 months ago

On my side , I tried yesterday :

And then i did those testing cases :

Thus , I will retry test case B ( all attributes belonging to the same cluster = 0x702) for a longer time frame, to gather more log and clues. But is the lixee device still send updated to all Reportable attribute from all cluster (haElectricalMeasurement+haMeterIdentification+lXeePrivate+seMetering).

@Nerivec : So it so mean that the abnormal behaviour is also trigger when using only Reportable attribute update

merlinpimpim commented 10 months ago

My return today:

I updated Z2M to version 1.35.3 last night, leaving tic_command_whitelist empty: no crashes in 15 hours! 😀 So the culprit is clearly identified.

Now I've added some Reportable attributes (I'm in three-phase), I haven't restarted Z2M, but I've done a Reconfigure Device as shown in the screenshot: image

merlinpimpim commented 10 months ago
  • No reboot for 1 hour (I'll wait a few more hours to be sure)

no reboot for 3 hours :)

romarysonrier commented 10 months ago

@Nerivec i found out that zigbee-herdsman ezrp uart driver doesn't support NAK message properly.

Here is a proposal for handleDATA()

diff --git a/src/adapter/ezsp/driver/uart.ts b/src/adapter/ezsp/driver/uart.ts
index 1755a0e..f07b26d 100644
--- a/src/adapter/ezsp/driver/uart.ts
+++ b/src/adapter/ezsp/driver/uart.ts
@@ -44,6 +44,7 @@ export class SerialDriver extends EventEmitter {
     private sendSeq = 0; // next frame number to send
     private recvSeq = 0; // next frame number to receive
     private ackSeq = 0;  // next number after the last accepted frame
+    private rx_reject = false;
     private waitress: Waitress<EZSPPacket, EZSPPacketMatcher>;
     private queue: Queue;

@@ -210,6 +211,24 @@ export class SerialDriver extends EventEmitter {

         debug(`<-- DATA (${frmNum},${frame.control & 0x07},${reTx}): ${frame}`);

+        if(this.recvSeq != frmNum){
+            if(reTx || this.rx_reject){
+                debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : rejecting packet`);
+            }
+            else {
+              debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : Sending NAK & & rejecting packet`);
+              this.writer.sendNAK(this.recvSeq);
+              this.rx_reject=true;
+            }
+            return;
+        }
+        else{
+            if (!reTx && this.rx_reject){
+                 debug(`DATA packet sequence ${frmNum} : clear reject`);
+                 this.rx_reject=false;
+            }
+        }
+
         this.recvSeq = (frmNum + 1) & 7; // next

         debug(`--> ACK  (${this.recvSeq})`);
@@ -272,6 +291,7 @@ export class SerialDriver extends EventEmitter {
         let code;
         this.sendSeq = 0;
         this.recvSeq = 0;
+        this.rx_reject = false;^M

         debug(`<-- RSTACK ${frame}`);

ps : inspired from https://www.silabs.com/documents/public/user-guides/ug101-uart-gateway-protocol-reference.pdf

But zigbee-herdsman doesn't support NAK itsef : handleACK doesn't do retransmission. don't know yet if really necessary.

@merlinpimpim Regarding the root cause : My best assumption is that on my slow CPU (Rasberry pi zero) + all the MQTT updates required by the huge number of attributes sent or polled by the lixee device is so long to handle/execute by NodeJS that the EZSP device trigger retransmission, that current implementation doesn't handle properly.

romarysonrier commented 10 months ago

@merlinpimpim what is your CPU running nodejs ? any source of latencies between Nodejs and the ESZP device ?

merlinpimpim commented 10 months ago

I'm using HA + Z2M + Skyconnect on a Khadas Vim1S (https://www.khadas.com/vim1s).

Nerivec commented 10 months ago

@romarysonrier I'm looking into a more permanent solution to fix these sequence & related issues (that's not the only one). We'll see what I can come up with, needs some automated tests and all... 😉

romarysonrier commented 9 months ago

@Nerivec i made this PR proposal https://github.com/Koenkk/zigbee-herdsman/pull/911/files , which is far from perfect , but still improving the situation with NAK messages on rx side.

romarysonrier commented 9 months ago

On my case (Z2M running an a single core raspbery pi zero) , i think the root cause is now clear , at least for me: 1./ When lixee join the Zigbee network, it will generate on Z2M :

2./ all those updates trigger a huge amount of MQTT updates (quiet big dataset because lots of attributes count) + log writes ( if log_level=info), which generate a peak of CPU usage. 3./ then zigbee-herdsman EZSP UART driver is reading too late message from EZSP firmware (not sending ACK in the required timeframe) 4./ then , at some point, EZSP firmware is lost with seq number and refuses any new TX request with a fatal error. 5./ then zigbee-herdsman EZSP report a faltal error and Z2M shutdown.

So @merlinpimpim if you doing this :

You should see some improvement , as you lower the amount of CPU required to handle lixee device .

For me , It is running with those settings (+ disabling frontend) since 8 hours without a single Z2M shutdown. The main drawback of this workarround, is lost of non Reportable attributes.

merlinpimpim commented 9 months ago

@romarysonrier : I've made the changes in the Lixee.js file and it seems to work after resetting the Reportable attributes to "All". No crashes for 2 hours 😁but I get this type of message in the logs from time to time:

Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 liXeePrivate(["message2"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 164 - 65382 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["averageRmsVoltageMeasurePeriodPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 176 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsCurrentPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 177 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 179 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhC"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 180 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["meterSerialNumber"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 181 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["siteId"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 182 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["availablePower"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 183 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["powerThreshold"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 184 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["softwareRevision"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 185 - 2817 - 1 after 10000ms)
romarysonrier commented 9 months ago

@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts. Could you double check if you really have disabled polling ?

BTW, my setup is still up and running after 21 hours

remidebette commented 9 months ago

Hi guys, I am having the same error, I have a Lixee in historic mode (polling heavy) and the sonoff dongle Plus with the 7.3.1 firmware.

Since the 1.35.3-1 update of z2m, it would crash only a few seconds after start.

I want to know if switching the Linky from Historic to Standard mode would change anything? (Normally it should just be a call to the power provider) I have just started to build a smart home (prior to moving house in a few days) and I will definitely ask for it there.

Edit: in the meantime I simply unplugged the Lixee and z2m held for 24h max. But then failed again with the same issue. It is not only the Lixee then? I have only a few zigbee devices otherwise so far (Phillips hue bulbs and switch, a heiman Smart plug)

romarysonrier commented 9 months ago

@remidebette : switching the Linky from Historic to Standard mode=> more attributes => more mqtt update => more cpu load => more issue What is the cpu running Z2M ? serial over usb or network ?

remidebette commented 9 months ago

Not sure to understand the question, I have a Sonoff USB dongle Plus EFR32MG21 plugged on the raspberry pi 3+ that hosts my home assistant So it should be serial over USB?

I did not configure the dongle firmware to do multipan RCP (more precisely I had tested it at some point but I then rolled back the firmware to NCP 7.3.1)

romarysonrier commented 9 months ago

yes your dongle does serial over USB communication. Your setup based on a low end CPU (raspberry pi 3+) , , that may explain why you also encounter the issue.

Could you list you zigbee devices ? Could you try that fix :

This setup is running since 4 days on my side ( but i lost few non reportable attributes, the ones taged as read only in https://github.com/fairecasoimeme/Zlinky_TIC/tree/master?tab=readme-ov-file#subscription-optarif-values)

merlinpimpim commented 9 months ago

@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts. Could you double check if you really have disabled polling ?

BTW, my setup is still up and running after 21 hours

It's not that simple because to do it I have to connect to the Z2M container and make the change to the lixee.js file. This is a non-permanent modification because every time Z2M restarts the modification is lost (Z2M can restart without any link to Lixee, for example Home Assistant, HACS, Z2M version upgrades, etc.).

However, I've already had Z2M restart because of Lixee even though I think I've made the change in the current Z2M container... Maybe I'll wait for the PRs to be deployed in production now.

Anyway, by limiting the reportable attributes to the 6 I'm interested in and setting the top parameter to 2, I haven't had a single reboot in 24 hours.

image

romarysonrier commented 9 months ago

it seems that @mildis is the most active contributor lixee.ts, would be great to have some help...

@Koenkk : I would be great to implement a new feature to lower CPU usage + merge https://github.com/Koenkk/zigbee-herdsman/pull/911 :

the main benefits would be to decrease the amount of attributes updated by polling ( Reportable attributes doesn't need polling), and then to reduce the number of MQTT updates sent to HA.

mildis commented 9 months ago

Hi @romarysonrier, Proud you think of me for this but unfortunately I don't get that much about Z2M internals. I've just poke around Lixee as a end-user that could help making it more complete. Maybe @fairecasoimeme has a better understanding of the whole picture…

merlinpimpim commented 9 months ago

Just in case, without any changes to the converter or other files, if I delete the Lixee from Z2M 1.35.3 and start again from scratch, I get this error message (4th attempt, but same message every time):

Error 2024-02-18 22:01:05Failed to configure '0x00158d0005d2a627', attempt 4 (Error: ConfigureReporting 0x00158d0005d2a627/1 haElectricalMeasurement([{"attribute":"totalApparentPower","minimumReportInterval":60,"maximumReportInterval":900,"reportableChange":1}], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 30499 - 1 - 135 - 2820 - 7 after 10000ms) at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:63:23) at EZSPAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/ezsp/adapter/ezspAdapter.ts:486:47) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:35:20) at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:79:20) at Endpoint.configureReporting (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:669:28) at async Promise.allSettled (index 15) at Object.configure (/app/node_modules/zigbee-herdsman-converters/src/devices/lixee.ts:1020:14) at Configure.configure (/app/lib/extension/configure.ts:120:13) at Configure.onMQTTMessage (/app/lib/extension/configure.ts:55:21))

Nerivec commented 9 months ago

Hey @merlinpimpim, sorry I've been a little buried in code lately 😉 Did you try to force pairing to a specific device (instead of just "all" when you "permit join")? Try the coordinator first, if it's in range, otherwise the nearby routers one by one. You may have a router that's picking it up and making troubles... Since it's a timeout, seems a message is getting lost somewhere (or not being sent to the right place...).

@romarysonrier In case you have a test environment where you could put the LiXee up against ember, that'd be great (though don't go breaking your house 😄). This device clearly is a big spammer, which is one of the items missing from my "TODO tests" list!

merlinpimpim commented 9 months ago

@Nerivec: Sorry, I was busy myself 😅 By the way, before Z2M 1.36.0, it wasn't possible to try out your proposal because of this bug: https://github.com/Koenkk/zigbee-herdsman/pull/938 However, with the latest version of Z2M and after deleting LiXee from Z2M and starting a new pairing from scratch by forcing pairing with the nearest router (the coordinator is too far from the electricity meter), the good news is that it seems to have worked without getting the fatal message ("Adapter disconnected, stopping") for 48 hours! 😁 (@romarysonrier: I've left all the default settings, without touching anything, in the "Settings (specific)" tab).

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days