Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge πŸŒ‰, get rid of your proprietary Zigbee bridges πŸ”¨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.13k stars 1.68k forks source link

zigbee2mqtt version 1.12.2 (commit #911cd58) crashes every few minutes #3427

Closed std10k closed 4 years ago

std10k commented 4 years ago

zigbee2mqtt version 1.12.2 (commit #911cd58) Crashes every few minutes

What happened

Z2m starts seemingly OK, both as a service or manually, though it tends to restart once or twice before it comes online (not sure if that is expected). Then it connects to the 2530 over TCP. Bridge status shows online, and is able to see coordinator and devices. Hass also sees the devices. It works for a few minutes (less than 5 usually), then z2m crashes and restarts.

What did you expect to happen

it should keep running

Debug Info:

zigbee2mqtt version 1.12.2 (runs in venv as per FAQ; starting with the commands# source /opt/zigbee2mqtt/bin/activate; /opt/zigbee2mqtt/bin/npm start)

Coordinator version CC2530_DEFAULT_20190608 Coordinator hardware: CC2530 with ESP8266, EasyESP latest 'mega' build from April 2020 (also tried "stable" build 120 first, same effect) Currently NO devices connected to this coordinator, but i was able to join a device (xiaomi buttons and sensors) and see them in HASS. I was also able to start this z2m build with esp+cc2530 with a database from another 2531 (my primary at the moment) and all devices were connecting successfully, but obviously it was unstable. I started over with an empty database then for troubleshooting.

Raspberry Pi3/Raspbian; Newly built just days ago for migration to a new Hass version and CC2530/ESP8266 coordinator. Pi is connected over the wire, esp is on wifi on the same network. Wifi reception on ESP is good, ping is stable

I tried upgrading ESPEasy version to the latest. it seems to have made ESP work somewhat faster, but didn't have any visible impact on the issue with z2m crashes.

I am not sure whether this is something wrong with cc2530/esp or Z2M, unfortunately i do not have another 2530 or esp to try (only have another 2531), but the fact that it works initially and that z2m process crashes without any apparent message in the log makes me think there may be an issues with z2m. Can provide detailed log outputs and other information as needed.

I notices that TCP probe on port 1775 on ESP/2530 (serial server monitored by PRTG) at times has quite slow response (close to 3 sec) but most of the time it is pretty low. Not sure if it could be causing some kind of connection timeout, but i couldn't find any settings to control it.

How to reproduce it (minimal and precise)

happens on its own every few minutes with the above build. `` z2m configuration

homeassistant: true permit_join: false mqtt: base_topic: zigbee2mqtt server: 'mqtt://localhost' serial: port: 'tcp://192.168.1.33:1775' advanced:

baudrate: 115200

rtscts: false log_level: debug ``

The last few lines from debug output `` (zigbee2mqtt) root@raspberrypi2:/opt/zigbee2mqtt# /opt/zigbee2mqtt/bin/npm start

zigbee2mqtt@1.12.2 start /opt/zigbee2mqtt node index.js

zigbee2mqtt:info 2020-04-25 21:22:50: Logging to console and directory: '/opt/zigbee2mqtt/data/log/2020-04-25.21-22-50' filename: log.txt zigbee2mqtt:debug 2020-04-25 21:22:50: Removing old log directory '/opt/zigbee2mqtt/data/log/2020-04-25.19-50-01' zigbee2mqtt:debug 2020-04-25 21:22:51: Loaded state from file /opt/zigbee2mqtt/data/state.json zigbee2mqtt:info 2020-04-25 21:22:51: Starting zigbee2mqtt version 1.12.2 (commit #911cd58) zigbee2mqtt:info 2020-04-25 21:22:51: Starting zigbee-herdsman... zigbee2mqtt:debug 2020-04-25 21:22:51: Using zigbee-herdsman with settings: '{"network":{"panID":6754,"extendedPanID":[221,221,221,221,221,221,221,221],"channelList":[11],"networkKey":"HIDDEN"},"databasePath":"/opt/zigbee2mqtt/data/database.db","databaseBackupPath":"/opt/zigbee2mqtt/data/database.db.backup","backupPath":"/opt/zigbee2mqtt/data/coordinator_backup.json","serialPort":{"baudRate":115200,"rtscts":false,"path":"tcp://192.168.1.33:1775"},"adapter":{"concurrent":null}}' zigbee2mqtt:info 2020-04-25 21:22:56: zigbee-herdsman started zigbee2mqtt:info 2020-04-25 21:22:56: Coordinator firmware version: '{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}}' zigbee2mqtt:debug 2020-04-25 21:22:56: Zigbee network parameters: {"panID":6754,"extendedPanID":"0xdddddddddddddddd","channel":11} zigbee2mqtt:info 2020-04-25 21:22:56: Currently 0 devices are joined: zigbee2mqtt:info 2020-04-25 21:22:56: Zigbee: disabling joining new devices. zigbee2mqtt:info 2020-04-25 21:22:56: Connecting to MQTT server at mqtt://localhost zigbee2mqtt:info 2020-04-25 21:22:57: Connected to MQTT server zigbee2mqtt:info 2020-04-25 21:22:57: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'online' zigbee2mqtt:info 2020-04-25 21:22:57: MQTT publish: topic 'zigbee2mqtt/bridge/config', payload '{"version":"1.12.2","commit":"911cd58","coordinator":{"type":"zStack12","meta":{"transportrev":2,"product":0,"majorrel":2,"minorrel":6,"maintrel":3,"revision":20190608}},"log_level":"debug","permit_join":false}' zigbee2mqtt:debug 2020-04-25 21:23:02: Received MQTT message on 'zigbee2mqtt/bridge/config/devices/get' with data '' zigbee2mqtt:info 2020-04-25 21:23:02: MQTT publish: topic 'zigbee2mqtt/bridge/config/devices', payload '[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587820982107}]' zigbee2mqtt:debug 2020-04-25 21:23:02: Received MQTT message on 'zigbee2mqtt/bridge/config/devices' with data '[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587820982107}]' zigbee2mqtt:info 2020-04-25 21:23:02: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"devices","message":[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587820982121}]}' zigbee2mqtt:debug 2020-04-25 21:24:00: Received MQTT message on 'zigbee2mqtt/bridge/config/devices/get' with data '' zigbee2mqtt:info 2020-04-25 21:24:00: MQTT publish: topic 'zigbee2mqtt/bridge/config/devices', payload '[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587821040015}]' zigbee2mqtt:debug 2020-04-25 21:24:00: Received MQTT message on 'zigbee2mqtt/bridge/config/devices' with data '[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587821040015}]' zigbee2mqtt:info 2020-04-25 21:24:00: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"type":"devices","message":[{"ieeeAddr":"0x00124b001d3b0c0e","type":"Coordinator","networkAddress":0,"friendly_name":"Coordinator","softwareBuildID":"zStack12","dateCode":"20190608","lastSeen":1587821040025}]}' zigbee2mqtt:error 2020-04-25 21:25:02: Adapter disconnected, stopping zigbee2mqtt:debug 2020-04-25 21:25:02: Saving state to file /opt/zigbee2mqtt/data/state.json zigbee2mqtt:info 2020-04-25 21:25:02: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline' zigbee2mqtt:info 2020-04-25 21:25:02: Disconnecting from MQTT server zigbee2mqtt:error 2020-04-25 21:25:02: Failed to stop zigbee npm ERR! code ELIFECYCLE npm ERR! errno 1 npm ERR! zigbee2mqtt@1.12.2 start: node index.js npm ERR! Exit status 1 npm ERR! npm ERR! Failed at the zigbee2mqtt@1.12.2 start script. npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in: npm ERR! /root/.npm/_logs/2020-04-25T13_25_02_327Z-debug.log (zigbee2mqtt) root@raspberrypi2:/opt/zigbee2mqtt#

``

`(zigbee2mqtt) root@raspberrypi2:/opt/zigbee2mqtt# cat /root/.npm/_logs/2020-04-25T13_25_02_327Z-debug.log 0 info it worked if it ends with ok 1 verbose cli [ '/opt/zigbee2mqtt/bin/node', 1 verbose cli '/opt/zigbee2mqtt/bin/npm', 1 verbose cli 'start' ] 2 info using npm@6.4.1 3 info using node@v10.15.1 4 verbose run-script [ 'prestart', 'start', 'poststart' ] 5 info lifecycle zigbee2mqtt@1.12.2~prestart: zigbee2mqtt@1.12.2 6 info lifecycle zigbee2mqtt@1.12.2~start: zigbee2mqtt@1.12.2 7 verbose lifecycle zigbee2mqtt@1.12.2~start: unsafe-perm in lifecycle true 8 verbose lifecycle zigbee2mqtt@1.12.2~start: PATH: /opt/zigbee2mqtt/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/opt/zigbee2mqtt/node_modules/.bin:/opt/zigbee2mqtt/lib/node_modules/.bin:/opt/zigbee2mqtt/bin:/opt/zigbee2mqtt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 9 verbose lifecycle zigbee2mqtt@1.12.2~start: CWD: /opt/zigbee2mqtt 10 silly lifecycle zigbee2mqtt@1.12.2~start: Args: [ '-c', 'node index.js' ] 11 silly lifecycle zigbee2mqtt@1.12.2~start: Returned: code: 1 signal: null 12 info lifecycle zigbee2mqtt@1.12.2~start: Failed to exec start script 13 verbose stack Error: zigbee2mqtt@1.12.2 start:node index.js 13 verbose stack Exit status 1 13 verbose stack at EventEmitter.<anonymous> (/opt/zigbee2mqtt/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:301:16) 13 verbose stack at EventEmitter.emit (events.js:189:13) 13 verbose stack at ChildProcess.<anonymous> (/opt/zigbee2mqtt/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14) 13 verbose stack at ChildProcess.emit (events.js:189:13) 13 verbose stack at maybeClose (internal/child_process.js:970:16) 13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5) 14 verbose pkgid zigbee2mqtt@1.12.2 15 verbose cwd /opt/zigbee2mqtt 16 verbose Linux 4.19.97-v7+ 17 verbose argv "/opt/zigbee2mqtt/bin/node" "/opt/zigbee2mqtt/bin/npm" "start" 18 verbose node v10.15.1 19 verbose npm v6.4.1 20 error code ELIFECYCLE 21 error errno 1 22 error zigbee2mqtt@1.12.2 start:node index.js` 22 error Exit status 1 23 error Failed at the zigbee2mqtt@1.12.2 start script. 23 error This is probably not a problem with npm. There is likely additional logging output above. 24 verbose exit [ 1, true ] (zigbee2mqtt) root@raspberrypi2:/opt/zigbee2mqtt#

``

std10k commented 4 years ago

OK, writing all this gave me a few clues. After another day of messing around with this i found the problem, it seems to be EasyESP software on esp8266. So basically what happened, after i had some difficulties with ESPEasy initially i set up a monitor in PRTG to ping ESP and also do a TCP probe on its 'serial' TCP port, in my case 1775. It turns out that if any TCP connection is established to that port on ESP (e.g. telnet esp_ip 1775, which is probably what prtg does as well to check the port), after that connection is terminated or timed out, ESP also drops the connection from Z2M (presumably all other connections), causing it to consider the adapter failed and restart, which i think is generally not an unexpected behavior. So disabling that probe now fixed my issue. But apparently ANY connection from ANY other device to ESP8266 with cc2530 on the serial server port will basically kill zigbee2mqtt process because ESP will kill TCP connection to the Z2M. Do not run NMAP on your network if you have this kind of config :)

While it is not a z2m problem, i think it would be great if z2m had some kind of 'graceful TCP restart' capability to handle this sort of situation, if it makes any sense, so that instead of restarting the whole process it would only have to reestablish the TCP socket which would work much faster.

Koenkk commented 4 years ago

This indeed looks like a connection issue, but zigbee-herdsman could handle this a bit more friendly indeed.

rvt commented 4 years ago

as a note, espeasy on esp8266 for me is not very stable

1) I need to press the [submit] button on the ser2net device after a start just to enable serial communication 2) After a few days I got disconnected and by just pressing [submit] button on ser2net it started to work again.

soo, espeasy will be out as soon as I find a replacement. Is ser2net just a straight telnet communication eg 1:1 tcp <-> serial ??

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rvt commented 4 years ago

New version solves instability issues.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days