dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.9k stars 502 forks source link

deCONZ v2.05.39 abort #797

Closed ebaauw closed 5 years ago

ebaauw commented 6 years ago

This happened shortly after restarting deCONZ, before I could update the firmware to 0x26240500.

Sep 23 12:04:47 pi2 deCONZ[19868]: 12:04:47:137 APS-DATA.indication srcAddr: 0x00158d00007e2feb, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 255, rssi: -59
Sep 23 12:04:47 pi2 deCONZ[19868]: 12:04:47:137     asdu: 4700f7d7cd000*** Error in `/usr/bin/deCONZ': double free or corruption (!prev): 0x01a667e0 ***
Sep 23 12:04:47 pi2 systemd[1]: deconz-gui.service: Main process exited, code=killed, status=6/ABRT
Sep 23 12:04:50 pi2 systemd[1]: deconz-gui.service: Unit entered failed state.
Sep 23 12:04:50 pi2 systemd[1]: deconz-gui.service: Failed with result 'signal'.

With the recent Raspbian updates, coredumps seem to be disabled again...

manup commented 6 years ago

Hmm is this with the plugin from the package or self-compiled? I'm running the whole code with clang address sanitizer which normally should detect double free.

https://clang.llvm.org/docs/AddressSanitizer.html

ebaauw commented 6 years ago

With the distributed plugin.

By the way, compiler warnings are now limited to de_web_plugin.cpp and rest_userparameter.cpp - nice work!

manup commented 6 years ago

With the distributed plugin.

Hmm odd, maybe I should additionally try Valgrind to find the cause. Coredump would sure be also useful.

By the way, compiler warnings are now limited to de_web_plugin.cpp and rest_userparameter.cpp - nice work!

Yeah the new Qt Creator/clang/c++11 warnings work wonder, still a lot of orange warnings in the editor :)

jee67 commented 6 years ago

Another error here: pi@raspberrypi-conbee:~ $ sudo dpkg -i deconz-2.05.39-qt5.deb dpkg: unrecoverable fatal error, aborting: files list file for package 'zlib1g:armhf' is missing final newline

manup commented 6 years ago

What system is this, Raspbian? And which version?

jee67 commented 6 years ago

What system is this, Raspbian? And which version?

Yes, it’s Raspbian pi 3B pi@raspberrypi-conbee:~ $ uname -r 4.14.70-v7+ pi@raspberrypi-conbee:~ $

manup commented 6 years ago

Not sure what is causing the error, it should work on Raspbian Stretch and Jessie. zlib is nowhere referenced explicitly only a internal Qt dependency.

You may try:

ebaauw commented 6 years ago

I re-enabled dumping cores and created a Wiki page on how to do that.

I now get a number of core dumps on reboot:

Sep 23 15:40:49 pi2 systemd[1]: Started deCONZ: ZigBee gateway -- GUI/REST API.
Sep 23 15:40:52 pi2 deCONZ[683]: Invalid MIT-MAGIC-COOKIE-1 keyQXcbConnection: Could not connect to display :0
Sep 23 15:40:52 pi2 systemd[1]: deconz-gui.service: Main process exited, code=dumped, status=6/ABRT
Sep 23 15:40:52 pi2 systemd[1]: deconz-gui.service: Unit entered failed state.
Sep 23 15:40:52 pi2 systemd[1]: deconz-gui.service: Failed with result 'core-dump'.

Apparently the deconz-gui service starts too early, before the X11 windows system has been initialised. I haven't seen this before, maybe a change in a recent Raspbian version?

$ uname -a
Linux pi2 4.14.70-v7+ #1144 SMP Tue Sep 18 17:34:46 BST 2018 armv7l GNU/Linux
ebaauw commented 6 years ago

Hm, deconz-gui.service has a dependency on vncserver-x11-serviced, but the VNC server starts before X11 is active.

$ sudo systemctl status vncserver-x11-serviced
● vncserver-x11-serviced.service - VNC Server in Service Mode daemon
   Loaded: loaded (/usr/lib/systemd/system/vncserver-x11-serviced.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-09-23 15:40:48 CEST; 11min ago
 Main PID: 626 (vncserver-x11-s)
   CGroup: /system.slice/vncserver-x11-serviced.service
           ├─ 626 /usr/bin/vncserver-x11-serviced -fg
           ├─ 635 /usr/bin/vncserver-x11-core -service
           ├─ 696 /usr/bin/vncagent service 14
           ├─1063 /usr/bin/vncserverui service 16
           └─1093 /usr/bin/vncserverui -statusicon 7

Sep 23 15:40:48 pi2 systemd[1]: Started VNC Server in Service Mode daemon.
Sep 23 15:40:49 pi2 vncserver-x11[635]: ServerManager: Server started
Sep 23 15:40:49 pi2 vncserver-x11[635]: ConsoleDisplay: Cannot find a running X server on vt1
Sep 23 15:40:50 pi2 vncserver-x11[635]: ConsoleDisplay: Found running X server (pid=682, binary=/usr/lib/xorg/Xorg)
Sep 23 15:40:51 pi2 vncserver-x11[635]: Connections: connected: 192.168.76.41::64916
Sep 23 15:40:51 pi2 vncserver-x11[635]: Connections: authenticated: 192.168.76.41::64916, as (anonymous) (d permissions)
manup commented 6 years ago

So just waiting for both should fix the issue?

ebaauw commented 6 years ago

I think waiting for the X11 server should fix it, but I don't know which service that is (assuming it is a service).

EDIT That would be the lightdm service, which is already (also) listed in the service definition file under After=. I tried adding a Requires= entry with the same dependencies, to no avail.

After each reboot, the following warning re-appears, like deCONZ is messing with the service definition on system boot.

Warning: deconz-gui.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Not sure if this is related, though.

jee67 commented 6 years ago

Not sure what is causing the error, it should work on Raspbian Stretch and Jessie. zlib is nowhere referenced explicitly only a internal Qt dependency.

You may try:

  • apt update && apt upgrade
  • download the deb file again
  • retry install

I’m afraid this wasn’t such a good idea. The system got corrupted and I’m afraid it has to be reinstalled. I don’t know what went wrong but I cannot connect anymore through VNC or ssh. So take care if you install the new deb package. Luckily it is on my test system, so no disasters happened.

manup commented 6 years ago

The system got corrupted and I’m afraid it has to be reinstalled

That's not good, on which step did this happen? The deb package should not be able to cause this. Do you know which deCONZ version was installed before?

jee67 commented 6 years ago

The system got corrupted and I’m afraid it has to be reinstalled

That's not good, on which step did this happen? The deb package should not be able to cause this. Do you know which deCONZ version was installed before?

2.05.38 was installed After downloading and installing 2.0.39 I got the error described in my previous post. Then I did the apt update && apt upgrade and still received the error. I tried something like apt clean and the /dev/apt/archives/partial went broken. I did some tries again like apt upgrade, but somehow ended up with a broken system. I am not sure whether the 2.05.39 is the real cause or it’s something else or a combination of things.

jee67 commented 6 years ago

The system got corrupted and I’m afraid it has to be reinstalled

That's not good, on which step did this happen? The deb package should not be able to cause this. Do you know which deCONZ version was installed before?

2.05.38 was installed After downloading and installing 2.0.39 I got the error described in my previous post. Then I did the apt update && apt upgrade and still received the error. I tried something like apt clean and the /dev/apt/archives/partial went broken. I did some tries again like apt upgrade, but somehow ended up with a broken system. I am not sure whether the 2.05.39 is the real cause or it’s something else or a combination of things.

update: after a long time waiting the system came up again(big relief). Still haven't installed 2.05.39, next try somewhere in the coming week.

manup commented 6 years ago

After each reboot, the following warning re-appears, like deCONZ is messing with the service definition on system boot.

Warning: deconz-gui.service changed on disk. Run 'systemctl daemon-reload' to reload units. Not sure if this is related, though.

This is needed when the systemd service files are changed, since they you edited them systemd needs to reload (cache?) them.

ebaauw commented 6 years ago

My point is: I don't edit them. After upgrading deCONZ, I do sudo systemctl daemon-reload and the warning disappears. But after reboot, the warning is there again.

$ ssh pi@pi1
Linux pi1 4.14.70-v7+ #1144 SMP Tue Sep 18 17:34:46 BST 2018 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Sep 23 22:42:49 2018 from 192.168.76.41
$ sudo systemctl status deconz-gui
● deconz-gui.service - deCONZ: ZigBee gateway -- GUI/REST API
   Loaded: loaded (/lib/systemd/system/deconz-gui.service; enabled; vendor preset: enabl
  Drop-In: /etc/systemd/system/deconz-gui.service.d
           └─override.conf
   Active: active (running) since Sun 2018-09-23 22:40:10 CEST; 18h ago
 Main PID: 1421 (deCONZ)
   CGroup: /system.slice/deconz-gui.service
           ├─1421 /usr/bin/deCONZ --http-port=80 --dbg-info=2 --dbg-aps=2 --dbg-error=1
           ├─1473 dbus-launch --autolaunch 36338e1dca924dbd9a2413fc1346afd5 --binary-syn
           └─1474 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session

Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:019 >>>> Poll ZHAPresence sensor node Bedroom
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:019 Idle timer triggered
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:118 read attributes of 0x00158D0002834C6C clu
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:118 add task 270905 type 19 to 0x00158D000283
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:118 Poll APS request 110 to 0x00158D0002834C6
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:218 APS-DATA.request id: 110, addrmode: 0x03,
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:269 Poll APS confirm 110 status: 0x00
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:269 Erase task req-id: 110, type: 19 zcl seqn
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:269 APS-DATA.confirm id: 110, status: 0x00 SU
Sep 24 16:56:26 pi1 deCONZ[1421]: 16:56:26:269 APS-DATA.confirm request id: 110 -> erase
Warning: deconz-gui.service changed on disk. Run 'systemctl daemon-reload' to reload uni
$ sudo systemctl daemon-reload
$ sudo systemctl status deconz-gui
● deconz-gui.service - deCONZ: ZigBee gateway -- GUI/REST API
   Loaded: loaded (/lib/systemd/system/deconz-gui.service; enabled; vendor preset: enabl
  Drop-In: /etc/systemd/system/deconz-gui.service.d
           └─override.conf
   Active: active (running) since Sun 2018-09-23 22:40:10 CEST; 18h ago
 Main PID: 1421 (deCONZ)
   CGroup: /system.slice/deconz-gui.service
           ├─1421 /usr/bin/deCONZ --http-port=80 --dbg-info=2 --dbg-aps=2 --dbg-error=1
           ├─1473 dbus-launch --autolaunch 36338e1dca924dbd9a2413fc1346afd5 --binary-syn
           └─1474 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session

Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:40:194 Node 0x0013A20041831C4D is known by 5 nei
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:40:675 Node 0x0013A20041849DA6 is known by 5 nei
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:154 Node 0x00158D0002834C6C is known by 5 nei
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:395 APS-DATA.request id: 203, addrmode: 0x03,
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:441 APS-DATA.confirm id: 203, status: 0x00 SU
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:441 APS-DATA.confirm request id: 203 -> confi
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:521 APS-DATA.indication srcAddr: 0x0013a20041
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:521         asdu: 93000500031e0401ffff2e21001
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:521 APS-DATA.indication request id: 203 -> fi
Sep 24 16:56:41 pi1 deCONZ[1421]: 16:56:41:521 APS-DATA.request id: 203 erase from queue
$ sudo shutdown -r now
Connection to pi1 closed by remote host.
Connection to pi1 closed.
$
$ ssh pi@pi1
Linux pi1 4.14.70-v7+ #1144 SMP Tue Sep 18 17:34:46 BST 2018 armv7l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Sep 24 16:56:15 2018 from 192.168.76.41
$ sudo systemctl status deconz-gui
● deconz-gui.service - deCONZ: ZigBee gateway -- GUI/REST API
   Loaded: loaded (/lib/systemd/system/deconz-gui.service; enabled; vendor preset: enabl
  Drop-In: /etc/systemd/system/deconz-gui.service.d
           └─override.conf
   Active: active (running) since Mon 2018-09-24 16:58:46 CEST; 43s ago
 Main PID: 1366 (deCONZ)
   CGroup: /system.slice/deconz-gui.service
           ├─1366 /usr/bin/deCONZ --http-port=80 --dbg-info=2 --dbg-aps=2 --dbg-error=1
           ├─1434 dbus-launch --autolaunch 36338e1dca924dbd9a2413fc1346afd5 --binary-syn
           └─1435 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session

Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:338 APS-DATA.request id: 117, addrmode: 0x03,
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:386 APS-DATA.confirm id: 117, status: 0x00 SU
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:386 APS-DATA.confirm request id: 117 -> confi
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:437 APS-DATA.indication srcAddr: 0x00158d0002
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:437         asdu: 1a000500021e0401ffff2e21001
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:437 APS-DATA.indication request id: 117 -> fi
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:437 APS-DATA.request id: 117 erase from queue
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:577 Node 0x00158D0001E075D6 is known by 1 nei
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:818 APS-DATA.request id: 121, addrmode: 0x03,
Sep 24 16:59:29 pi1 deCONZ[1366]: 16:59:29:868 APS-DATA.confirm id: 121, status: 0x00 SU
Warning: deconz-gui.service changed on disk. Run 'systemctl daemon-reload' to reload uni
$ 
ebaauw commented 5 years ago

deCONZ also crashes with a SIGSEGV dump on shutting down the Raspberry (presumably because the Xorg server exits before decONZ).

Doing some more research, it would seem that the service needs to be started from the user context with the X session, instead of from the system context. See https://superuser.com/questions/759759/writing-a-service-that-depends-on-xorg. I tried that, but got an error trying to enable the service:

$ sudo systemctl --user enable deconz-gui
Failed to connect to bus: No such file or directory
stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ebaauw commented 5 years ago

Issue still present in v02.05.60.

manup commented 5 years ago

Hmm haven't had (or noticed) this issue yet, I'll put it on the todo list for testing. Normally I would expect systemd handling proper shutdown based on the defined dependencies.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Shurov commented 4 years ago

Edit: SOLVED

I have deCONZ 2.05.71 running in a docker container. And when I tried to add VNC parameters to the docker-compose, it ended up in the Invalid MIT-MAGIC-COOKIE-1 keyQXcbConnection: Could not connect to display :0 error. Is there any workaround to pause VNC start until X11?

Ok, from decontz_docker topic I found the solution to change the default port to the other one

oywino commented 4 years ago

.... and would you mind letting the rest of us know what "the other one" is ?

amritk commented 3 years ago

deconz is still editing the service files and changing the port