dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Crashes #7871

Closed arjannv closed 1 month ago

arjannv commented 1 month ago

Describe the bug

deCONZ crashes shortly after start (within 30 seconds to 1 minute). This happens with and without the GUI, and also when I disable Domoticz and Homebridge. I've also tried removing the last added device, but that did not change the behavior.

Steps to reproduce the behavior

Start deCONZ with or without the GUI, and it crashes.

Expected behavior

Have it not crash

deCONZ Logs

deconz.log

Mimiix commented 1 month ago

Asked the devs to check in. When did this start happening?

arjannv commented 1 month ago

I noticed that it was not running last night, but only found out this morning that it continuously crashes and restarts. In the meantime I've tested a database backup from last weekend, but that did not change anything. If I need to provide any additional logs or try something else, please let me know.

manup commented 1 month ago

We're looking into a crash in one of our setups today, might be related.

SwoopX commented 1 month ago

@arjannv thanks for reporting this issue. While this is being checked in a different setup, could you please gather some information from your system?

Just follow what's described here: https://github.com/dresden-elektronik/deconz-rest-plugin/issues/5302#issuecomment-920376056

arjannv commented 1 month ago

Hi @SwoopX, thanks for your reply. Please find the results of the short run below:


arjan@vosity-home:~$ gdb --args /usr/bin/deCONZ -platform minimal
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/deCONZ...
(No debugging symbols found in /usr/bin/deCONZ)
(gdb) r
Starting program: /usr/bin/deCONZ -platform minimal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ff9bbf7a640 (LWP 154953)]
[Detaching after vfork from child process 154954]
libpng warning: iCCP: known incorrect sRGB profile
[Detaching after vfork from child process 154956]
[New Thread 0x7ff9b8708640 (LWP 154958)]
[New Thread 0x7ff9b7bfd640 (LWP 154959)]
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
[Detaching after vfork from child process 154961]
[New Thread 0x7ff9b6b9f640 (LWP 154963)]
[Detaching after fork from child process 154964]
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
[Detaching after vfork from child process 154980]
[New Thread 0x7ff9b5e8a640 (LWP 155006)]
[New Thread 0x7ff9b5689640 (LWP 155007)]
[Thread 0x7ff9b5689640 (LWP 155007) exited]

Thread 1 "deCONZ" received signal SIGSEGV, Segmentation fault.
0x00007ff9bf514ea1 in operator==(QString const&, QString const&) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
(gdb)

SwoopX commented 1 month ago

@arjannv thanks. I'm afraid we're missing the backtrace bt command here. Would be great if you could run it once more 👍

arjannv commented 1 month ago

Oops sorry, reading is hard..


Starting program: /usr/bin/deCONZ -platform minimal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f98ee0ca640 (LWP 199437)]
[Detaching after vfork from child process 199438]
libpng warning: iCCP: known incorrect sRGB profile
[Detaching after vfork from child process 199440]
[New Thread 0x7f98ea8d2640 (LWP 199446)]
[New Thread 0x7f98e9d40640 (LWP 199447)]
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
[Detaching after vfork from child process 199448]
[New Thread 0x7f98e8d9f640 (LWP 199450)]
[Detaching after fork from child process 199451]
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
This plugin does not support propagateSizeHints()
[Detaching after vfork from child process 199467]
[New Thread 0x7f98dbfff640 (LWP 199498)]
[New Thread 0x7f98db7fe640 (LWP 199499)]
[Thread 0x7f98db7fe640 (LWP 199499) exited]
bt

Thread 1 "deCONZ" received signal SIGSEGV, Segmentation fault.
0x00007f98f1714ea1 in operator==(QString const&, QString const&) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
(gdb) bt
#0  0x00007f98f1714ea1 in operator==(QString const&, QString const&) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#1  0x00007f98eb43245b in Resource::setValue(char const*, QString const&, bool) () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#2  0x00007f98eb390704 in DeRestPluginPrivate::updateLightNode(deCONZ::NodeEvent const&) () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#3  0x00007f98eb3be5bb in DeRestPluginPrivate::nodeEvent(deCONZ::NodeEvent const&) () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#4  0x00007f98eb2974c2 in DeRestPluginPrivate::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) () from /usr/share/deCONZ/plugins/libde_rest_plugin.so
#5  0x00007f98f18b47c8 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#6  0x00007f98f2220345 in deCONZ::ApsController::nodeEvent(deCONZ::NodeEvent const&) () from /usr/bin/../lib/libdeCONZ.so.1
#7  0x000056322669fdee in zmController::onApsdeDataIndication(deCONZ::ApsDataIndication const&) ()
#8  0x00005632266c4173 in zmMaster::processPacked(zm_command const*) ()
#9  0x00005632266c5730 in SER_Packet(unsigned char*, unsigned short) ()
#10 0x0000563226645b65 in protocol_receive ()
#11 0x00005632266c6083 in SerialComPrivate::rx() ()
#12 0x00005632266c6a54 in SerialCom::processTh0Events() ()
#13 0x00007f98f18aa41e in QObject::event(QEvent*) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007f98f2d52713 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5
#15 0x00007f98f187ce3a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007f98f187ff27 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007f98f18d6a67 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007f98f0937d3b in g_main_context_dispatch () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x00007f98f098d2b8 in ?? () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x00007f98f09353e3 in g_main_context_iteration () from /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007f98f18d60b8 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#22 0x00007f98f187b75b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#23 0x00007f98f1883cf4 in QCoreApplication::exec() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#24 0x00005632266392ab in main ()

manup commented 1 month ago

There aren't that many calls to Resource::setValue(char const*, QString const&, bool) in updateLightNode(). Just before the crash, the deconz.log has a hint about where it may happen:

08:38:01:518 read attributes of 0xA4C1386A0100524E cluster: 0x0300: [
08:38:01:518 0x0001
08:38:01:518 0x0003
08:38:01:518 0x0004
08:38:01:518 0x0007
08:38:01:518 0x0008
08:38:01:518 0x4000
08:38:01:518 0x4001
08:38:01:518 0x4002
08:38:01:518 ]

Note that the 0x4002 attribute is read, and the crash may happen when processing the response:

else if (ia->id() == 0x4002) // color loop active
{
    if (RStateEffectValuesMueller.indexOf(lightNode->toString(RStateEffect), 0) <= 1)
    {
        lightNode->setValue(RStateEffect, RStateEffectValues[ia->numericValue().u8]);
    }
}

Here RStateEffectValues[ia->numericValue().u8] is a bit optimistic: it expects the index to be less than 2 (the array only has two strings), so any larger value reported by the device indexes out of bounds.
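The unchecked index described above can be guarded with a bounds check before the lookup. The following is only a minimal sketch in plain C++ without Qt, not the actual fix from the PR: `kEffectValues` is a hypothetical stand-in for `RStateEffectValues` (its two strings are assumed here), and `effectForColorLoopActive()` is an illustrative helper name that does not exist in the plugin.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the plugin's RStateEffectValues table.
// The thread only says it holds two strings; these values are assumed.
constexpr std::array<std::string_view, 2> kEffectValues{"none", "colorloop"};

// Maps the raw value of attribute 0x4002 (color loop active) to an effect
// name. Returns std::nullopt when the device reports a value that has no
// entry in the table, instead of indexing past the end of the array.
constexpr std::optional<std::string_view> effectForColorLoopActive(std::uint8_t raw)
{
    if (static_cast<std::size_t>(raw) < kEffectValues.size())
    {
        return kEffectValues[raw];
    }
    return std::nullopt; // unknown value: the caller should skip the update
}
```

With such a helper, the caller would only invoke `setValue()` when a value is returned, so an unexpected report from the device is ignored rather than dereferenced.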

Mimiix commented 1 month ago

So this is a DDF?

Nvm, just noticed the PR. Nice catch.

manup commented 1 month ago

It's part of the legacy code. A device driven by DDF never hits this code. So another todo is to make a DDF for the device which got hit here.

arjannv commented 1 month ago

Oh I think I know where it happened. I set a Gledopto mini LED strip dimmer to a color loop, and probably that is when it started to crash. I’ll try to adapt a DDF and see if it fixes the crash. Thank you all for finding this so quickly!

arjannv commented 1 month ago

So I tried to adapt a DDF between crashes, but I couldn't make it work. However, deleting the device 0xA4C1386A0100524E (Gledopto GL-C-008P) solved the issue temporarily. I will try to re-add the device and see if I can make it work in the next few days. Btw, while looking at the cluster info of the device in the GUI, it always crashed when I tried to read the attributes of cluster 0x0300.

manup commented 1 month ago

Makes sense, the attribute belongs to the color cluster. If a DDF is active that matches the device by modelid and manufacturer name, the faulty code in updateLightNode() shouldn't be executed.
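The routing described above (a device matched by a DDF bypasses the legacy path) can be pictured roughly as follows. This is a simplified sketch in plain C++, not the plugin's real data structures: `DeviceIdentity`, `DdfRegistry`, `Handler`, and `selectHandler()` are hypothetical names introduced purely for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical identity pair used to match a device against a DDF.
struct DeviceIdentity
{
    std::string manufacturerName;
    std::string modelId;
};

// Hypothetical registry of the identities covered by an active DDF.
class DdfRegistry
{
public:
    void add(DeviceIdentity id) { m_entries.push_back(std::move(id)); }

    // A device matches only if both manufacturer name and model id agree.
    bool matches(const DeviceIdentity &dev) const
    {
        for (const DeviceIdentity &e : m_entries)
        {
            if (e.manufacturerName == dev.manufacturerName && e.modelId == dev.modelId)
            {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<DeviceIdentity> m_entries;
};

enum class Handler { Ddf, Legacy };

// Devices with a matching DDF are handled by the DDF path; everything else
// falls through to the legacy code (where the faulty lookup lived).
Handler selectHandler(const DdfRegistry &ddfs, const DeviceIdentity &dev)
{
    return ddfs.matches(dev) ? Handler::Ddf : Handler::Legacy;
}
```

Under this picture, adding a DDF for the GL-C-008P would move it off the legacy path entirely, which is why a DDF is suggested in addition to the crash fix.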

MaStahl84 commented 1 month ago

@arjannv I have exactly the same problem with the same Gledopto (GL-C-008P). When I start a color loop on that device, deCONZ crashes. After restarting Ubuntu (a virtual machine) with deCONZ, it runs for about 30 seconds and then crashes again. The problem can be solved by restarting the Gledopto controller (cold restart). The next time I start a color loop on that controller, deCONZ crashes again until I restart the Gledopto controller again.

If any specific logs are needed, let me know.

arjannv commented 1 month ago

Thanks! Good to know that it can be temporarily solved by restarting the dimmer. That will help when trying to create a DDF :) Alternatively, we can also wait for the updated version of deCONZ.

manup commented 1 month ago

The fix for the crash is now available in v2.28.0-beta.

If you can confirm it works, I'd suggest closing this issue and opening a separate device request issue to create a DDF for the Gledopto GL-C-008P (see wiki/Request-Device-Support). With that we can move away from the legacy code the device is currently using.

arjannv commented 1 month ago

Great, thanks! I've just installed the update and checked if I can use the colorloop effect and that worked!