dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.9k stars 502 forks source link

Raspi totally blocked out, no more reaction on mouse and lights #1190

Closed Ronoon closed 5 years ago

Ronoon commented 5 years ago

This is the second time this week my Raspi is not responding anymore. Still running before with no problems, now it´s a problem that looks like stack overflow, since the CPU% is steadily increasing. Today my Raspi died again and no response to the was given as nomore lights were responding to the motion sensors. after restarting everything is working again but with a cpu% at 20 to 30% Also it is the original precompiled DeConz Version from Dresden Elektronic, no changes/playing around has been done

Running Raspberry PI 3B with only DeConz extension and nothing more Firmware is 0x262F0500 Commit 880f784e9c7ce4526a19315e6fb3ecb2ab3393ebF DeConz 2.05.57

Anybody else with these symptoms??

manup commented 5 years ago

Also it is the original precompiled DeConz Version from Dresden Elektronic, no changes/playing around has been done.

Do you mean the official sd-card image or deCONZ package?

Ronoon commented 5 years ago

http://www.dresden-elektronik.de/rpi/deconz-dev/

deconz-dev-2.05.57.deb 20-Jan-2019 20:19 21K  

got it from there an made a clean install after the browser issue with the web interface and the sliders

Ronoon commented 5 years ago

btw. have you seen the compiling error on a raspian that looks like this if you compile the package manually?

de_web_plugin.cpp: In member function ‘bool DeRestPluginPrivate::resetConfiguration(bool, bool)’: de_web_plugin.cpp:14792:32: note: parameter passing for argument of type ‘std::initializer_list’ will change in GCC 7.1 epData["inClusters"] = QVariantList({ "0x0019", "0x000a" }); ^~~~~~~~ de_web_plugin.cpp:14793:33: note: parameter passing for argument of type ‘std::initializer_list’ will change in GCC 7.1 epData["outClusters"] = QVariantList({ "0x0500" }); ^~~~~~ de_web_plugin.cpp:14802:33: note: parameter passing for argument of type ‘std::initializer_list’ will change in GCC 7.1 epData["outClusters"] = QVariantList({ "0x0021" }); ^~~~~~

manup commented 5 years ago

http://www.dresden-elektronik.de/rpi/deconz-dev/

That's the development package which only contains headers. Did you also update the deCONZ package?

https://www.dresden-elektronik.de/rpi/deconz/beta/deconz-2.05.57-qt5.deb

btw. have you seen the compiling error on a raspian that looks like this if you compile the package manually?

I don't think so, note deCONZ build server is running Raspbian Jessie.

Ronoon commented 5 years ago

yes installed that one

manup commented 5 years ago

Hmm strange, honestly I have no idea right now what happened.

In my test networks with up to 80 nodes, the cpu is between 5–10% on average.

Note if you have LQI value display enabled, disable it since eats quite some cpu.

Ronoon commented 5 years ago

nope no LQI enabled and right now with the browser running i´m at 34-37 % without it is more or less between 27 and 33% Mysterious

Ronoon commented 5 years ago

okay i found a taskmanager and there was a task called lxpanel running @28+% after killing the task it showed up again but with only 1 to 2% since then the cpu load is down to 12%

don´t know what it is but it help bringing the cpu down again.

could be a system task because every time i´ve done a reboot the load was up again well rebooting again and looking if it happens again

Ronoon commented 5 years ago

well i verified that and again lxpanel was using the cpu percentage by 28% and more once again after killing the task it got down to 0 to 2% max

manup commented 5 years ago

Hmm strange, looks like something different isn't working in the system. lxpanel is a normal part of Raspbian desktop.

djwlindenaar commented 5 years ago

In the cathegory of strange stuff happening with your raspi, I've had an interesting episode this weekend with strange things happening to the web interface and various other unclear symptoms. It turned out the sdcard died, which apparently means that it silently ignores any writes, but that's not very notable due to the OS caching most data in ram.

I finally figured it out when trying to do an fsck while the sdcard was in a Cardreader in my PC. One typical symptom is that you get an message in dmesg stating the journal is being recovered even upon a clean shutdown/reboot. Have a look..

Ronoon commented 5 years ago

Well maybe i have to setup the system complete new and have a look what is happening then. So far my load is down to the "normal" 7-10% therefor i assume that there is something in the system at boottime wrong or the system is messed up. I don´t belive it has to do with the DeConz software but you´ll never know for sure if your not setting up the system to brandnew state. I will keep you informed.

thecem commented 5 years ago

I have the same issue with a headless rasp pi wh 4.14.94+ #1193:

23160 pi 20 0 137220 39952 27120 R 81.6 9.0 286:26.68 deCONZ.

After restart of deconz is the CPU% is steadily increasing, normaly deconz have 10 % CPU.

manup commented 5 years ago

Which deCONZ version are you running?

thecem commented 5 years ago

DeConz 2.05.57

manup commented 5 years ago

And when you run top how much cpu % is used by deCONZ?

thecem commented 5 years ago

I stated it above:

23160 pi 20 0 137220 39952 27120 R "CPU 81.6 %" 9.0 286:26.68 deCONZ. This is increasing steadily from 10 to 80 percent in 8-9 hours. After 1h day i have a delay up to 6 seconds for a xiomi switch. After restart of deconz it works for a moment as expexted, and increases slowly the cpu load.

thecem commented 5 years ago

Error what I found after restart at 10:18:40 daemon.log:

Jan 29 10:18:48 hassbianZ deCONZ[15683]: PROTO: CRC error Jan 29 10:18:53 hassbianZ deCONZ[23160]: QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once. Jan 29 12:00:00 hassbianZ deCONZ-update2.sh[23101]: Error: database is locked

manup commented 5 years ago

PROTO CRC error happens when the serial communication is disturbed. If it happens repeatedly it indicates some other process may also try accessing the serial port.

Not sure about hassbian but on Raspbian the serial port needs to be configured so that the system doesn't write logs to it. This is described here: https://github.com/dresden-elektronik/deconz-rest-plugin#software-requirements

The database error can be ignored it's a noisy debug output.

thecem commented 5 years ago

hassbian is almost the same. I could swear I disabled after checking it wasn't, so i will try this.

manup commented 5 years ago

Cool, please keep us informed if it works better.

thecem commented 5 years ago

PROTO CRC error is after the configuration of the serial port not longer in log, before it only apears for a short sequenz in log when starting deconz.

The CPU load Issue isn´t disappeared, since yesterday the CPU load of deconz is in min 68%. All other loads are below 10%. Only deconz-wifi2.sh and deconz-update2.sh shows up every 2 sec with a load of 0.6 up to 1.2%.

I have around 70 devices connected.

Is ther a possiblity to log something to see what is the reason behind this issue? like: (--dbg-error=1) (--db-aps=1) (--dbg-http=1) (--dbg-info=2) (--dbg-zdp=1) (--dbg-zcl=1)

manup commented 5 years ago

Is ther a possiblity to log something to see what is the reason behind this issue? like: (--dbg-error=1) (--db-aps=1) (--dbg-http=1) (--dbg-info=2) (--dbg-zdp=1) (--dbg-zcl=1)

Sure, start deCONZ with these options

deCONZ -platform minimal --dbg-info=2 --dbg-aps=1 --dbg-error=1 --http-port=80

This will be quite noisy.

thecem commented 5 years ago

I have added it but no abnormal logs, no error.

thecem commented 5 years ago

I installed 2.05.58 same CPU load problem. So I installed 2.05.55 and now the CPU load is only around 18 % I searched the changelog and found:

Enable high speed polling of devices

https://github.com/dresden-elektronik/deconz-rest-plugin/commit/53c8e4cb34ef4a7deb3bb5b0d266c1325f10e033

I don't know, may there is a problem with the lower CPU power of a rasp zero wh. But i think with this there will be not enough time between the next poll, to calculate the information of the previous poll. So the cpu load of the deconz.service soar up after a reboot from normal to full load.

Is there a possiblity to test this, where can i change the files in 58 to the .55 onces?

manup commented 5 years ago

Ah ok it's a Raspberry Pi Zero? Haven't tested it with this one in a while. Looking forward to make some tests with the RPi zero and see where the issue happens.

Polling shouldn't be the problem but processing the incoming data non stop might take it's share (but shouldn't bring the network down).

We have one test network on a Pi zero with ~70 lights, so might be a good test candidate to check what's going on.

Is there a possiblity to test this, where can i change the files in 58 to the .55 onces?

Hardly without a lot of compiling and cross checking what changed.

thecem commented 5 years ago

thanks a lot, if you need help let me know...

thecem commented 5 years ago

This is the log after reboot with no special logging enabled:

Jan 31 20:33:26 hassbianZ deCONZ[6455]: libpng warning: iCCP: known incorrect sRGB profile Jan 31 20:33:34 hassbianZ deCONZ[6455]: This plugin does not support propagateSizeHints() Jan 31 20:33:34 hassbianZ deCONZ[6455]: This plugin does not support propagateSizeHints() Jan 31 20:33:37 hassbianZ deCONZ[6455]: QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once. Jan 31 20:33:43 hassbianZ deCONZ[6455]: This plugin does not support propagateSizeHints() Jan 31 20:33:58 hassbianZ deCONZ[6455]: QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.

thecem commented 5 years ago

with disabled deconz-update and -wifi it is still slowly (after 5h) increasing to full cpu load (.58). I downgraded to .55 and have after around 48h only between 6 and 12 % CPU load for deconz.service (pi zero wh with 4.14.94+ raspbian)

jameshawkes commented 5 years ago

I am also having this issue on 2.05.58 running in a docker container.

manup commented 5 years ago

I am also having this issue on 2.05.58 running in a docker container.

Which Raspberry Pi do you have?

jameshawkes commented 5 years ago

Raspberry Pi 3 Model B, with a ConBee stick.

Ronoon commented 5 years ago

@Manup: I have done an update on a different /original installation from the original software of my phoscon gateway from .38 to .54 and then the update to .58 an i got once again the lxpanel load to 25 - 30% (using this time a rasp 3b with the phoscon bridge and not the gateway on a different installation) After killing the task the load is down to 5-7% an the lxpanel task is running with 0%

Don´t know if anybody else has these problem but with every installation on different SD Cards i got the same problem and i think it is since the installation of .58 Do you use anything from the lxpanel task @ startup time? after killing it, it seems normal again because of the drop to 5-7% Since it is the same in different installments i think it maybe has to do with the .58 version Maybe i should send you the original SD card with the updatet .58 version from the Phoscon Gateway for inspection ?

manup commented 5 years ago

Do you use anything from the lxpanel task @ startup time?

deCONZ doesn't know nor use anything about lxpanel. If I recall correctly lxpanel it is the taskbar. There must be some internal processing which is repeated really fast and causes the cpu rise.

Maybe i should send you the original SD card with the updatet .58 version from the Phoscon Gateway for inspection ?

A recently created backup file should be enough, if you send it to mpi@dresden-elektronik.de I can have a look and perhaps recreate parts of the setup in my test configuration.

Also if you can provide a log file with the following output, it might help to see what keeps deCONZ so busy:

deCONZ -platform minimal --dbg-info=2 --dbg-aps=1 --dbg-error=1 --http-port=80
manup commented 5 years ago

I can somewhat confirm the issue, my test instance of hass.io with deCONZ 2.05.58 currently burns cpu cycles at 100%.

image

Note this is a very small setup with just one, Ikea remote control. I have an idea what might happen here. Digging deeper ...

manup commented 5 years ago

By the way, can you please provide some details about your devices? Lights, sensors, etc.

jameshawkes commented 5 years ago

For me, its 2 IKEA 5-button remote controls and 4 IKEA motion sensors. No lights connected.

manup commented 5 years ago

I guess the poll manager timer runs non stop like a infinite loop since it has nothing to do :) Will add some small delay, this should fix the issue.

thecem commented 5 years ago

My setup: Tradfri Remote, Tradfri dimmer and 27 xiaomi aqara sensors. No Lights.

manup commented 5 years ago

2.05.59 is available now and should run smooth again

https://github.com/dresden-elektronik/deconz-rest-plugin/releases/tag/V2_05_59

thecem commented 5 years ago

After 24h I confirm the cpu load is back low like in .55.

Thanks a lot!

Ronoon commented 5 years ago

@manup i´ve setup a new Raspi with the latest deconz version and it is running @4-7% as usual So to this problem with the lxpanel now again down to 0% it looks like the configuration of the raspi setup had gone ballistic even with the rebooting etc. in my opinion this issue could be closed and thx again for your help

manup commented 5 years ago

Cool thanks, hope the cpu load stays calm on the new installation.