eschava / psmqtt

Utility reporting system health and status via MQTT
MIT License

Gradually increasing CPU Load #10

Closed: BrianP6 closed this issue 5 years ago.

BrianP6 commented 5 years ago

Hi, I'm really pleased with psmqtt; it's really nicely done. However, I have a small issue.

I'm running psmqtt on three Raspberry Pis (Original B, Pi2 and Pi3), all communicating with a Mosquitto broker on the Pi3. After roughly 24-36 hours, the CPU load on the Original Pi B has gradually climbed from ~1-5% to 100%. Over the same period, the Pi2 and Pi3 have climbed from 1-2% to 20%.

The Original B isn't running anything else; it's a test box for trialling stuff and is currently idle (or should be!). The Pi2 is running PiVPN, and the Pi3 is running openHAB and Mosquitto.

The only change to these systems has been the introduction of psmqtt. There is nothing relevant in the psmqtt.log file. My psmqtt.conf file is attached (psmqtt.conf.txt).

Hope you can help; let me know if you need anything. Many thanks, Brian.

eschava commented 5 years ago

Hello

You could try adding a couple more lines to your psmqtt config to find the name and CPU usage of the process that is driving up the CPU load (I hope it isn't psmqtt itself :) ):

"every 1 minute": { "processes/top_cpu/name", "processes/top_cpu/cpu_percent" }

BrianP6 commented 5 years ago

I've done that, but it fails when the timer fires for the processes task. I ran it manually and here's the output.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "psmqtt.py", line 203, in run
    self.s.run()
  File "/usr/lib/python2.7/sched.py", line 117, in run
    action(*argument)
  File "psmqtt.py", line 170, in on_timer
    run_task(tasks, tasks)
  File "psmqtt.py", line 57, in run_task
    if task.startswith(topic_prefix):
AttributeError: 'set' object has no attribute 'startswith'

Cheers, Brian.

eschava commented 5 years ago

Ah, sorry. You should use [] brackets instead of {} in my sample, so the topics form a list rather than a set.
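The traceback hints at why the brackets matter: psmqtt.conf appears to be read as Python, so `{"a", "b"}` is a `set` literal, and a set of topic names has no `startswith`. The sketch below is a hypothetical imitation of that dispatch (this `run_task`, `try_schedule` and the `"processes/"` prefix are illustrative assumptions, not psmqtt's actual code). It shows a list of topic strings being handled while a set falls through to the single-topic branch and fails the same way.

```python
# Hypothetical imitation of psmqtt-style task dispatch; not the project's real code.
schedule_wrong = {"every 1 minute": {"processes/top_cpu/name", "processes/top_cpu/cpu_percent"}}
schedule_right = {"every 1 minute": ["processes/top_cpu/name", "processes/top_cpu/cpu_percent"]}


def run_task(task, published):
    """Handle one topic string, or recurse into a list of topics."""
    if isinstance(task, list):
        for t in task:
            run_task(t, published)
        return
    # Anything that is not a list is assumed to be a single topic string,
    # so a set reaches this line and raises AttributeError.
    if task.startswith("processes/"):
        published.append(task)


def try_schedule(schedule):
    """Run every scheduled entry once and report what was 'published'."""
    published = []
    for tasks in schedule.values():
        try:
            run_task(tasks, published)
        except AttributeError as err:
            return "error: " + str(err)
    return published


print(try_schedule(schedule_right))  # ['processes/top_cpu/name', 'processes/top_cpu/cpu_percent']
print(try_schedule(schedule_wrong))  # error: 'set' object has no attribute 'startswith'
```

Because a set is also unordered, switching to a list keeps the topics in the order they are written.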

BrianP6 commented 5 years ago

Ha! I looked at that for ages and didn't spot the { vs [ difference. Easily done.

I'll report back when it recurs. Brian.

BrianP6 commented 5 years ago

Hi, so it's been running for almost 7 hours now on the original Raspberry Pi B. The stats you asked for show 'python' as the top CPU process, consuming 0.0% CPU; I'm a bit dubious about that. The global /cpu_percent stat is reporting about 39% CPU usage. I've run htop to get a proper picture of what is happening and recorded a few seconds of phone video, which is attached. Hope it helps.

As a reference point, after killing the process and starting it again, the /cpu_percent stat shows less than 1%, with the occasional spike to 7%. htop shows psmqtt taking 2 or 3% every 5 seconds when it collects the cpu_percent stat, and an occasional spike to 25% each minute, probably when it's fetching the process stats you requested.

Brian.

(Attachment: VID_20181126_235647.compressed.zip)

eschava commented 5 years ago

It really looks confusing. Could I ask you to test a minor change? Please replace line 498 of handlers.py:

{"get_value": lambda self, total: psutil.cpu_percent(percpu=not total)})('cpu_percent'),

with

{"get_value": lambda self, total: psutil.cpu_percent(percpu=not total, interval=1)})('cpu_percent'),
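psutil documents `cpu_percent(interval=None)` as comparing CPU times against the previous call, so the very first call has no baseline and returns a meaningless 0.0. With `interval=1` it blocks for one second and measures that window instead, which is what the change above relies on. A per-process object that is created fresh for each reading would be one possible explanation for the 0.0 seen earlier. The class below is a hypothetical, stdlib-only imitation of that sampling logic for the current process only (`CpuSampler` and its use of `time.process_time` are assumptions, not psutil's implementation).

```python
import time


class CpuSampler:
    """Hypothetical imitation of psutil-style CPU percent sampling.

    Measures only the current process via time.process_time(); psutil
    itself reads OS counters and can report system-wide figures.
    """

    def __init__(self):
        # (cpu_seconds, wall_seconds) captured by the previous non-blocking call
        self._last = None

    @staticmethod
    def _now():
        return time.process_time(), time.monotonic()

    def cpu_percent(self, interval=None):
        if interval is not None and interval > 0:
            # Blocking mode: measure a fresh window of `interval` seconds.
            start = self._now()
            time.sleep(interval)
            end = self._now()
        else:
            # Non-blocking mode: compare against the previous call.
            start, end = self._last, self._now()
            self._last = end
            if start is None:
                return 0.0  # first call: no baseline yet
        cpu = end[0] - start[0]
        wall = end[1] - start[1]
        return 100.0 * cpu / wall if wall > 0 else 0.0


sampler = CpuSampler()
print(sampler.cpu_percent())  # 0.0: no previous sample yet
deadline = time.monotonic() + 0.2
while time.monotonic() < deadline:  # keep the CPU busy for a moment
    pass
print(sampler.cpu_percent())  # high, since the loop kept the CPU busy
print(sampler.cpu_percent(interval=0.2))  # low, since the process mostly slept
```

The trade-off is that the blocking form makes every reading cost `interval` seconds of wall time, while the non-blocking form is only meaningful from the second call on.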

Thanks for the video! Also, why do you have two instances of psmqtt.py running?

BrianP6 commented 5 years ago

Thanks, I've changed that line. It doesn't seem to make any difference to the MQTT output, at least not yet, as I've only just restarted it (processes/cpu_percent still shows zero). I'm not running two instances: I've attached an image of htop in tree mode, and I believe the green lines are threads, so perhaps psmqtt creates a thread. Cheers, Brian.
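The traceback earlier in the thread shows psmqtt running a `sched` scheduler from the `run()` method of a thread (`Thread-1`, `self.s.run()`), which fits htop's tree view listing an extra thread of one process rather than a second instance. A minimal stdlib sketch of that pattern follows; `SchedulerThread`, `on_tick` and `max_runs` are hypothetical names, and the run limit exists only so the example terminates.

```python
import sched
import threading
import time


class SchedulerThread(threading.Thread):
    """Hypothetical sketch: a sched.scheduler driven from a background thread."""

    def __init__(self, period, max_runs, on_tick):
        super().__init__(daemon=True)
        self.s = sched.scheduler(time.monotonic, time.sleep)
        self.period = period
        self.remaining = max_runs
        self.on_tick = on_tick

    def on_timer(self):
        self.on_tick()
        self.remaining -= 1
        if self.remaining > 0:
            # Re-arm the timer; a long-running agent would never stop.
            self.s.enter(self.period, 1, self.on_timer)

    def run(self):
        self.s.enter(self.period, 1, self.on_timer)
        self.s.run()  # blocks this worker thread, not the main thread


ticks = []
worker = SchedulerThread(
    period=0.05,
    max_runs=3,
    on_tick=lambda: ticks.append(threading.current_thread().name),
)
worker.start()
worker.join()
print(len(ticks), ticks)  # every tick ran on the worker thread
```

Tools like htop can show such a worker as a separate entry under the same process, which is why it can look like a second copy of the script.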

(Attachment: htop screenshot 2018-11-27_0942)

eschava commented 5 years ago

I see. I found that my own psmqtt instance has a similar issue, but I use a longer interval (1 minute), so it isn't as noticeable. Now trying to create a test app to nail it down.

eschava commented 5 years ago

Found the reason. It's related to the recurrent Python library that psmqtt uses. Thinking about how it could be fixed.

eschava commented 5 years ago

Please check now.

Note: the method dateutil.rrule.rrulebase#after iterates over all possible timestamps since the rule object was created, so each lookup gets slower the longer psmqtt runs. Fix: re-parse the rule every time a timer event fires.
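To make the cost concrete, the sketch below imitates only the linear walk described in the note: a rule that, like an uncached rrule lookup, steps forward from its start time until it passes "now". It is a hypothetical stdlib-only model (`next_after` is an illustrative name, and the 5-second schedule is an assumption taken from the htop observation above), not dateutil's implementation.

```python
from datetime import datetime, timedelta


def next_after(dtstart, step, now):
    """Hypothetical model of a rule walking occurrences forward from dtstart.

    Returns (first occurrence strictly after now, number of steps walked).
    """
    occurrence, steps = dtstart, 0
    while occurrence <= now:
        occurrence += step
        steps += 1
    return occurrence, steps


step = timedelta(seconds=5)  # assumed 5-second schedule
start = datetime(2018, 11, 26, 0, 0, 0)
now = start + timedelta(hours=24)

# Rule parsed once at start-up: every lookup replays a whole day of occurrences.
_, stale_steps = next_after(start, step, now)
# Rule re-parsed at each timer event: its start is the current time.
_, fresh_steps = next_after(now, step, now)
print(stale_steps, fresh_steps)  # 17281 1
```

With the rule parsed once, the work per timer event grows linearly with uptime (86,400 s / 5 s = 17,280 occurrences after a day), and the event fires every 5 seconds, which matches the slow climb in CPU load and why the slower Original Pi B saturated first. Re-parsing keeps each lookup to a constant handful of steps.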

BrianP6 commented 5 years ago

Many thanks. I'll let you know how it goes; it's up and running now.

BrianP6 commented 5 years ago

Looking very good. I've been running for 5 hours and still at sub 1% :-) Thank you very much for all your help and support.