Odianosen25 / Monitor-App

Appdaemon App for Andrew's Monitor Presence Detection System
AppDaemon starvation #45

Closed bemilicus closed 4 years ago

bemilicus commented 4 years ago

Hello, I'm currently testing the app.

I simply added the app to AppDaemon with a very basic configuration (2 monitor nodes). After a while (roughly 1 day), the app seems to crash and cause AD starvation. Below are a portion of the logs and my config. This is the only AppDaemon app running on my system at the moment.

Thank you

appdaemon_logs.txt config.txt

Odianosen25 commented 4 years ago

Hello @bemilicus,

Hmm, that’s an interesting error message; I have never come across it before. This doesn’t seem like the whole log. It would be nice if I could get the error logs, so I can determine where it all began. Can I have them, please?

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, the logs I've attached seem to cycle again and again, hence I cut them at a certain point ;-). The app seems to hang or crash every day with no apparent pattern. In case it helps, two more details: 1) every day, I put my phone in airplane mode before going to bed; 2) during these tests, I shut down one of the two monitor nodes (the night one, so not the main node).

I've also found further logs (error logs), which I attach here: presence_app_errors.txt

Hope this helps.

Bye!

Odianosen25 commented 4 years ago

Hello @bemilicus,

Just pushed some updates to dev; please try them and let me know if that fixes your issue.

Thanks

bemilicus commented 4 years ago

Thank you, I'll keep you updated in a few days!

Regards

bemilicus commented 4 years ago

Hello Odianosen25, after a few days of testing, the app seems to work fine now, so your fix solved the issue (at least I think so, from what I see in the logs ;-)

Thank you very much!

Regards

Odianosen25 commented 4 years ago

Oh, that’s great to hear @bemilicus, and thanks for reporting back. Will close this now and reopen if need be.

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, unfortunately it seems I spoke too soon... ;-( Just the day after I wrote, the app went into starvation again. I attach the logs; let me know if you need further info. I'm on the latest hassos rpi image (32-bit) with the latest hassio and the latest AD.

logs.txt

Thank you

Odianosen25 commented 4 years ago

Hmmm @bemilicus, there must be some bad-ass issue for it to be killing the thread. Can I have the error logs, please?

Thanks

bemilicus commented 4 years ago

Yes, sure! I've just created the file below... ;-) I truncated it at some point, since there were errors going back to 8th April...

err_logs.txt

Regards

Odianosen25 commented 4 years ago

@bemilicus, please try the latest update.

From some of your error logs, it seems the app didn't start properly in the first place. So when you update, please keep a close eye on any errors that come up at startup.

Regards

bemilicus commented 4 years ago

Thank you @Odianosen25, I'll keep you informed should any error come up.

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, I've updated the plugin and restarted hassio. Attached are the logs after starting the app. Let's see how it goes over the next few days. If you spot something strange in the logs, let me know what I can do on my side to further debug the issue ;-)

Thanks

Regards

err_logs.txt

bemilicus commented 4 years ago

Hello @Odianosen25, after a couple of days the app stopped working, and the errors (at least to me) seem to be almost the same. Anyway, I've attached the last 2 days of logs. Hope to hear back from you ;-)

Thank you

err_logs#3.txt

Odianosen25 commented 4 years ago

@bemilicus,

Are you using the latest dev? The errors seem to be the result of the app trying to restart one of your nodes, which holds up the system.

I did put in a fix for this and thought it should have been resolved. Kindly confirm that you definitely have the latest fix, and do let me know.

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, hoping all is well.

Over the past days, I turned off the remote node reboot option in your plugin and everything worked well. I need further testing with the remote reboot option enabled to confirm whether the issue is fixed. In the meantime, with the reboot option turned off, I caught these further errors in the logs, which I attach here in case they help.

Thanks

error_logs_4.txt

Odianosen25 commented 4 years ago

Hello @bemilicus ,

Yes, we are fine here, and I hope you are too? I've been waiting on you to confirm these errors are fixed before the next release. I have looked at the errors, and they are not caused by the app; there seems to be some connection issue between where AD is installed and your MQTT broker.

These errors show a timeout issue: sometimes when the app tries to send a message, it takes more than 10 seconds, and AD times out as a result. Not something I can fix, I am afraid. I think your AD and MQTT broker are on different systems; if that is the case, do look into the network connection between them.

That could also explain why the reboot option was affecting you: you might have a network issue, and it messed up other aspects of the app.
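One rough way to check the link between the AD host and the broker is to time a plain TCP connect to the broker's port. This is only a sketch under assumptions: the function name `broker_connect_time` is made up for illustration, port 1883 is the usual unencrypted MQTT port and may not match your setup, and a fast TCP connect does not prove that MQTT publishes complete in time.

```python
import socket
import time


def broker_connect_time(host, port=1883, timeout=10.0):
    """Return the seconds needed to open a TCP connection, or None on failure.

    host/port are assumptions for illustration; use your broker's address.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, DNS failure or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Calling `broker_connect_time("your-broker-host")` a few times over a day, and logging the result, would show whether connects are ever slow or failing around the time the errors appear.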

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, after some days of the app running quietly, yesterday I got another (I think) similar error. AD went unstable and the hassio system it runs on became completely unresponsive.

I checked the MQTT and AD connections (both are on the same rpi3 system), and all seems fine, since I was getting regular updates on the presence of the persons within 1-2 seconds when everything is OK and the plugin is running.

Regarding the SSH connection errors you can see in the logs, these are expected since I normally turn one of the rpi0 (nodes) off during the night. I don't know if that could be the root cause of the traces though... ;-)

Please tell me if there is anything further you need me to test on my side.

Logs attached.

Thks

errors_5.txt

Odianosen25 commented 4 years ago

@bemilicus, this issue still boils down to the fact that the connection to the broker gets lost sometimes. When it does, it breaks AD's internals.

Regards

bemilicus commented 4 years ago

Hello @Odianosen25, hope all is well... After some weeks of quiet running, I'm back with some further error messages from the plugin. After the error, the rpi3 it is running on goes to high load and stays there indefinitely. The error is attached.

Let me know if there is something wrong on my side again. Thanks!

errors_6.txt

Odianosen25 commented 4 years ago

@bemilicus, yeah, I am good, and you?

When it gets into high load, does stopping the app by disabling it help?

Regards

bemilicus commented 4 years ago

@Odianosen25 - thks all ok here too...

Well, when I noticed the problem, I just tried stopping the AD add-on (and hence the plugin). The load dropped from 3-4 to about 1.5. By load I mean the 1 min, 5 min and 15 min averages from the top command on my rpi. I also noticed that hassio in general, starting from 0.110, had a huge increase in load on my rpi compared with earlier versions. Anyway, that's probably a different kind of issue.
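For reference, the same three figures can be read from Python and put in relation to the number of CPU cores, which makes values like 3-4 easier to judge. This is only a sketch: the function name `load_per_core` is made up for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core():
    """Return the 1, 5 and 15 minute load averages divided by the core count."""
    one, five, fifteen = os.getloadavg()  # same numbers that top and uptime show
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return {
        "1min": one / cores,
        "5min": five / cores,
        "15min": fifteen / cores,
    }
```

A per-core value that stays near or above 1.0 in the 15 minute column suggests the machine is saturated rather than briefly busy.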

Let me know

TIA

Odianosen25 commented 4 years ago

Will be closing this issue, since I guess your issue is fixed now. Re-open if any extra help is required.