ZoneMinder / zmeventnotification

Machine Learning powered Secure Websocket & MQTT based ZoneMinder event notification server

Many zm_detect.py processes crash the computer #384

Closed aaronsta1 closed 3 years ago

aaronsta1 commented 3 years ago

Event Server version

You can get the version by doing:

/usr/bin/zmeventnotification.pl --version

this returns an error.

Hooks version (if you are using Object Detection)

You can get the version by doing:

/var/lib/zmeventnotification/bin/zm_detect.py --version

app:6.1.18, pyzm:0.3.41

Are you using MLAPI? (Y/N)

NO

If you are using MLAPI, please mention the version. You can get the version by doing: (change path to wherever you install mlapi)

/var/lib/zmeventnotification/mlapi/mlapi.py --version

The version of ZoneMinder you are using:

1.34.23

What is the nature of your issue

It started raining, and when it rains the camera detects movement. Because of the way ZM detects motion, you can't really do anything about it.

This causes ZME to run zm_detect.py hundreds of times, running the PC out of memory until it crashes. I have it set in the ini to use only 1 CPU and 1 GPU, but it still runs multiple instances.

I'm not sure how to fix this.

pliablepixels commented 3 years ago

/usr/bin/zmeventnotification.pl --version this returns an error.

Fixed the template. You need to prefix the command with sudo -u www-data, i.e. sudo -u www-data /usr/bin/zmeventnotification.pl --version. Never mind that for this issue.

Question: Your title says "zm_detect.py needs a lock so it only runs once then waits for it to finish"

And in the issue report you say

this causes ZME to run zm_detect.py 100s of times.. running the PC out of memory until it crashes.

So what is happening, as you see in the logs?
a) Every rain triggers an event that triggers zm_detect?
b) The first zm_detect runs (grabs locks) while all the other zm_detects are waiting for locks?
c) Your computer crashes due to the number of zm_detect processes running?

I'm trying to figure out the core issue:
a) Does your computer crash because there are 100s of zm_detect processes running (waiting to grab locks)? (can happen)
b) Or does it crash because several of them are able to grab locks and are trying to load ML libraries? (should not happen)
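To make the two cases above concrete, here is a minimal Python sketch of an exclusive file lock, assuming a Unix-like system. It is not the hooks' actual locking code, which this thread does not show; the lock path and the try_lock helper are hypothetical. It only illustrates that one holder gets the lock while later callers are refused (or, with a blocking call, would sit waiting and still occupy memory).

```python
import fcntl
import os
import tempfile

# Hypothetical lock file path, used only for this illustration.
lock_path = os.path.join(tempfile.gettempdir(), "example_ml.lock")


def try_lock(path):
    """Try to take an exclusive lock without blocking.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file already holds the lock.
    """
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None


first = try_lock(lock_path)    # takes the lock
second = try_lock(lock_path)   # refused while `first` holds it
print("first got lock:", first is not None)
print("second got lock:", second is not None)

# Releasing the lock lets the next caller in.
fcntl.flock(first, fcntl.LOCK_UN)
first.close()
third = try_lock(lock_path)
print("third got lock:", third is not None)
third.close()
```

Dropping LOCK_NB would make the second caller block until the lock is released, which corresponds to case a): many processes alive at once, each waiting its turn.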

aaronsta1 commented 3 years ago

I'm not 100% sure; I never had this problem before. I will try to recreate it. My ZM computer was running really slowly and ZM was saying "too many connections" when I tried to access the web interface. The DB was maxed out at 151 users out of 151.

I SSH'd in and ran glances, and there were hundreds of zm_detect.py and zmeventnotification.pl processes running. My RAM was used up to 100% (the PC has 24 GB of RAM and usually sits at around 30% used) and the CPU was at 100%. It's a 12-core Xeon. I wish I had taken a screenshot.

I force-rebooted it. All seems well now.

I did update to the newest version a few days ago. It hasn't rained in a while, so this hasn't come up; it was fine before, but that was an older version. I think it's supposed to rain all weekend, so I'll see if it happens again and find the logs.

pliablepixels commented 3 years ago

I've added a control measure in zmeventnotification.ini

# When a hook is invoked, the ES forks a child. If you are in a situation
# where your motion sensitivity in ZM is not set properly, you may land up
# triggering hundreds of child processes of zm_detect that may potentially
# crash your system. Note that there are global locks around the ML code which
# are controlled by xxx_max_processes in the objectconfig/mlapiconfig files
# which will avoid parallel running of models. But this is if you are facing issues
# by the simple fact that too many zm_detect processes are forked (which will apply
# whether you use mlapi or not). While I do feel the core issue is that you need 
# to fix your ZM sensitivity, this parameter helps control.

# NOTE: When you put in value for this, any hooks that attempt to kick off 
# beyond this limit will simply be ignored. There is no queueing.

# A value of 0 (default) means there are no limits
max_parallel_hooks=0
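
As a rough illustration of the "ignored, no queueing" behaviour described in the comments above, here is a minimal Python sketch. The ES itself is a Perl script that forks child processes, so this is not its implementation; HookLimiter, try_start and finish are hypothetical names, and a simple counter stands in for tracking forked children.

```python
import threading


class HookLimiter:
    """Hypothetical helper: caps concurrent hooks, dropping (not queueing) extras."""

    def __init__(self, max_parallel_hooks=0):
        # 0 means "no limit", mirroring the default described above.
        self.max_parallel_hooks = max_parallel_hooks
        self._active = 0
        self._lock = threading.Lock()

    def try_start(self):
        """Return True if a hook may start now, False if it should be ignored."""
        with self._lock:
            if self.max_parallel_hooks and self._active >= self.max_parallel_hooks:
                return False  # over the limit: the hook is simply skipped
            self._active += 1
            return True

    def finish(self):
        """Call when a hook ends so a later hook can take its slot."""
        with self._lock:
            self._active = max(0, self._active - 1)


limiter = HookLimiter(max_parallel_hooks=2)
print([limiter.try_start() for _ in range(4)])  # [True, True, False, False]
limiter.finish()
print(limiter.try_start())  # True
```

The trade-off is the one the NOTE spells out: events that arrive while the limit is reached get no detection at all, rather than a delayed one.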

aaronsta1 commented 3 years ago

This is what I was looking for. I updated to the new master and set max_parallel_hooks to 10. I'll see if I have any more issues. Thanks :)

aaronsta1 commented 3 years ago

I'm not sure if it's this change or something you did in the new master, but I lost the ability to set a minimum interval for zmNinja push messages. I have the events for each monitor set to 300 seconds, but I'm getting back-to-back push messages. The logs used to say it cannot send FCM because x is before 300 seconds; they don't say this anymore.
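
For readers unfamiliar with the minimum-interval behaviour described above, here is a minimal Python sketch of a per-monitor throttle. It is an illustration only, not the ES code; PushThrottle and should_send are hypothetical names, and the 300-second value is the one mentioned in this comment.

```python
import time


class PushThrottle:
    """Hypothetical per-monitor throttle: at most one push per min_interval_s seconds."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last_sent = {}  # monitor id -> timestamp of the last push sent

    def should_send(self, monitor_id, now=None):
        """Return True and record the time if enough time has passed, else False."""
        now = time.time() if now is None else now
        last = self._last_sent.get(monitor_id)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon after the previous push for this monitor
        self._last_sent[monitor_id] = now
        return True


throttle = PushThrottle(300)
print(throttle.should_send(1, now=0))    # True  (first push)
print(throttle.should_send(1, now=120))  # False (only 120 s elapsed)
print(throttle.should_send(1, now=300))  # True  (300 s elapsed)
```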

aaronsta1 commented 3 years ago

OK, after some tests: it's not updating the token, so this must be something else. This is what my tokens.txt says: {"tokens":{}}

Even though zmNinja is working? I'll look into it more.

Ugh, I have to make a new post, but I don't know if I should do it here or on the zmNinja GitHub. It's broken; something changed in the new master. The tokens.txt file is updated every time I restart zmNinja (I can tell by the timestamp), but it's blank. The reason it was still working is that the old key was cached. I cleared it and now I'm not getting push messages at all.

The logs say the old key has 86400 seconds left. Is there a way to make zmNinja force a new key? I clear the API cache and restart, but it's still finding a cached key.

Apr 15, 2021 03:07:15:863 PM DEBUG push: setting up onMessageReceived...
Apr 15, 2021 03:07:15:869 PM DEBUG push: Channel created: zmninja
Apr 15, 2021 03:07:16:172 PM DEBUG CACHE: found for key: cached_timezone with expiry of:86400s

aaronsta1 commented 3 years ago

I followed your FAQ and it's something with the WSS service. WS works; WSS says "exception to websocket connection closed". I see in the FAQ that I can update the websocket, so I'll try that.

It's working now, but I'm still getting this error and I'm not sure what it means:

Apr 15, 2021 03:49:29:591 PM DEBUG EventSever: Failed to connect to WebSocket: code: 1006, reason: undefined, exception: Connection reset

I guess I'll make a post about it in the zmNinja GitHub.
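
For reference, the code in that log line is one of the WebSocket close codes defined in RFC 6455. The small Python lookup below is a sketch for reading such logs; the describe_close_code helper is hypothetical and the table lists only a few of the standard codes. It does not diagnose why the connection was reset here.

```python
# A few close codes from RFC 6455, section 7.4.1.
WS_CLOSE_CODES = {
    1000: "Normal Closure",
    1001: "Going Away",
    1002: "Protocol Error",
    1003: "Unsupported Data",
    1005: "No Status Received (reserved, never sent on the wire)",
    1006: "Abnormal Closure (reserved: connection dropped without a Close frame)",
    1015: "TLS Handshake Failure (reserved, never sent on the wire)",
}


def describe_close_code(code):
    """Return a short description for a WebSocket close code."""
    return WS_CLOSE_CODES.get(code, "Unknown or application-specific code")


print(describe_close_code(1006))
```

Code 1006 is reported locally by the client when the connection ends without a proper closing handshake, so it says that the link dropped but not why.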