ZoneMinder / zmeventnotification

Machine Learning powered Secure Websocket & MQTT based ZoneMinder event notification server
416 stars 129 forks source link

stop detecting objects and receiving notification - cause multiple dead process opened #437

Open apsilva opened 1 day ago

apsilva commented 1 day ago

Recently i stop receiving events from zm and it was because the server reach maximum memory and linux start to kill process.. mysql was one of them...

The first time when i connect to the server i saw many zmeventnotification.pl process open:

 ps auxf | grep zmeventnotification.pl | wc -l
143

all of them blocked at zm_detect.py:

www-data 2489602  0.0  0.0   2576   880 ?        S    05:10   0:00  \_ sh -c '/var/lib/zmeventnotification/bin/zm_event_start.sh' 366678 7 "AnexoPiscina" "Continuous" "/home/zoneminder/events/7/2024-11-18/366678"
www-data 2489603  0.0  0.0   6932  3260 ?        S    05:10   0:00      \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 366678 7 AnexoPiscina Continuous /home/zoneminder/events/7/2024-11-18/366678
www-data 2489604  0.0  0.0   6932  1692 ?        S    05:10   0:00          \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 366678 7 AnexoPiscina Continuous /home/zoneminder/events/7/2024-11-18/366678
www-data 2489605  0.0  0.6 298016 112440 ?       S    05:10   0:01              \_ /usr/bin/python3 /var/lib/zmeventnotification/bin/zm_detect.py --monitorid 7 --eventid 366678 --config /etc/zm/objectconfig.ini --eventpath /home/zoneminde
www-data 2489606  0.0  0.0   6332  2140 ?        S    05:10   0:00              \_ grep detected:

I found the option to limit the maximum number of parallel_hooks and set it to 4 max_parallel_hooks=4

Now it gets block at 4 process.

www-data   87716  0.0  0.2  56288 42680 ?        S    01:31   0:00 /usr/bin/perl -T /usr/bin/zmeventnotification.pl
www-data   87717  0.0  0.0   2576   948 ?        S    01:31   0:00  \_ sh -c '/var/lib/zmeventnotification/bin/zm_event_start.sh' 367172 1 "entrada" "Continuous" "/home/zoneminder/events/1/2024-11-19/367172"
www-data   87718  0.0  0.0   6932  3328 ?        S    01:31   0:00      \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367172 1 entrada Continuous /home/zoneminder/events/1/2024-11-19/367172
www-data   87719  0.0  0.0   6932  1548 ?        S    01:31   0:00          \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367172 1 entrada Continuous /home/zoneminder/events/1/2024-11-19/367172
www-data   87720  0.0  0.6 298012 112748 ?       S    01:31   0:01              \_ /usr/bin/python3 /var/lib/zmeventnotification/bin/zm_detect.py --monitorid 1 --eventid 367172 --config /etc/zm/objectconfig.ini --eventpath /home/zoneminde
www-data   87721  0.0  0.0   6332  2096 ?        S    01:31   0:00              \_ grep detected:
www-data   87988  0.0  0.2  56288 42680 ?        S    01:34   0:00 /usr/bin/perl -T /usr/bin/zmeventnotification.pl
www-data   87989  0.0  0.0   2576   956 ?        S    01:34   0:00  \_ sh -c '/var/lib/zmeventnotification/bin/zm_event_start.sh' 367173 6 "quartos" "Continuous" "/home/zoneminder/events/6/2024-11-19/367173"
www-data   87990  0.0  0.0   6932  3336 ?        S    01:34   0:00      \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367173 6 quartos Continuous /home/zoneminder/events/6/2024-11-19/367173
www-data   87991  0.0  0.0   6932  1576 ?        S    01:34   0:00          \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367173 6 quartos Continuous /home/zoneminder/events/6/2024-11-19/367173
www-data   87992  0.0  0.6 298012 112636 ?       S    01:34   0:01              \_ /usr/bin/python3 /var/lib/zmeventnotification/bin/zm_detect.py --monitorid 6 --eventid 367173 --config /etc/zm/objectconfig.ini --eventpath /home/zoneminde
www-data   87993  0.0  0.0   6332  2196 ?        S    01:34   0:00              \_ grep detected:
www-data   88469  0.0  0.2  56288 42680 ?        S    01:40   0:00 /usr/bin/perl -T /usr/bin/zmeventnotification.pl
www-data   88470  0.0  0.0   2576   916 ?        S    01:40   0:00  \_ sh -c '/var/lib/zmeventnotification/bin/zm_event_start.sh' 367175 3 "garagem" "Continuous" "/home/zoneminder/events/3/2024-11-19/367175"
www-data   88471  0.0  0.0   6932  3236 ?        S    01:40   0:00      \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367175 3 garagem Continuous /home/zoneminder/events/3/2024-11-19/367175
www-data   88472  0.0  0.0   6932  1584 ?        S    01:40   0:00          \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367175 3 garagem Continuous /home/zoneminder/events/3/2024-11-19/367175
www-data   88473  0.0  0.6 298024 112492 ?       S    01:40   0:01              \_ /usr/bin/python3 /var/lib/zmeventnotification/bin/zm_detect.py --monitorid 3 --eventid 367175 --config /etc/zm/objectconfig.ini --eventpath /home/zoneminde
www-data   88474  0.0  0.0   6332  2148 ?        S    01:40   0:00              \_ grep detected:
www-data   88476  0.0  0.2  56288 42680 ?        S    01:40   0:00 /usr/bin/perl -T /usr/bin/zmeventnotification.pl
www-data   88477  0.0  0.0   2576   912 ?        S    01:40   0:00  \_ sh -c '/var/lib/zmeventnotification/bin/zm_event_start.sh' 367174 7 "AnexoPiscina" "Continuous" "/home/zoneminder/events/7/2024-11-19/367174"
www-data   88478  0.0  0.0   6932  3304 ?        S    01:40   0:00      \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367174 7 AnexoPiscina Continuous /home/zoneminder/events/7/2024-11-19/367174
www-data   88479  0.0  0.0   6932  1484 ?        S    01:40   0:00          \_ /bin/bash /var/lib/zmeventnotification/bin/zm_event_start.sh 367174 7 AnexoPiscina Continuous /home/zoneminder/events/7/2024-11-19/367174
www-data   88480  0.0  0.6 298024 113248 ?       S    01:40   0:01              \_ /usr/bin/python3 /var/lib/zmeventnotification/bin/zm_detect.py --monitorid 7 --eventid 367174 --config /etc/zm/objectconfig.ini --eventpath /home/zoneminde
www-data   88481  0.0  0.0   6332  2188 ?        S    01:40   0:00              \_ grep detected:

that is correct and in zmeventnotification log we can see the cause and the parameter is preventing the system to fork more process:

11/19/2024 01:40:43.750227 zmeventnotification[31559].INF [main:1406] [New event 367176 reported for Monitor:1 (Name:entrada) Continuous[last processed eid:367172]]
11/19/2024 01:40:43.761592 zmeventnotification[31559].ERR [main:3800] [There are 4 hooks running as of now. This exceeds your set limit of max_parallel_hooks=4. Ignoring this event. Either increase your max_parallel_hooks value, or, adjust your ZM motion sensitivity ]

That solve the issue about reaching the maximum of memory.. but i still got the initial problem, no objects detected...

Not sure if is the correct solution, but I made this change to set a maximum timeout to detect the object, using the command timeout to 30s for zm_detect.py to return a result:

--- hook/zm_event_start.sh  2024-11-19 09:01:45.434287634 +0000
+++ /var/lib/zmeventnotification/bin/zm_event_start.sh  2024-11-19 09:01:57.770297905 +0000
@@ -32,9 +32,9 @@
 # use arrays instead of strings to avoid quote hell
 if [[ ! -z "${2}" ]]
 then 
-   DETECTION_SCRIPT=(/var/lib/zmeventnotification/bin/zm_detect.py --monitorid $2 --eventid $1 --config "${CONFIG_FILE}" --eventpath "${EVENT_PATH}" --reason "${REASON}"  )
+   DETECTION_SCRIPT=(timeout 30s /var/lib/zmeventnotification/bin/zm_detect.py --monitorid $2 --eventid $1 --config "${CONFIG_FILE}" --eventpath "${EVENT_PATH}" --reason "${REASON}"  )
 else
-   DETECTION_SCRIPT=(/var/lib/zmeventnotification/bin/zm_detect.py  --eventid $1 --config "${CONFIG_FILE}" --eventpath "${EVENT_PATH}" --reason "${REASON}"  )
+   DETECTION_SCRIPT=(timeout 30s /var/lib/zmeventnotification/bin/zm_detect.py  --eventid $1 --config "${CONFIG_FILE}" --eventpath "${EVENT_PATH}" --reason "${REASON}"  )

 fi
 RESULTS=$("${DETECTION_SCRIPT[@]}" | grep "detected:") 

I'm using MLAPI on a virtual env python3.9.18 , but the issue doesn't look related to the mlapi.

INFO about my setup:

root@sala:~# sudo -u www-data /usr/bin/zmeventnotification.pl --version
6.1.29

root@sala:~# sudo -u www-data /var/lib/zmeventnotification/bin/zm_detect.py --version
app:6.1.29, pyzm:0.3.62

root@sala:~# /home/zoneminder/virtualenv/mlapi/mlapi.py --version
2.2.24

How can i get more verbose to solve the root cause of the problem?

Thanks.

connortechnology commented 21 hours ago

Sounds like you just don't have enough cpu or gpu to do the ML. You may have to consider getting a gpu or tpu. Turn on debugging in zm and there should be lots of logs in /var/log/zm/zmesdetect*.log.

apsilva commented 20 hours ago

The cpu is around 25% usage, I've installed a coral usb TPU working in the mlapi. The object detection works fine after restarting everything, i notice that it stop working the next day, i don't see the notifications and what i could see so far is the stuck tasks.

I got more log search for each event in the logs:

/var/log/zm/web_php.log:Command was: /usr/bin/ffmpeg -ss 52.98 -i /home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4 -frames:v 1 /home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg 2>&1
/var/log/zm/web_php.log:Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4':
/var/log/zm/web_php.log:Output #0, image2, to '/home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg':
/var/log/zm/web_php.log:Command was: /usr/bin/ffmpeg -ss 52.98 -i /home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4 -frames:v 1 /home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg 2>&1
/var/log/zm/web_php.log:Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4':
/var/log/zm/web_php.log:Output #0, image2, to '/home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg':
/var/log/zm/web_php.log:Command was: /usr/bin/ffmpeg -ss 52.98 -i /home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4 -frames:v 1 /home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg 2>&1
/var/log/zm/web_php.log:Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/home/zoneminder/events/1/2024-11-19/367172/incomplete.mp4':
/var/log/zm/web_php.log:Output #0, image2, to '/home/zoneminder/events/1/2024-11-19/367172/00213-capture.jpg':
/var/log/zm/zmc_m1.log:11/19/24 01:30:31.812812 zmc_m1[80410].INF-zm_monitor.cpp/2369 [Opened new event 367172 Continuous]
/var/log/zm/zmc_m1.log:11/19/24 01:40:31.460827 zmc_m1[80410].INF-zm_monitor.cpp/2321 [entrada: 24001 - Closing event 367172, section end forced 1731980431 - 1731979831 = 600 >= 600]
/var/log/zm/zmeventnotification.log:11/19/2024 01:31:23.745851 zmeventnotification[31559].INF [main:1406] [New event 367172 reported for Monitor:1 (Name:entrada) Continuous[last processed eid:367168]]
/var/log/zm/zmeventnotification.log:11/19/2024 01:40:43.750227 zmeventnotification[31559].INF [main:1406] [New event 367176 reported for Monitor:1 (Name:entrada) Continuous[last processed eid:367172]]
/var/log/zm/zmeventnotification.log:11/19/2024 08:47:30.850736 zmeventnotification[87716].INF [main:3693] [Event 367172 for Monitor 1 has finished]

/var/log/zm/zmc_m6.log:11/19/24 01:34:24.434337 zmc_m6[80751].INF-zm_monitor.cpp/2369 [Opened new event 367173 Continuous]
/var/log/zm/zmc_m6.log:11/19/24 01:44:24.203036 zmc_m6[80751].INF-zm_monitor.cpp/2321 [quartos: -01 - Closing event 367173, section end forced 1731980664 - 1731980064 = 600 >= 600]
/var/log/zm/zmeventnotification.log:11/19/2024 01:34:53.746111 zmeventnotification[31559].INF [main:1406] [New event 367173 reported for Monitor:6 (Name:quartos) Continuous[last processed eid:367169]]
/var/log/zm/zmeventnotification.log:11/19/2024 01:44:33.747037 zmeventnotification[31559].INF [main:1406] [New event 367177 reported for Monitor:6 (Name:quartos) Continuous[last processed eid:367173]]
/var/log/zm/zmeventnotification.log:11/19/2024 08:47:30.854259 zmeventnotification[87988].INF [main:3693] [Event 367173 for Monitor 6 has finished]

/var/log/zm/zmc_m3.log:11/19/24 01:40:30.080311 zmc_m3[80369].INF-zm_monitor.cpp/2369 [Opened new event 367175 Continuous]
/var/log/zm/zmc_m3.log:11/19/24 01:50:30.063981 zmc_m3[80369].INF-zm_monitor.cpp/2321 [garagem: 79352 - Closing event 367175, section end forced 1731981030 - 1731980430 = 600 >= 600]
/var/log/zm/zmeventnotification.log:11/19/2024 01:40:31.745927 zmeventnotification[31559].INF [main:1406] [New event 367175 reported for Monitor:3 (Name:garagem) Continuous[last processed eid:367171]]
/var/log/zm/zmeventnotification.log:11/19/2024 01:50:33.746573 zmeventnotification[31559].INF [main:1406] [New event 367179 reported for Monitor:3 (Name:garagem) Continuous[last processed eid:367175]]
/var/log/zm/zmeventnotification.log:11/19/2024 08:47:30.848438 zmeventnotification[88469].INF [main:3693] [Event 367175 for Monitor 3 has finished]

/var/log/zm/zmc_m7.log:11/19/24 01:40:14.148824 zmc_m7[80342].INF-zm_monitor.cpp/2369 [Opened new event 367174 Continuous]
/var/log/zm/zmc_m7.log:11/19/24 01:50:14.361763 zmc_m7[80342].INF-zm_monitor.cpp/2321 [AnexoPiscina: 132013 - Closing event 367174, section end forced 1731981014 - 1731980414 = 600 >= 600]
/var/log/zm/zmeventnotification.log:11/19/2024 01:40:41.745792 zmeventnotification[31559].INF [main:1406] [New event 367174 reported for Monitor:7 (Name:AnexoPiscina) Continuous[last processed eid:367170]]
/var/log/zm/zmeventnotification.log:11/19/2024 01:50:39.746055 zmeventnotification[31559].INF [main:1406] [New event 367178 reported for Monitor:7 (Name:AnexoPiscina) Continuous[last processed eid:367174]]
/var/log/zm/zmeventnotification.log:11/19/2024 08:47:30.854244 zmeventnotification[88476].INF [main:3693] [Event 367174 for Monitor 7 has finished]

I'm gonna to see how it behaves with the timeout change.

apsilva commented 2 hours ago

didn't solve the issue... my system stopped detecting objects but no locked process.

Before restarting all the services i manually try to run the detection with debug on a event and i had to kill the process, it was stuck on:

11/20/24 08:42:30 zmesdetect[239435] DBG1 zm_detect.py:71 [Invoking remote API login]

and in mlapi logs... nothing..

full log:

root@sala:~# /var/lib/zmeventnotification/bin/zm_detect.py --config /etc/zm/objectconfig.ini  --eventid 367904 --debug
11/20/24 08:42:29 zmesdetect[239435] INF ZMLog.py:292 [Setting up signal handler for logs]

11/20/24 08:42:29 zmesdetect[239435] INF ZMLog.py:301 [Switching global logger to ZMLog]

11/20/24 08:42:29 zmesdetect[239435] DBG1 zm_detect.py:269 [---------| app:6.1.29, pyzm:0.3.62, ES:(?) , OpenCV:4.8.0|------------]

11/20/24 08:42:29 zmesdetect[239435] INF utils.py:404 [Reading config from: /etc/zm/objectconfig.ini]

11/20/24 08:42:29 zmesdetect[239435] INF utils.py:409 [Reading secrets from: /etc/zm/secrets.ini]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ZM_PORTAL]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ZM_USER]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ZM_PASSWORD]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ZM_API_PORTAL]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ML_USER]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !ML_PASSWORD]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:381 [Secret token found in config: !PLATEREC_ALPR_KEY]

11/20/24 08:42:29 zmesdetect[239435] DBG1 utils.py:444 [allowing self-signed certs to work...]

11/20/24 08:42:29 zmesdetect[239435] DBG2 utils.py:454 [Now checking for monitor overrides]

11/20/24 08:42:29 zmesdetect[239435] INF utils.py:509 [Ignoring monitor specific settings, as you did not provide a monitor id]

11/20/24 08:42:29 zmesdetect[239435] DBG3 utils.py:521 [Finally, doing parameter substitution]

11/20/24 08:42:29 zmesdetect[239435] DBG1 zm_detect.py:306 [Importing remote shim classes for Object/Face]

11/20/24 08:42:29 zmesdetect[239435] DBG2 api.py:69 [API SSL certificate check has been disbled]

11/20/24 08:42:29 zmesdetect[239435] DBG1 api.py:173 [using username/password for login]

11/20/24 08:42:30 zmesdetect[239435] DBG2 api.py:202 [Using new token API]

11/20/24 08:42:30 zmesdetect[239435] DBG1 api.py:209 [Access token expires on:2024-11-20 10:42:30.169840 [7200s]]

11/20/24 08:42:30 zmesdetect[239435] DBG1 api.py:213 [Refresh token expires on:2024-11-21 08:42:30.172668 [86400s]]

11/20/24 08:42:30 zmesdetect[239435] DBG2 zm_detect.py:333 [using ml_sequence]

11/20/24 08:42:30 zmesdetect[239435] DBG2 zm_detect.py:345 [using stream_sequence]

11/20/24 08:42:30 zmesdetect[239435] DBG1 zm_detect.py:40 [Detecting using remote API Gateway http://192.168.1.2:5000/api/v1]

11/20/24 08:42:30 zmesdetect[239435] DBG2 zm_detect.py:49 [Found token file, checking if token has not expired]

11/20/24 08:42:30 zmesdetect[239435] DBG1 zm_detect.py:65 [Found access token, but it has expired (or is about to expire)]

11/20/24 08:42:30 zmesdetect[239435] DBG1 zm_detect.py:71 [Invoking remote API login]

^C11/20/24 08:43:45 zmesdetect[239435] INF ZMLog.py:318 [Got interrupt, exiting]

After restarting the mlapi service everything is working ok. The mlapi.log was empty before the restart.. i suspect that is the log rotate that is causing the issue, what you think?