ZoneMinder / zmeventnotification

Machine Learning powered Secure Websocket & MQTT based ZoneMinder event notification server

Error in zm_detect.py with assertion lock on es 6.0+? #344

Closed Ranger21 closed 3 years ago

Ranger21 commented 3 years ago

After upgrading to dlandon's new docker image with the new zmeventnotification server, I took the new objectconfig.ini.default and moved all my old settings into it (renamed to objectconfig.ini, of course), but I still get these errors. What can they mean? Detection seems to work fine, but the log is filled with these fatal errors.

I have 15 monitors, all configured to use the zmeventnotification server.

Event Server version: 6.0.5
Hooks version (if you are using Object Detection): 0.2.1
ZoneMinder version: 1.34.22

```
Unrecoverable error: Already locked
Traceback (most recent call last):
  File "/var/lib/zmeventnotification/bin/zm_detect.py", line 852, in <module>
    main_handler()
  File "/var/lib/zmeventnotification/bin/zm_detect.py", line 424, in main_handler
    b, l, c = m.detect(original_image)
  File "/usr/local/lib/python3.6/dist-packages/pyzm/ml/object.py", line 54, in detect
    b,l,c = self.model.detect(image)
  File "/usr/local/lib/python3.6/dist-packages/pyzm/ml/yolo.py", line 111, in detect
    self.acquire_lock()
  File "/usr/local/lib/python3.6/dist-packages/pyzm/ml/yolo.py", line 41, in acquire_lock
    self.lock.acquire()
  File "/usr/local/lib/python3.6/dist-packages/portalocker/utils.py", line 318, in acquire
    assert not self.lock, 'Already locked'
AssertionError: Already locked
```
pliablepixels commented 3 years ago

It means someone else is holding onto the lock. I can't figure things out from a single line.

Please post a full debug log (not INF) for a detection process
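For context, here is a minimal sketch of what the assertion means, assuming the portalocker BoundedSemaphore that the traceback points at (this is an illustration, not pyzm's actual code; the name and timeout are made up for the demo). In the portalocker version shown in the traceback, calling acquire() on a lock object that is already held trips `assert not self.lock, 'Already locked'`:

```python
import portalocker

# Single-slot named semaphore; 'demo_ml_lock' and timeout=5 are demo values only.
sem = portalocker.BoundedSemaphore(1, name='demo_ml_lock', timeout=5)

sem.acquire()           # first acquire succeeds and holds the slot
try:
    sem.acquire()       # second acquire on the SAME object is rejected:
                        # AssertionError: Already locked
except AssertionError as err:
    print('Reproduced:', err)
finally:
    sem.release()       # always release so the lock file is freed
```

In other words, this particular assertion fires when a lock object is acquired a second time without an intervening release().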

rabsym commented 3 years ago

I had a similar issue. It looks like it comes from using one of the latest pyzm releases (0.2.1) from master, which is a work in progress. I solved it by downgrading pyzm to version 0.1.30 using pip, and everything is working now.

Hope it helps.

pliablepixels commented 3 years ago

Should not happen - I am using master. Can you please post full logs?

Ranger21 commented 3 years ago

Reverting pyzm back to 0.1.30 really helps, but do I lose some functionality? I'll try to reproduce the issue with the new pyzm and get a full debug log.

That's weird, I can't reproduce the issue on pyzm 0.2.2 (it was updated 6 hours ago). Is there any point in testing 0.2.1?

Third edit: BTW, I've tried adjusting the cpu lock timeout and process count settings with the new ES 6.0 (of course I tested with the default values first), and it doesn't seem to affect this behavior at all.

What's the recommended value for the cpu_max_processes setting? I'm now using 1 with a timeout of 1200. I have 8 cores / 16 threads dedicated to the ZoneMinder VM (using Proxmox), and 15 monitors all using machine learning to send messages to Telegram on object detection and to delete events without detected objects.

pliablepixels commented 3 years ago

If you cannot replicate the issue with 0.2.2 you are good. It is possible I had not checked some code into master earlier.

What you set for xxx_max_processes depends on the available memory. What it basically means is how many instances of a model run will be allowed at one time on the same processor. The core reason for this is that loading a model into memory is expensive. As an example, on my Coral TPU I realized that trying to load more than 2 (and sometimes even 1) parallel models results in a crash. On my GPU (4GB) it is 1. On my CPU, with 32GB RAM, I have no problem loading 3 together. So you will have to experiment. For example, Yolov4 takes around 2GB(+) to load.
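Roughly, here is what that setting translates to (a sketch with hypothetical names and values, not the actual pyzm code): each detection process builds a bounded semaphore with the same name and a maximum equal to your max_processes value, so only that many model runs can hold a slot at once, and the rest wait up to the configured lock timeout:

```python
import time
import portalocker

MAX_PROCESSES = 3    # e.g. a cpu_max_processes of 3: parallel model runs allowed
LOCK_WAIT = 1200     # seconds to wait for a free slot (your timeout of 1200)

# Every detection process creates the same named semaphore, so separate
# processes coordinate through shared lock files. The name is a demo value.
sem = portalocker.BoundedSemaphore(MAX_PROCESSES,
                                   name='zm_cpu_demo_lock',
                                   timeout=LOCK_WAIT)

sem.acquire()        # waits until one of the 3 slots is free
try:
    time.sleep(2)    # stand-in for loading the model and running detection
finally:
    sem.release()    # give the slot back for the next event
```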

Ranger21 commented 3 years ago

Yeah, I thought RAM would be the limiting factor for loading these models! Thank you. I hope this thread will help someone if they run into the same problem on dlandon's docker image.