ZoneMinder / zmeventnotification

Machine Learning powered Secure Websocket & MQTT based ZoneMinder event notification server

Coral Edge TPU - HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback] #302

Closed: mercurytoxic closed this issue 4 years ago

mercurytoxic commented 4 years ago

Event Server version

v5.15.6.r57.gfc6d2b9

Hooks version

sudo -u http python3 -c "import pyzm; print (pyzm.ml.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'pyzm' has no attribute 'ml'

sudo -u http python3 -c "import pyzm; print (pyzm.__version__)"
0.1.25

The version of ZoneMinder you are using:

v1.35.5

What is the nature of your issue

Bug

Details

When I run two object detections at the same time on the Coral Edge TPU, one returns a correct response and the other aborts with F :1147] HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback].

This doesn't happen when I use YoloV4 with a GPU.

Here they mention that this is due to multiple processes attempting to access the TPU at the same time.

As a result, when several cameras are triggered at the same time, the detector does not process all of them.

To reproduce this I just run

sudo -u http /var/lib/zmeventnotification/bin/zm_detect.py --config /etc/zm/objectconfig.ini  --eventid <eid1> --monitorid <mid1> & sudo -u http /var/lib/zmeventnotification/bin/zm_detect.py --config /etc/zm/objectconfig.ini  --eventid <eid2> --monitorid <mid2>

Result:

F :1147] HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback]
[a] detected:car:58% --SPLIT--[{"type": "object", "label": "car", "box": [547, 82, 600, 149], "confidence": "58.20%"}]
[1]+  Aborted                 sudo -u http /var/lib/zmeventnotification/bin/zm_detect.py --config /etc/zm/objectconfig.ini --eventid <eid1> --monitorid <mid1>

Thanks!

pliablepixels commented 4 years ago

Please post full debug logs. There are TPU locks that should prevent this. Make sure your tpu lock max is 1
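
For readers unfamiliar with the idea, here is a minimal sketch of serializing TPU access across processes with an exclusive file lock. It is not how pyzm implements its TPU locks; the lock path and the names tpu_lock, detect and run_inference are hypothetical, and fcntl is available on Unix-like systems only.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path; any path visible to all detection processes would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "edgetpu_example.lock")


@contextmanager
def tpu_lock(path=LOCK_PATH):
    """Block until this process holds an exclusive lock on `path`."""
    with open(path, "w") as lock_file:
        # Waits here while another process (or file handle) holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def detect(image, run_inference):
    """Call the caller-supplied `run_inference(image)` while holding the lock."""
    with tpu_lock():
        return run_inference(image)
```

With a wrapper like this, two detection scripts started at the same moment take turns on the device instead of both opening it at once.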

pliablepixels commented 4 years ago

Also you are using old versions. If you are using edge tpu you need to be on 6.0

mercurytoxic commented 4 years ago

Thanks for the quick reply, updating fixed the issue.

aslanpour commented 2 years ago

I have a similar issue. Container A is connected to the TPU and everything is okay. Once container B connects to the TPU (loading the same object detection model onto the TPU), container A fails. I need to know where I can catch the error so that it does not take down my container.

baudneo commented 2 years ago

Are container A and container B both active and requesting access to the TPU at the same time?

Also, is this mlapi or zmes? I ask because mlapi keeps the TPU bound and a model loaded in memory.

aslanpour commented 2 years ago

Yes, both containers request access to the TPU once they are up. Container A already has access and has set its interpreter variable. When container B then requests access to set its own interpreter variable, B gets access but container A fails with this error:

2022/08/17 23:49:51 stderr: F driver/usb/usb_driver.cc:1148] HandleQueuedBulkIn transfer in failed. Not found: USB transfer error 5 [LibUsbDataInCallback]
2022/08/17 23:49:51 Forked function has terminated: signal: aborted (core dumped)

Neither mlapi nor zmes. I am just running my application as Python code in a Flask app.
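
On the question of catching the error: the log above shows the process being terminated by an abort signal, which a Python try/except cannot intercept. One possible workaround, sketched here under that assumption, is to run the inference in a child process and inspect its exit status from the parent. The child code below is a stand-in that simply aborts; run_isolated is a hypothetical helper, not part of any TPU library.

```python
import signal
import subprocess
import sys

# Stand-in for the real inference script: it aborts the way a fatal
# native error would, so the parent's handling can be demonstrated.
CHILD_CODE = "import os; os.abort()"


def run_isolated(code, timeout=30):
    """Run `code` in a child interpreter and return (ok, description)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    if proc.returncode < 0:
        # A negative return code means the child was killed by that signal.
        return False, "killed by " + signal.Signals(-proc.returncode).name
    return False, "exit code %d" % proc.returncode
```

The parent survives the child's abort and can decide whether to retry or report a failure. This does not stop the two containers from colliding on the device; it only keeps the crash from taking down the whole application.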

baudneo commented 2 years ago

That is expected behavior, I would think. My advice would be to make a container that is dedicated to the TPU and have the other containers make HTTP requests to the dedicated container for detections.
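
A rough sketch of that layout, using only the Python standard library: one process owns the TPU and answers detection requests over HTTP. The /detect endpoint, the port and the run_inference placeholder are all assumptions for illustration; a real service would load its model once at startup and call its interpreter where the placeholder is.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Only one request at a time may use the device inside this process.
TPU_LOCK = threading.Lock()


def run_inference(image_bytes):
    """Placeholder: a real service would run its loaded model here."""
    return [{"label": "placeholder", "bytes_received": len(image_bytes)}]


class DetectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/detect":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        with TPU_LOCK:
            detections = run_inference(image_bytes)
        body = json.dumps(detections).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="127.0.0.1", port=8000):
    """Build the server; inside a container the host would likely be 0.0.0.0."""
    return ThreadingHTTPServer((host, port), DetectHandler)
```

Calling make_server().serve_forever() in the dedicated container starts the service. The other containers then POST image bytes to /detect and read back JSON, so only one process ever opens the device.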