google-coral / pycoral

Python API for ML inferencing and transfer-learning on Coral devices
https://coral.ai
Apache License 2.0

Dual edge TPU hangs when running detect_image.py #42

Closed: truncs closed this issue 2 years ago

truncs commented 3 years ago

classify_image.py runs well and the temperature stays stable. But when I run detect_image.py, the temperature reading goes negative and the device is throttled with HIB errors. Please see the logs below.

(base) aditya@aditya-desktop:~/workspace/coral/pycoral$ python3.6 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.1ms
2.7ms
2.7ms
2.7ms
2.7ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.76953
(base) aditya@aditya-desktop:~$ for (( ; ; )); do  sleep 1; cat /sys/class/apex/apex_0/temp; done
43050
43300
43550
43300
43550
43300
43550
43550
43300
43550
43550
43550
43300
43300
43300
43300
43550
43050
43300
43300
43300
43050
43550
43050
43300
43550
43550
43300
43550
43300
43550
43300
43050
43300
43300
43300
43300
43050
43550
43050
43550
43300
43300
43300
43050
43300
43300
43050
43050
43300
43300
43300
43050
43300
43050
43050
43300
43050
43300
43050
43300
43550
43300
43050
43300
43300
43050
43300
43050
43300
43300
43300
43300
43300
43300
43300
(base) aditya@aditya-desktop:~/workspace/coral/pycoral$ python3.6 examples/detect_image.py   --model test_data/ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite   --labels test_data/coco_labels.txt   --input test_data/grace_hopper.bmp   --output ${HOME}/grace_hopper_processed.bmp
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
E driver/mmio_driver.cc:254] HIB Error. hib_error_status = ffffffffffffffff, hib_first_error_status = ffffffffffffffff
(base) aditya@aditya-desktop:~$ for (( ; ; )); do  sleep 1; cat /sys/class/apex/apex_0/temp; done
43300
43550
43800
43800
43800
43550
43550
43800
43550
43800
43800
43050
43300
43300
43300
43550
43550
43300
43550
43550
43550
43550
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
-89700
(base) aditya@aditya-desktop:~$ uname -r
4.15.0-153-generic
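
For anyone reproducing this, the shell loop above can be sketched in Python so that the bad readings stand out. This is a minimal sketch, not part of pycoral: it assumes the sysfs node reports millidegrees Celsius (so 43300 is 43.3 °C), and the 0 to 110 °C sanity window and all function names are illustrative assumptions. A value like -89700 appearing together with an all-ones HIB status probably means the device stopped responding, not that it actually cooled down.

```python
import time
from pathlib import Path
from typing import Optional

# Path taken from the log above; apex_0 is the first PCIe Edge TPU.
APEX_TEMP_PATH = Path("/sys/class/apex/apex_0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '43300' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def is_plausible(celsius: float, low: float = 0.0, high: float = 110.0) -> bool:
    """Assumed sanity window (hypothetical bounds); adjust as needed."""
    return low <= celsius <= high


def read_temp(path: Path = APEX_TEMP_PATH) -> Optional[float]:
    """Return the temperature in Celsius, or None if the node is unreadable."""
    try:
        return parse_millidegrees(path.read_text())
    except (OSError, ValueError):
        return None


def monitor(samples: int = 60, interval_s: float = 1.0) -> None:
    """Print one reading per interval and flag implausible values."""
    for _ in range(samples):
        celsius = read_temp()
        if celsius is None:
            print("temp node unreadable")
        elif not is_plausible(celsius):
            print(f"{celsius:.2f} C  <-- implausible, device may have stopped responding")
        else:
            print(f"{celsius:.2f} C")
        time.sleep(interval_s)
```

Calling `monitor()` in one terminal while detect_image.py runs in another should show the point at which the readings flip.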
hjonnala commented 3 years ago

@truncs sorry for the delay. Are you still facing the issue? Can you please try other object detection models and see whether the Dual Edge TPU still hangs?

truncs commented 2 years ago

Yeah same problem.

hjonnala commented 2 years ago

@truncs can you please share the software and hardware details?

hjonnala commented 2 years ago

Can you please try two_models_inference.py with the Dual Edge TPU and share the output of the snippet below:

hemanth@hemanth-glaptop:~$ python3
Python 3.9.7 (default, Sep  3 2021, 06:18:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pycoral.pybind._pywrap_coral import ListEdgeTpus as list_edge_tpus
>>> list_edge_tpus()
[{'type': 'usb', 'path': '/sys/bus/usb/devices/2-1'}]
>>> 
truncs commented 2 years ago

This is what I see

list_edge_tpus()
[{'type': 'pci', 'path': '/dev/apex_0'}]
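
Only one PCIe entry is listed here. A Dual Edge TPU module would normally be expected to show up as two PCIe devices (for example /dev/apex_0 and /dev/apex_1), so a quick count makes the problem easier to spot. The sketch below is an assumption-laden helper, not part of pycoral: `summarize_edge_tpus`, `dual_tpu_fully_visible`, and `probe` are hypothetical names, and only `pycoral.utils.edgetpu.list_edge_tpus` is the library's own call.

```python
from typing import Dict, List


def summarize_edge_tpus(devices: List[Dict[str, str]]) -> Dict[str, int]:
    """Count enumerated Edge TPUs by interface type ('pci' or 'usb')."""
    counts = {"pci": 0, "usb": 0}
    for device in devices:
        kind = device.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def dual_tpu_fully_visible(devices: List[Dict[str, str]]) -> bool:
    """True when at least two PCIe Edge TPUs are enumerated."""
    return summarize_edge_tpus(devices)["pci"] >= 2


def probe() -> None:
    """Query pycoral (if installed) and report what the host can see."""
    try:
        from pycoral.utils.edgetpu import list_edge_tpus
    except ImportError:
        print("pycoral is not installed")
        return
    devices = list_edge_tpus()
    print(summarize_edge_tpus(devices))
    if not dual_tpu_fully_visible(devices):
        print("fewer than two PCIe Edge TPUs visible; check slot lane wiring")
```

With the output pasted above, `dual_tpu_fully_visible([{'type': 'pci', 'path': '/dev/apex_0'}])` returns `False`, which would be consistent with the slot exposing only one of the two TPUs.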
manoj7410 commented 2 years ago

@truncs Can you paste the output of the command:

lspci -vvv

truncs commented 2 years ago

The relevant lspci -vvv

/apex
...skipping
        Kernel driver in use: apex
        Kernel modules: apex
hjonnala commented 2 years ago

Do you have any other operating system and/or hardware to test on, to determine whether the issue is with the software or the hardware?

truncs commented 2 years ago

I don't have other hardware with an E-key slot, unless I can plug it into a Pi.

hjonnala commented 2 years ago

Okay, can you please try a different operating system? Another user hit the HIB error on Windows, but it was resolved on Ubuntu: https://github.com/google-coral/edgetpu/issues/484

truncs commented 2 years ago

That is interesting since I did test on Ubuntu without any adapters. Do you guys have any recommendation on what adapter to use?

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic
hjonnala commented 2 years ago

We have tested the ASUS Coral card: https://iot.asus.com/products/AI-accelerator/AI-Accelerator-PCIe-Card/. But with this card you won't be able to use the Dual Edge TPU, as it likely doesn't provide 2 PCIe buses per M.2 slot (a single-TPU card would work).

Here is an example for the ASUS PCIe card that uses 8 TPUs in parallel: https://github.com/google-coral/demo-multi-video-stream
