noamholz closed this issue 3 years ago
Hello @noamholz Could you please share the link to published benchmarks?
with USB accelerator inference speeds might differ based on your host system and whether you're using USB 2.0 or 3.0.
Here are the results on my Linux machine:
With libedgetpu1-max
python3 examples/detect_image.py --model /home/Desktop/issues/pycoral_46/efficientdet_lite2_448_ptq_edgetpu.tflite --labels test_data/coco_labels.txt --input test_data/grace_hopper.bmp --output ${HOME}/grace_hopper_processed.bmp
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
128.92 ms
101.32 ms
95.12 ms
95.44 ms
107.40 ms
-------RESULTS--------
person
id: 0
score: 0.9609375
bbox: BBox(xmin=3, ymin=22, xmax=510, ymax=600)
With libedgetpu1-std
python3 examples/detect_image.py --model /home/Desktop/issues/pycoral_46/efficientdet_lite2_448_ptq_edgetpu.tflite --labels test_data/coco_labels.txt --input test_data/grace_hopper.bmp --output ${HOME}/grace_hopper_processed.bmp
----INFERENCE TIME----
Note: The first inference is slow because it includes loading the model into Edge TPU memory.
147.90 ms
128.71 ms
120.05 ms
120.40 ms
127.54 ms
-------RESULTS--------
person
id: 0
score: 0.9609375
bbox: BBox(xmin=3, ymin=22, xmax=510, ymax=600)
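As the note in the output says, the first inference includes loading the model into Edge TPU memory, so a fair comparison should discard it. A minimal sketch of that averaging, using the latencies printed above (the numbers are copied from the two runs; the helper name is mine):

```python
# Latencies (ms) copied from the two runs above; the first value in each
# list is the cold inference that includes loading the model onto the TPU.
max_runtime = [128.92, 101.32, 95.12, 95.44, 107.40]  # libedgetpu1-max
std_runtime = [147.90, 128.71, 120.05, 120.40, 127.54]  # libedgetpu1-std

def warm_average(latencies_ms):
    """Average latency after discarding the first (warm-up) inference."""
    warm = latencies_ms[1:]
    return sum(warm) / len(warm)

avg_max = warm_average(max_runtime)
avg_std = warm_average(std_runtime)
print(f"libedgetpu1-max: {avg_max:.2f} ms")   # ~99.82 ms
print(f"libedgetpu1-std: {avg_std:.2f} ms")   # ~124.18 ms
print(f"speedup from max frequency: {avg_std / avg_max:.2f}x")
```

On these numbers the max-frequency runtime is roughly 1.24x faster in steady state.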
Thanks @hjonnala for your reply,
USB accelerator inference speeds might differ based on your host system and whether you're using USB 2.0 or 3.0
My system is indeed using USB 3.0, and the CPUs: Dual Cortex-A72 + Quad Cortex-A53.
Here's how I relate my case to the published benchmark: I expect my system to be similar enough to the Coral Dev Board, since they seem to have similar computing power. But while the Dev Board is on par with "Desktop CPU + USB Accelerator" (2.6 ms inference time on mobilenet_v2, for example), my system is much slower (5.5 ms inference time on mobilenet_v2).
An important difference between my system and the Dev Board is that the latter uses a PCIe connection - would that explain most of the difference in performance? Would you recommend any tests I could perform to identify the bottleneck?
Thanks again!
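One quick bottleneck check is confirming that the accelerator actually negotiated a USB 3.0 link (a SuperSpeed device behind a USB 2.0 hub silently falls back to 480 Mb/s). On Linux the negotiated speed is exposed in sysfs; a small sketch (the sysfs path is the standard layout, and the value-to-name mapping below covers the common cases):

```python
import glob

# Common values of /sys/bus/usb/devices/*/speed, in Mb/s.
SPEED_NAMES = {
    "12": "USB 1.1 (Full Speed)",
    "480": "USB 2.0 (High Speed)",
    "5000": "USB 3.0 (SuperSpeed)",
    "10000": "USB 3.1 (SuperSpeed+)",
}

def describe_speed(raw):
    """Map a raw sysfs speed string (Mb/s) to a human-readable USB version."""
    return SPEED_NAMES.get(raw.strip(), f"unknown ({raw.strip()} Mb/s)")

# Print the negotiated link speed of every attached USB device.
for path in glob.glob("/sys/bus/usb/devices/*/speed"):
    with open(path) as f:
        print(path, "->", describe_speed(f.read()))
```

If the accelerator's entry reports 480 rather than 5000, the link (cable, port, or hub) is the first thing to fix.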
Operating frequency also affects the inference time. Can you try installing the Edge TPU runtime with maximum operating frequency? (sudo apt-get install libedgetpu1-max)
On my Linux machine (x86_64), with standard operating frequency:
******************** Check results *********************
* Unexpected high latency! [inception_v1_224_quant_edgetpu.tflite]
Inference time: 5.300638125045225 ms Reference time: 3.06 ms
* Unexpected high latency! [mobilenet_v1_1.0_224_quant_edgetpu.tflite]
Inference time: 3.9057857799343765 ms Reference time: 2.17 ms
* Unexpected high latency! [mobilenet_v2_1.0_224_quant_edgetpu.tflite]
Inference time: 4.219176759943366 ms Reference time: 2.29 ms
* Unexpected high latency! [ssd_mobilenet_v2_face_quant_postprocess_edgetpu.tflite]
Inference time: 8.32953658507904 ms Reference time: 5.36 ms
******************** Check finished! *******************
With maximum operating frequency:
******************** Check results *********************
* Unexpected low latency! [ssd_mobilenet_v1_coco_quant_postprocess_edgetpu.tflite]
Inference time: 6.677852355060168 ms Reference time: 10.02 ms
******************** Check finished! *******************
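The "Unexpected high/low latency" lines above suggest the benchmark script compares each measurement against a stored reference with some tolerance. A hedged sketch of that kind of check - the ±25% tolerance and the flagging logic here are assumptions for illustration, not the actual inference_benchmarks.py implementation:

```python
def check_latency(model, measured_ms, reference_ms, tolerance=0.25):
    """Flag measurements that deviate from the reference by more than the tolerance."""
    if measured_ms > reference_ms * (1 + tolerance):
        return f"* Unexpected high latency! [{model}]"
    if measured_ms < reference_ms * (1 - tolerance):
        return f"* Unexpected low latency! [{model}]"
    return None  # within tolerance, nothing to report

# Values copied from the check output above.
results = [
    ("mobilenet_v2_1.0_224_quant_edgetpu.tflite", 4.219, 2.29),                   # std runtime
    ("ssd_mobilenet_v1_coco_quant_postprocess_edgetpu.tflite", 6.678, 10.02),     # max runtime
]
for model, measured, reference in results:
    flag = check_latency(model, measured, reference)
    if flag:
        print(flag)
```

The point of the check is that being *below* the reference is also flagged, which is why the max-frequency run reports "Unexpected low latency" against references measured at standard frequency.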
Thanks @hayatoy. Using libedgetpu1-max certainly improves the speeds.
******************** Check results *********************
* Unexpected high latency! [mobilenet_v1_1.0_224_quant_edgetpu.tflite]
Inference time: 3.5727628600000116 ms Reference time: 2.22 ms
* Unexpected high latency! [mobilenet_v2_1.0_224_quant_edgetpu.tflite]
Inference time: 3.8395439999999326 ms Reference time: 2.56 ms
******************** Check finished! *******************
But for the model efficientdet_lite2_448_ptq_edgetpu.tflite, running via examples/detect_image.py, the inference time goes down from ~330 ms to ~280 ms, while the benchmark says ~100 ms. So there's still a significant gap of more than 2x. By the way, does the published benchmark assume libedgetpu1-std or libedgetpu1-max?
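A quick way to quantify the remaining gap using the rounded numbers above (~330 ms std, ~280 ms max, ~100 ms published):

```python
measured_std_ms = 330.0   # efficientdet_lite2 on the RK3399, libedgetpu1-std
measured_max_ms = 280.0   # same model, libedgetpu1-max
published_ms = 100.0      # published benchmark (desktop CPU + USB Accelerator)

print(f"std vs published: {measured_std_ms / published_ms:.1f}x slower")
print(f"max vs published: {measured_max_ms / published_ms:.1f}x slower")
print(f"max-frequency gain: {(1 - measured_max_ms / measured_std_ms) * 100:.0f}%")
```

So the max-frequency runtime recovers about 15% of the latency, but the system still sits at roughly 2.8x the published figure.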
In any case, I wish to avoid using the maximum frequency in my application due to overheating. So I'm still trying to understand which differences between my Rockchip RK3399 and the Coral Dev Board could best explain the performance gap. Currently I assume it's the PCIe connection (Dev Board) vs. USB 3.1 (RK3399) - unless there are other important factors I'm missing?
Thanks
Hi @noamholz, the published benchmarks assume libedgetpu1-max (since we are doing a max-performance test, we use the max frequency).
The benchmarks for the efficientdet_lite2_448_ptq_edgetpu.tflite model are measured with a Coral USB Accelerator on a desktop CPU (single 64-bit Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz). Since the two (the desktop CPU and the Rockchip RK3399) are different CPU architectures, that might explain the significant difference you are seeing.
Alright, thanks for your answers @hjonnala !
Description
Hi there, I am using a Rockchip RK3399 (64-bit CPUs: Dual Cortex-A72 + Quad Cortex-A53; USB 3.0) with Ubuntu 18.04.1 LTS and Python 3.6.8. When running examples/detect_image.py with the model efficientdet_lite2_448_ptq_edgetpu.tflite, the inference takes ~330 ms, while I expected ~100 ms (from the published benchmark). Is my expectation realistic, or are there issues in my setup that I'm not aware of?
Here's what I've done so far:
Finally, I tried the benchmarks/inference_benchmarks.py, and got:
I would appreciate any help, Thanks!
Issue Type: Performance
Operating System: Ubuntu
Coral Device: USB Accelerator
Other Devices: No response
Programming Language: Python 3.6
Relevant Log Output: No response