google-coral / pycoral

Python API for ML inferencing and transfer-learning on Coral devices
https://coral.ai
Apache License 2.0
347 stars 144 forks

Inference benchmarks for Movenet #55

Closed lfriedri closed 2 years ago

lfriedri commented 2 years ago

Description

Could you please provide benchmark results for Movenet (lightning + thunder) inference latency on Coral Dev Board?

I currently run Movenet on a USB Accelerator connected to a Raspberry Pi 4. Unfortunately, inference latency here is more than a factor of two larger than the 13.8 ms advertised for thunder at https://coral.ai/models/pose-estimation/.

A measured value for the inference latency of Movenet on the Coral Dev Board would help me make a purchase decision for the board. I am looking for an embedded solution with inference latency for movenet thunder < 20 ms.

Issue Type: Documentation Feature Request
Operating System: Linux
Coral Device: Dev Board, USB Accelerator
Other Devices: No response
Programming Language: No response
Relevant Log Output: No response
hjonnala commented 2 years ago

Hi, here are the results for 200 iterations of interpreter.invoke():

Coral Dev Board + Linux Machine: 46.54 ms
USB Accelerator + Linux Machine: 16.42 ms
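For reference, a measurement like this can be reproduced with a small timing harness. The helper below is plain Python; the pycoral part is shown only as a commented sketch because it needs Edge TPU hardware, and the model filename in it is an assumption, not a file shipped with pycoral.

```python
import statistics
import time

def mean_latency_ms(invoke, iterations=200, warmup=10):
    """Call `invoke` repeatedly and return the mean wall-clock latency in ms."""
    for _ in range(warmup):          # warm-up runs are excluded from timing
        invoke()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)

# On a machine with an Edge TPU attached, the benchmark would look like
# (model filename is a placeholder):
#
#   from pycoral.utils.edgetpu import make_interpreter
#   interpreter = make_interpreter("movenet_thunder_edgetpu.tflite")
#   interpreter.allocate_tensors()
#   print(mean_latency_ms(interpreter.invoke))
```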

lfriedri commented 2 years ago

Hi, thank you for the data.

I assume the "Linux Machine" is a "Desktop CPU: Single 64-bit Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz" as you stated elsewhere?

I do not understand the description "Coral Dev Board + Linux Machine": so far, I thought the Coral Dev Board does not need a host machine for inference?

Are these values for thunder or for lightning?

hjonnala commented 2 years ago

I assume the "Linux Machine" is a "Desktop CPU: Single 64-bit Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz" as you stated elsewhere?

Linux machine is my laptop with Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

I do not understand the description "Coral Dev Board + Linux Machine": so far, I thought the Coral Dev Board does not need a host machine for inference?

The Dev Board requires a host computer running Linux (recommended), Mac, or Windows 10 for initial setup and flashing; after that it can run inference standalone.

Are these values for thunder or for lightning?

These values are for thunder.

lfriedri commented 2 years ago

Thank you for the details.

Based on this data, I will now concentrate on the USB Accelerator (the Dev Board offers no advantage over my Raspberry Pi setup).

Do you have any explanation for the large differences in inference time? That is, which feature of the host system should I look for to achieve thunder inference < 20 ms with the USB Accelerator? From my understanding it cannot be USB speed: on the Coral Dev Board the Edge TPU is connected to the CPU via PCIe, which should be faster than any USB connection, yet the Dev Board performs worse than the USB-connected setup with the "Linux machine".

I did an experiment on an x86 Windows machine (see https://stackoverflow.com/questions/69582152/can-someone-provide-latency-results-closer-to-the-official-numbers) with a result of 42 ms, which is also not satisfactory for me. (However, in that experiment I used an older version of edgetpu_runtime.zip: 20210119. Could that be the reason?)

Naveen-Dodda commented 2 years ago

Hi lfriedri,

The Dev Board has higher latency than an x86 machine for Movenet. Note: total latency = TPU latency + CPU latency. Movenet runs about 50% of its operations on the CPU. Since the Dev Board has limited CPU compute, those operations take longer to complete, increasing the overall latency.

For this model, the difference in latency comes down to the CPU compute power of the host machine.
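The effect of that split can be shown with back-of-the-envelope arithmetic. The 50/50 split of the x86 total and the Dev Board slowdown factor below are assumptions chosen only to illustrate how a slower host CPU inflates the total while the TPU portion stays fixed.

```python
def total_latency_ms(tpu_ms, cpu_ms, cpu_slowdown=1.0):
    """Total latency = fixed TPU part + CPU part scaled by host-CPU slowdown."""
    return tpu_ms + cpu_ms * cpu_slowdown

# Illustrative only: assume the 16.42 ms x86 total splits roughly 50/50
# between TPU-resident and CPU-resident operations.
x86_total = total_latency_ms(8.2, 8.2)            # ~16.4 ms on the laptop
# If the Dev Board CPU were ~4.7x slower on the CPU-resident ops:
devboard_total = total_latency_ms(8.2, 8.2, 4.7)  # ~46.7 ms, near the 46.54 ms measured
```

The point is that the TPU term is identical in both cases; only the CPU term scales, which is why a stronger host CPU (not a faster TPU link) closes the gap.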

lfriedri commented 2 years ago

Hi, thank you, that clarifies the issue. Do you have any recommendation for an embedded (ARM?) device with a CPU strong enough to reach a total of < 20 ms for thunder?

Regards Lars

Naveen-Dodda commented 2 years ago

Hello Lars,

We haven't evaluated this model on other ARM platforms. I would recommend looking for a multi-core ARM board whose CPU can match the x86 performance benchmarks.

Good luck with the search,

Thanks, Naveen

google-coral-bot[bot] commented 2 years ago

Are you satisfied with the resolution of your issue?