google-coral / pycoral

Python API for ML inferencing and transfer-learning on Coral devices
https://coral.ai
Apache License 2.0

Model pipelining for detection #101

Closed pazikk closed 1 year ago

pazikk commented 1 year ago

Description

Hello, I am trying to run inference with a detection model on 2 TPUs using model pipelining. I have a Google Coral Dev Board with a Google Coral USB Accelerator connected to it. I am trying to run examples/model_pipelining_classify_image.py, but with the detection model ssd_mobilenet_v1_coco_quant_postprocess. I am trying to write my own script, model_pipelining_detect_image.py, based on model_pipelining_classify_image.py, but I cannot make it work.
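
Roughly, the skeleton I am working from looks like this (adapted from the classify example; the segment paths are placeholders for my files, and the consumer body is exactly where I am stuck):

```python
import threading

import numpy as np
from PIL import Image
from pycoral.adapters import common
from pycoral.pipeline import pipelined_model_runner as pipeline
from pycoral.utils import edgetpu

# Placeholder paths for the two compiled segments.
segment_paths = [
    'ssd_mobilenet_v1_coco_quant_postprocess_segment_%d_of_2_edgetpu.tflite' % i
    for i in range(2)
]

# One interpreter per segment, each pinned to its own Edge TPU
# (':0' and ':1' index the two TPUs in enumeration order).
interpreters = [
    edgetpu.make_interpreter(path, device=':%d' % i)
    for i, path in enumerate(segment_paths)
]
for interpreter in interpreters:
    interpreter.allocate_tensors()
runner = pipeline.PipelinedModelRunner(interpreters)

size = common.input_size(interpreters[0])
name = interpreters[0].get_input_details()[0]['name']
image = np.array(
    Image.open('test_data/grace_hopper.bmp').convert('RGB').resize(size, Image.LANCZOS))

def producer():
    for _ in range(5):
        runner.push({name: image})
    runner.push({})  # An empty dict tells the pipeline to shut down.

def consumer():
    while True:
        result = runner.pop()  # dict: output tensor name -> numpy array
        if not result:
            break
        # This is the part I cannot figure out: turning `result`
        # into boxes, classes, and scores.

producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()
```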

First of all, I don't know how to get the boxes, classes, and scores from the result (result = runner.pop()).

Secondly, when I skip parsing the result and just run the script, I get the following error:

```
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "examples/model_pipelining_classify_image.py", line 148, in consumer
    result = runner.pop()
  File "/usr/lib/python3/dist-packages/pycoral/pipeline/pipelined_model_runner.py", line 170, in pop
    result = {k: v.reshape(self._output_shapes[k]) for k, v in result.items()}
  File "/usr/lib/python3/dist-packages/pycoral/pipeline/pipelined_model_runner.py", line 170, in <dictcomp>
    result = {k: v.reshape(self._output_shapes[k]) for k, v in result.items()}
ValueError: cannot reshape array of size 320 into shape (1,20,4)
```
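
For reference, here is how the output tensors of the final segment can be listed, to compare against the shapes the runner tries to reshape to (the segment path is a placeholder for my file):

```python
# Diagnostic sketch: print name, shape, and dtype of every output tensor
# of the last pipeline segment.
from pycoral.utils import edgetpu

interpreter = edgetpu.make_interpreter(
    'ssd_mobilenet_v1_coco_quant_postprocess_segment_1_of_2_edgetpu.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_output_details():
    print(detail['name'], detail['shape'], detail['dtype'])
```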

Thirdly, after skipping the reshaping of the runner output in the pop method of PipelinedModelRunner, I get the following inference speed: `Average inference time (over 5 iterations): 18.1ms`. Meanwhile, inferencing with one TPU using `python3 examples/detect_image.py --model test_data/ssd_mobilenet_v1_coco_quant_postprocess --input test_data/grace_hopper.bmp` gives:

```
38.19 ms
15.91 ms
14.68 ms
13.67 ms
13.86 ms
```

which is faster than the script using 2 TPUs.

Thanks for your help in advance.

Issue Type: Feature Request
Operating System: Mendel Linux
Coral Device: Dev Board, USB Accelerator
Other Devices: No response
Programming Language: Python 3.7
Relevant Log Output: No response
hjonnala commented 1 year ago

Hi @pazikk, please check this comment for sample code: https://github.com/google-coral/tutorials/issues/17#issuecomment-972277946. You may have to modify the get_pipeline_objects function as per your model. Thanks!
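
The gist of it is mapping the raw `{tensor_name: array}` dict returned by runner.pop() back to the detection outputs. A minimal sketch (not the exact code from that comment; it assumes the model ends in the standard TFLite_Detection_PostProcess op, whose four output names sort into boxes/classes/scores/count order — verify against your model's get_output_details() and adjust as needed):

```python
import numpy as np

def get_pipeline_objects(result, score_threshold=0.5):
    # `result` is the {output_tensor_name: numpy_array} dict from runner.pop().
    # Assumption: the four outputs are boxes (1, N, 4), class ids (1, N),
    # scores (1, N) and count (1,), and their names sort into that order
    # ('...PostProcess', '...PostProcess:1', ':2', ':3').
    boxes, class_ids, scores, count = (
        np.squeeze(result[name]) for name in sorted(result))
    detections = []
    for i in range(int(count)):
        if scores[i] >= score_threshold:
            ymin, xmin, ymax, xmax = boxes[i]
            detections.append({
                'id': int(class_ids[i]),
                'score': float(scores[i]),
                'bbox': (float(xmin), float(ymin), float(xmax), float(ymax)),
            })
    return detections
```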

pazikk commented 1 year ago

@hjonnala thank you so much! How about inference time using 2 TPUs vs using only 1? I still measure a longer inference time when using 2 TPUs than when using a single one.

hjonnala commented 1 year ago

It depends on the model. If the model is smaller than 8 MB, there is no need to pipeline it: each Edge TPU has about 8 MB of SRAM for caching model parameters, so a model that fits entirely in one TPU's cache gains nothing from being split, while the inter-TPU transfers add overhead. Please check the documentation for more details at: https://coral.ai/docs/edgetpu/pipeline/#overview. Thanks!

pazikk commented 1 year ago

@hjonnala I did the following test: I took efficientdet_lite3_512_ptq.tflite (12 MB) from test_data and generated Edge TPU models with 1 and 2 segments using the following commands:

```
edgetpu_compiler -s -d -k 10 -n 2 efficientdet_lite3_512_ptq.tflite
edgetpu_compiler -s -d -k 10 -n 1 efficientdet_lite3_512_ptq.tflite
```

Here are the compilation logs: edge_tpu_compilation_log_n1.txt edge_tpu_compilation_log_n2.txt

efficientdet_lite3_512_ptq_edgetpu.tflite weighs 15 MB, efficientdet_lite3_512_ptq_segment_0_of_2_edgetpu.tflite weighs 6.6 MB, and efficientdet_lite3_512_ptq_segment_1_of_2_edgetpu.tflite weighs 6.8 MB. Now, as far as I understand, this is a good use case for increasing inference speed with model pipelining.

However, when running:

```
python3 examples/model_pipelining_detect_image.py --models /home/mendel/efficientdet_lite3_512_ptq/efficientdet_lite3_512_ptq_segment_%d_of_2_edgetpu.tflite --input test_data/grace_hopper.bmp
```

I got: `Average inference time (over 5 iterations): 825.1ms`

And by running:

```
python3 examples/detect_image.py --model /home/mendel/efficientdet_lite3_512_ptq/efficientdet_lite3_512_ptq_edgetpu.tflite --input test_data/grace_hopper.bmp
```

I got:

```
609.91 ms
532.15 ms
621.96 ms
599.23 ms
533.96 ms
```

As you can see, inference with model pipelining on 2 TPUs still takes longer than inference with just one TPU.

At this point, I am just looking for a scenario where model pipelining for detection will increase inference speed. Ultimately, I want to work with a YOLOv5 medium network.

hjonnala commented 1 year ago

> Here are the compilation logs: edge_tpu_compilation_log_n1.txt edge_tpu_compilation_log_n2.txt

Here, many more operations are mapped to the CPU when the model is segmented (151 vs 16). Also, for this model, 5 intermediate tensors have to be transferred from one TPU to the other, which adds overhead between segments.

[screenshots: excerpts from the two compilation logs]
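
Also worth noting: model pipelining is aimed at increasing throughput on a stream of inputs rather than reducing single-image latency, since every segment boundary costs an inter-TPU tensor transfer. A rough sketch of a throughput-style measurement (it assumes a `runner`, input tensor `name`, and `image` set up as in the skeleton earlier in this thread; the frame count is arbitrary):

```python
import threading
import time

# Throughput sketch: keep the pipeline full instead of timing one frame
# at a time. `runner`, `name`, and `image` come from the earlier skeleton.
NUM_FRAMES = 100

def producer():
    for _ in range(NUM_FRAMES):
        runner.push({name: image})
    runner.push({})  # An empty dict shuts the pipeline down.

producer_thread = threading.Thread(target=producer)
start = time.perf_counter()
producer_thread.start()

processed = 0
while True:
    result = runner.pop()
    if not result:
        break
    processed += 1

producer_thread.join()
elapsed = time.perf_counter() - start
print('%.1f FPS over %d frames' % (processed / elapsed, processed))
```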
