Hi @pazikk, please check this comment for sample code: https://github.com/google-coral/tutorials/issues/17#issuecomment-972277946. You may have to modify the get_pipeline_objects function for your model. Thanks!
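For anyone landing here, the overall structure is the same as the classification sample; below is a minimal sketch of a detection variant, assuming two segments and the pycoral 2.x pipelining API. `get_pipeline_objects` is just an illustrative name (not part of pycoral); its parsing is sketched further down in this thread.

```python
# Sketch of a model_pipelining_detect_image.py, adapted from
# examples/model_pipelining_classify_image.py. Parsing of the detection
# outputs is left to a user-defined get_pipeline_objects.
import threading

import numpy as np
from PIL import Image
from pycoral.adapters import common
from pycoral.pipeline import pipelined_model_runner as pipeline
from pycoral.utils import edgetpu


def make_runner(model_pattern, num_segments):
  # One interpreter per segment; device ':i' picks the i-th Edge TPU
  # (e.g. the Dev Board's on-board TPU and the USB Accelerator).
  interpreters = [
      edgetpu.make_interpreter(model_pattern % i, device=':%d' % i)
      for i in range(num_segments)
  ]
  for interpreter in interpreters:
    interpreter.allocate_tensors()
  return pipeline.PipelinedModelRunner(interpreters)


runner = make_runner(
    'ssd_mobilenet_v1_coco_quant_postprocess_segment_%d_of_2_edgetpu.tflite', 2)
size = common.input_size(runner.interpreters()[0])
name = common.input_details(runner.interpreters()[0], 'name')
image = np.array(Image.open('grace_hopper.bmp').convert('RGB').resize(size))


def producer():
  for _ in range(5):
    runner.push({name: image})
  runner.push({})  # An empty push signals the end of the stream.


def consumer():
  while True:
    result = runner.pop()  # Dict: output tensor name -> numpy array.
    if not result:
      break
    # objects = get_pipeline_objects(result)  # user-defined parsing


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
  t.start()
for t in threads:
  t.join()
```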
@hjonnala thank you so much! What about inference time using 2 TPUs vs. only 1? I still measure longer inference times when using 2 TPUs than when using a single one.
It depends on the model. If the model is smaller than 8 MB there is no need to pipeline it: each Edge TPU has roughly 8 MB of on-chip SRAM for caching model parameters, so a model that fits entirely in one TPU's cache gains nothing from segmentation. Please check the documentation for more details at: https://coral.ai/docs/edgetpu/pipeline/#overview. Thanks!
@hjonnala I did the following test: I took efficientdet_lite3_512_ptq.tflite (12 MB) from test_data and generated Edge TPU models with 1 and 2 segments using the following commands:

```
edgetpu_compiler -s -d -k 10 -n 2 efficientdet_lite3_512_ptq.tflite
edgetpu_compiler -s -d -k 10 -n 1 efficientdet_lite3_512_ptq.tflite
```

Here are the compilation logs: edge_tpu_compilation_log_n1.txt edge_tpu_compilation_log_n2.txt
efficientdet_lite3_512_ptq_edgetpu.tflite weighs 15 MB, efficientdet_lite3_512_ptq_segment_0_of_2_edgetpu.tflite weighs 6.6 MB, and efficientdet_lite3_512_ptq_segment_1_of_2_edgetpu.tflite weighs 6.8 MB. As far as I understand, this makes it a good use case for increasing inference speed with model pipelining.
However, when running:

```
python3 examples/model_pipelining_detect_image.py --models /home/mendel/efficientdet_lite3_512_ptq/efficientdet_lite3_512_ptq_segment_%d_of_2_edgetpu.tflite --input test_data/grace_hopper.bmp
```

I got: Average inference time (over 5 iterations): 825.1ms
And by running:

```
python3 examples/detect_image.py --model /home/mendel/efficientdet_lite3_512_ptq/efficientdet_lite3_512_ptq_edgetpu.tflite --input test_data/grace_hopper.bmp
```

I got: 609.91 ms, 532.15 ms, 621.96 ms, 599.23 ms, 533.96 ms
As you can see, inference with model pipelining on 2 TPUs still takes longer than inference on just one TPU.
At this point, I am just looking for a scenario where model pipelining for detection increases inference speed. Ultimately, I want to work with a YOLOv5 medium network.
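Note that, per the pipelining docs linked above, segmenting a model is mainly a throughput optimization: with a stream of inputs, each TPU works on a different frame concurrently, while a single image still pays the full per-segment latency plus the inter-TPU tensor transfers. A sketch of a throughput-style benchmark, assuming a `runner`, input `name`, and `image` set up as in the pipelining sample earlier in this thread:

```python
# Push many frames before draining the queue, so both segments run
# concurrently; report amortized per-frame time, not single-frame latency.
import threading
import time

NUM_FRAMES = 100


def producer():
  for _ in range(NUM_FRAMES):
    runner.push({name: image})
  runner.push({})  # End of stream.


def consumer():
  while runner.pop():  # Empty result means the stream is done.
    pass


start = time.perf_counter()
threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
  t.start()
for t in threads:
  t.join()
elapsed = time.perf_counter() - start
print('Throughput: %.1f FPS (%.1f ms/frame amortized)' %
      (NUM_FRAMES / elapsed, 1000 * elapsed / NUM_FRAMES))
```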
> Here are the compilation logs: edge_tpu_compilation_log_n1.txt edge_tpu_compilation_log_n2.txt
Here, many more operations are mapped to the CPU when segmenting the model (151 vs. 16). Also, for this model, 5 intermediate tensors have to be transferred from one TPU to the other, and both the CPU fallback and the inter-TPU transfers add latency.
### Description
Hello, I am trying to run inference with a detection model on 2 TPUs using model pipelining. I have a Google Coral Dev Board with a Coral USB Accelerator connected to it. I am trying to run examples/model_pipelining_classify_image.py, but with the detection model ssd_mobilenet_v1_coco_quant_postprocess. I am trying to write my own script, model_pipelining_detect_image.py, based on model_pipelining_classify_image.py, but I cannot make it work.
First of all, I don't know how to get boxes, classes, and scores from the result (result = runner.pop()).
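For reference, the dict returned by `runner.pop()` is keyed by output tensor name, so parsing amounts to picking the four SSD postprocess tensors out of it. A minimal sketch, assuming the usual `TFLite_Detection_PostProcess` output names (boxes, class IDs, scores, count, in that order); check `interpreter.get_output_details()` on the last segment for the actual names in your model:

```python
import collections

# Simple stand-in for pycoral.adapters.detect.Object.
Object = collections.namedtuple('Object', ['id', 'score', 'bbox'])


def get_pipeline_objects(result, threshold=0.5):
  """Converts a runner.pop() dict into detection objects.

  Assumes the standard SSD postprocess output names; adjust the keys
  to match interpreter.get_output_details() for your model.
  """
  boxes = result['TFLite_Detection_PostProcess'][0]        # (N, 4): ymin, xmin, ymax, xmax
  class_ids = result['TFLite_Detection_PostProcess:1'][0]  # (N,)
  scores = result['TFLite_Detection_PostProcess:2'][0]     # (N,)
  count = int(result['TFLite_Detection_PostProcess:3'][0])
  return [Object(id=int(class_ids[i]), score=float(scores[i]), bbox=boxes[i])
          for i in range(count)
          if scores[i] >= threshold]
```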
Secondly, when I don't parse the result and just run the script, I get the following error:

```
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "examples/model_pipelining_classify_image.py", line 148, in consumer
    result = runner.pop()
  File "/usr/lib/python3/dist-packages/pycoral/pipeline/pipelined_model_runner.py", line 170, in pop
    result = {k: v.reshape(self._output_shapes[k]) for k, v in result.items()}
  File "/usr/lib/python3/dist-packages/pycoral/pipeline/pipelined_model_runner.py", line 170, in <dictcomp>
    result = {k: v.reshape(self._output_shapes[k]) for k, v in result.items()}
ValueError: cannot reshape array of size 320 into shape (1,20,4)
```
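One possible explanation (an educated guess, not confirmed): (1, 20, 4) is 80 elements, and 80 float32 values occupy exactly 320 bytes, so the error is consistent with the output arriving as a raw byte buffer. If that is the cause, reinterpreting the buffer before reshaping, rather than skipping the reshape entirely, would preserve the output shapes:

```python
import numpy as np


def reshape_output(v, shape, dtype=np.float32):
  # 320 raw bytes viewed as float32 -> 80 elements -> reshapes to (1, 20, 4).
  if v.size != np.prod(shape):
    v = v.view(dtype)  # Reinterpret the raw bytes as the expected dtype.
  return v.reshape(shape)
```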
Thirdly, after skipping the reshape of the runner output in the pop method of PipelinedModelRunner, I get the following inference speed: Average inference time (over 5 iterations): 18.1ms. Meanwhile, inference with one TPU using python3 examples/detect_image.py --model test_data/ssd_mobilenet_v1_coco_quant_postprocess --input test_data/grace_hopper.bmp gives: 38.19 ms, 15.91 ms, 14.68 ms, 13.67 ms, 13.86 ms, which is faster than the script using 2 TPUs.
Thanks in advance for your help.
### Issue Type
Feature Request

### Operating System
Mendel Linux

### Coral Device
Dev Board, USB Accelerator

### Other Devices
_No response_

### Programming Language
Python 3.7

### Relevant Log Output
_No response_