Does this script help you? https://github.com/google-coral/libcoral/blob/master/coral/pipeline/detection_models_test_lib.cc#L81
Hi
Thank you for your reply, it makes sense to me.
I successfully got detection inference working using the pipelining method; the model is efficientdet_lite2_448_ptq. However, I found that the single-model inference takes around 90 ms while the pipelined one takes around 100 ms, which means the pipelined version is slower than the single one. In your experience, what are the possible reasons?
Tyouritsugun
Hmm... There is no need to pipeline the efficientdet_lite2_448_ptq model, as its total size is under 8 MB.
Please check this link for more details: https://coral.ai/docs/edgetpu/pipeline/#overview
Hi,
Thank you for your reply.
The size of efficientdet_lite2_448_ptq is only 7.21 MB, as shown at the link below:
https://tfhub.dev/tensorflow/lite-model/efficientdet/lite2/detection/default/1
It is true that the size is less than 8 MB; however, that is the TensorFlow Lite model, which cannot be loaded onto a Coral TPU directly. Instead, we need to use the Edge TPU Compiler to convert it into an Edge TPU-compatible model, whose size is 10.2 MB, as below:
https://github.com/google-coral/test_data/blob/104342d2d3480b3e66203073dac24f4e2dbb4c41/efficientdet_lite2_448_ptq_edgetpu.tflite
My confusion is: which model size does the official site refer to, the TensorFlow Lite model or the compiled Edge TPU-compatible model? I think it is the latter, since the official site says the model is loaded into the 8 MB cache.
Regards Tyouritsugun
Hi @tyouritsugun, the documentation is referring to the on-chip memory required for the model. Since most of this model fits on a single Edge TPU and only 705.50 KiB of off-chip memory is used, segmenting it is not the best solution. Model pipelining is recommended for large models that otherwise cannot fit into the cache of a single Edge TPU.
Note: Segmenting any model will add some latency, because intermediate tensors must be transferred from one Edge TPU to another. However, the amount of latency added by this I/O transaction depends on various factors, such as the tensor sizes and how the Edge TPUs are integrated in your system (for example, via the PCIe or USB bus), and such latency is usually offset by gains in overall throughput and additional Edge TPU caching. So you should carefully measure the performance benefits for your models.
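For orientation, here is a minimal sketch of how compiler-produced segments are typically wired together with libcoral's PipelinedModelRunner, to make the intermediate-tensor handoff described above concrete. It assumes the segment file naming produced by edgetpu_compiler and the APIs in coral/pipeline and coral/tflite_utils.h; input construction is left out, and exact signatures (for example, whether Push/Pop return a status) may differ between libcoral versions.

```cpp
// Sketch only: one interpreter per compiled segment, each bound to its own
// Edge TPU, run as a pipeline. Error handling omitted for brevity.
#include <memory>
#include <string>
#include <vector>

#include "coral/pipeline/common.h"
#include "coral/pipeline/pipelined_model_runner.h"
#include "coral/tflite_utils.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/model.h"
#include "tflite/public/edgetpu.h"  // include path may vary by build setup

int main() {
  // Files produced by `edgetpu_compiler --num_segments=2 ...`.
  const std::vector<std::string> segment_paths = {
      "efficientdet_lite3x_640_ptq_segment_0_of_2_edgetpu.tflite",
      "efficientdet_lite3x_640_ptq_segment_1_of_2_edgetpu.tflite"};

  // One Edge TPU device per segment.
  const auto tpus = edgetpu::EdgeTpuManager::GetSingleton()->EnumerateEdgeTpu();
  std::vector<std::shared_ptr<edgetpu::EdgeTpuContext>> contexts(segment_paths.size());
  std::vector<std::unique_ptr<tflite::FlatBufferModel>> models(segment_paths.size());
  std::vector<std::unique_ptr<tflite::Interpreter>> owned(segment_paths.size());
  std::vector<tflite::Interpreter*> interpreters(segment_paths.size());
  for (size_t i = 0; i < segment_paths.size(); ++i) {
    contexts[i] = edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice(
        tpus[i].type, tpus[i].path);
    models[i] = tflite::FlatBufferModel::BuildFromFile(segment_paths[i].c_str());
    owned[i] = coral::MakeEdgeTpuInterpreterOrDie(*models[i], contexts[i].get());
    interpreters[i] = owned[i].get();
  }

  coral::PipelinedModelRunner runner(interpreters);

  // Fill `inputs` with tensors allocated from runner.GetInputTensorAllocator()
  // and populated with the preprocessed image (see the linked
  // coral/examples/model_pipelining.cc); it is left empty here only to keep
  // the sketch short.
  std::vector<coral::PipelineTensor> inputs;
  runner.Push(inputs);  // Segment 0 starts; intermediate tensors are handed to
                        // segment 1 over the host bus (PCIe/USB).

  std::vector<coral::PipelineTensor> outputs;
  runner.Pop(&outputs);  // Blocks until the last segment finishes.
  coral::FreePipelineTensors(outputs, runner.GetOutputTensorAllocator());
  return 0;
}
```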
Hi,
Thank you for your reply; it is very useful to me.
Following your instructions, I tried efficientdet_lite3x_640_ptq with 2 segments, compiled by edgetpu_compiler as follows:
edgetpu_compiler --num_segments=2 -s efficientdet_lite3x_640_ptq.tflite
On my Ubuntu x64 machine, I found that the inference time is roughly 200 ms; however, on a Raspberry Pi 4B it is as slow as roughly 1600 ms. Why is it so slow on the Raspberry Pi, and is there any way to optimize it?
And the other question is that I need to compare the performance of single and twin Coral TPUs. I can find some single-TPU benchmarks, such as the one below:
https://github.com/NobuoTsukamoto/benchmarks/blob/main/tensorflow/lite/efficentdet/efficientdet.md
Yet none of them are suitable for twin TPUs, because a model smaller than 8 MB makes no sense for two or more TPUs. EfficientDet-lite3x seems to be a good candidate, yet it cannot be loaded onto a single TPU, and the models smaller than EfficientDet-lite3x are all less than 8 MB.
Do you know of any detection model that is larger than 8 MB and can still be loaded by a single Coral TPU?
Tyouritsugun
On my Ubuntu x64 machine, I found that the inference time is roughly 200 ms; however, on a Raspberry Pi 4B it is as slow as roughly 1600 ms. Why is it so slow on the Raspberry Pi, and is there any way to optimize it?
efficientdet_lite3x_640_ptq_edgetpu.tflite has some operations mapped to the CPU, and the difference in CPU power is what makes the huge difference. Please check this comment for more details: https://github.com/google-coral/edgetpu/issues/554#issuecomment-1064205154
Do you know of any detection model that is larger than 8 MB and can still be loaded by a single Coral TPU?
You can run any model larger than 8 MB with a single Coral TPU; it's just that you can load only up to ~8 MB of parameters onto a single TPU. You would need a model that is larger than 8 MB and has all operations mapped to the Edge TPU except TFLite_Detection_PostProcess (since it can't be mapped to the Edge TPU). I don't think we have any pre-trained models that satisfy the scenario you are looking for. Please try the object detection retraining tutorials with different model sizes. Thanks! https://github.com/google-coral/tutorials
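To make the op-mapping point above concrete, here is a small sketch (not from this thread) that walks a compiled model's nodes and reports which ones were fused into the Edge TPU custom op ("edgetpu-custom-op") and which stay on the CPU, such as TFLite_Detection_PostProcess. It assumes the interpreter helpers in libcoral's coral/tflite_utils.h; the edgetpu_compiler log reports the same mapping at compile time.

```cpp
// Sketch: list which ops of a compiled *_edgetpu.tflite run on the Edge TPU
// (fused into the "edgetpu-custom-op" node) and which remain on the CPU.
#include <cstdio>
#include <cstring>
#include <memory>

#include "coral/tflite_utils.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/model.h"

int main(int argc, char* argv[]) {
  if (argc != 2) {
    std::fprintf(stderr, "Usage: %s <model_edgetpu.tflite>\n", argv[0]);
    return 1;
  }
  auto model = tflite::FlatBufferModel::BuildFromFile(argv[1]);
  auto tpu_context = coral::GetEdgeTpuContextOrDie();
  auto interpreter = coral::MakeEdgeTpuInterpreterOrDie(*model, tpu_context.get());

  for (size_t i = 0; i < interpreter->nodes_size(); ++i) {
    const auto* node_and_reg =
        interpreter->node_and_registration(static_cast<int>(i));
    const TfLiteRegistration& reg = node_and_reg->second;
    if (reg.custom_name != nullptr) {
      const bool on_tpu = std::strcmp(reg.custom_name, "edgetpu-custom-op") == 0;
      std::printf("node %zu: %s -> %s\n", i, reg.custom_name,
                  on_tpu ? "Edge TPU" : "CPU");
    } else {
      std::printf("node %zu: builtin op %d -> CPU\n", i, reg.builtin_code);
    }
  }
  return 0;
}
```

On EfficientDet-Lite models this typically shows one large edgetpu-custom-op plus TFLite_Detection_PostProcess left on the CPU, which is the part a weaker CPU such as the Raspberry Pi's slows down.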
Hi,
Thank you for your reply.
I will try your suggestion; meanwhile, I am trying YOLOv5 to see if there is any difference.
Tyouritsugun
Hi,
I am trying to parse the pipelined detection results using efficientdet_lite2_448_ptq.tflite. I cannot find any hint around line 211 of the example below, which looks like it just gets the results and then disposes of them without parsing them:
https://github.com/google-coral/libcoral/blob/master/coral/examples/model_pipelining.cc
The definition of coral::PipelineTensor does not help, because I do not know the underlying data structure of Buffer* buffer. I can figure out how to parse the result of a single-TPU inference by referring to the code below (after line 104):
https://github.com/google-coral/demo-manufacturing/blob/main/src/inference_wrapper.cc
Can you please give a hint or some reference code showing how to parse the pipelined detection results?
Thank you in advance Tyouritsugun
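For what it's worth, here is one possible sketch of parsing the pipelined detection outputs. It assumes the default tensor allocator (so Buffer::ptr() returns a host-visible pointer), the common TFLite_Detection_PostProcess output order of boxes, class ids, scores, count, and the span-based coral::GetDetectionResults() overload from coral/detection/adapter.h; the helper name ParsePipelineDetections is made up for illustration, and in real code the output tensors should be matched by name against the last segment's interpreter rather than by position.

```cpp
// Sketch: convert the PipelineTensors popped from PipelinedModelRunner into
// coral::Object detections. TFLite_Detection_PostProcess emits float32
// outputs, so no dequantization is needed even though the model is quantized.
#include <cstddef>
#include <vector>

#include "absl/types/span.h"
#include "coral/detection/adapter.h"
#include "coral/pipeline/common.h"

std::vector<coral::Object> ParsePipelineDetections(
    const std::vector<coral::PipelineTensor>& outputs,
    float score_threshold, size_t top_k) {
  // Assumes the default allocator, where buffer->ptr() is host accessible.
  auto as_floats = [](const coral::PipelineTensor& t) {
    return absl::MakeConstSpan(reinterpret_cast<const float*>(t.buffer->ptr()),
                               t.bytes / sizeof(float));
  };
  // Assumed output order: [boxes, class ids, scores, count]; verify against
  // the last segment's interpreter outputs (match by tensor name).
  const auto bboxes = as_floats(outputs[0]);  // [N, 4]: ymin, xmin, ymax, xmax
  const auto ids = as_floats(outputs[1]);     // [N]
  const auto scores = as_floats(outputs[2]);  // [N]
  const size_t count = static_cast<size_t>(as_floats(outputs[3])[0]);

  return coral::GetDetectionResults(bboxes, ids, scores, count,
                                    score_threshold, top_k);
}
```

The returned coral::Object values carry an id, a score, and a bbox normalized to [0, 1], the same structure the single-TPU detection adapter produces; remember to release the popped tensors with coral::FreePipelineTensors() afterwards, as the model_pipelining.cc example does.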