Does this script help you? https://github.com/google-coral/libcoral/blob/master/coral/pipeline/detection_models_test_lib.cc#L81
Hi
Thank you for your reply, it makes sense to me.
I successfully got detection inference working using the pipelining method; the model is efficientdet_lite2_448_ptq. However, I found that the single-model inference takes around 90 ms while the pipelined one takes around 100 ms, which means the pipelined version is slower than the single one. In your experience, what are the possible reasons?
Tyouritsugun
Hmm... There is no need to pipeline the efficientdet_lite2_448_ptq model, as its total size is under 8 MB.
Please check this link for more details: https://coral.ai/docs/edgetpu/pipeline/#overview
Hi,
Thank you for your reply.
The size of efficientdet_lite2_448_ptq is only 7.21 MB, as shown at the link below:
https://tfhub.dev/tensorflow/lite-model/efficientdet/lite2/detection/default/1
It is true that the size is less than 8 MB; however, that is the TensorFlow Lite model, which cannot be loaded onto a Coral TPU directly. Instead, we need to use the Edge TPU Compiler to convert it into an Edge TPU-compatible model, whose size is 10.2 MB, as below:
https://github.com/google-coral/test_data/blob/104342d2d3480b3e66203073dac24f4e2dbb4c41/efficientdet_lite2_448_ptq_edgetpu.tflite
My confusion is: which model size does the official site refer to, the TensorFlow Lite model or the compiled Edge TPU-compatible model? I think it is the latter, since the official site says the model is loaded into the 8 MB cache.
Regards Tyouritsugun
Hi @tyouritsugun, the documentation is referring to the on-chip memory required for the model. Since most of this model fits on a single Edge TPU and only 705.50 KiB of off-chip memory is used, segmenting it is not the best solution. Model pipelining is recommended for large models that otherwise cannot fit into the cache of a single Edge TPU.
Note: Segmenting any model will add some latency, because intermediate tensors must be transferred from one Edge TPU to another. However, the amount of latency added by this I/O transaction depends on various factors, such as the tensor sizes and how the Edge TPUs are integrated in your system (for example, via the PCIe or USB bus), and such latency is usually offset by gains in overall throughput and additional Edge TPU caching. So you should carefully measure the performance benefits for your models.
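For orientation, here is a minimal sketch of how compiler-produced segments are typically wired together with libcoral's PipelinedModelRunner, to make the intermediate-tensor handoff described above concrete. It assumes the segment file naming produced by edgetpu_compiler and the APIs in coral/pipeline and coral/tflite_utils.h; input construction is left out, and exact signatures (for example, whether Push/Pop return a status) may differ between libcoral versions.

```cpp
// Sketch only: one interpreter per compiled segment, each bound to its own
// Edge TPU, run as a pipeline. Error handling omitted for brevity.
#include <memory>
#include <string>
#include <vector>

#include "coral/pipeline/common.h"
#include "coral/pipeline/pipelined_model_runner.h"
#include "coral/tflite_utils.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/model.h"
#include "tflite/public/edgetpu.h"  // include path may vary by build setup

int main() {
  // Files produced by `edgetpu_compiler --num_segments=2 ...`.
  const std::vector<std::string> segment_paths = {
      "efficientdet_lite3x_640_ptq_segment_0_of_2_edgetpu.tflite",
      "efficientdet_lite3x_640_ptq_segment_1_of_2_edgetpu.tflite"};

  // One Edge TPU device per segment.
  const auto tpus = edgetpu::EdgeTpuManager::GetSingleton()->EnumerateEdgeTpu();
  std::vector<std::shared_ptr<edgetpu::EdgeTpuContext>> contexts(segment_paths.size());
  std::vector<std::unique_ptr<tflite::FlatBufferModel>> models(segment_paths.size());
  std::vector<std::unique_ptr<tflite::Interpreter>> owned(segment_paths.size());
  std::vector<tflite::Interpreter*> interpreters(segment_paths.size());
  for (size_t i = 0; i < segment_paths.size(); ++i) {
    contexts[i] = edgetpu::EdgeTpuManager::GetSingleton()->OpenDevice(
        tpus[i].type, tpus[i].path);
    models[i] = tflite::FlatBufferModel::BuildFromFile(segment_paths[i].c_str());
    owned[i] = coral::MakeEdgeTpuInterpreterOrDie(*models[i], contexts[i].get());
    interpreters[i] = owned[i].get();
  }

  coral::PipelinedModelRunner runner(interpreters);

  // Fill `inputs` with tensors allocated from runner.GetInputTensorAllocator()
  // and populated with the preprocessed image (see the linked
  // coral/examples/model_pipelining.cc); it is left empty here only to keep
  // the sketch short.
  std::vector<coral::PipelineTensor> inputs;
  runner.Push(inputs);  // Segment 0 starts; intermediate tensors are handed to
                        // segment 1 over the host bus (PCIe/USB).

  std::vector<coral::PipelineTensor> outputs;
  runner.Pop(&outputs);  // Blocks until the last segment finishes.
  coral::FreePipelineTensors(outputs, runner.GetOutputTensorAllocator());
  return 0;
}
```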
Hi,
Thank you for your reply; it is very useful to me.
Following your instructions, I tried efficientdet_lite3x_640_ptq with 2 segments, compiled by edgetpu_compiler as follows:
edgetpu_compiler --num_segments=2 -s efficientdet_lite3x_640_ptq.tflite
On my Ubuntu x64 machine, I found that the inference time is roughly 200 ms; however, on a Raspberry Pi 4B it is as slow as roughly 1600 ms. Why is it so slow on the Raspberry Pi, and is there any way to optimize it?
And the other question is that I need to compare the performance of single and twin Coral TPUs. I can find some single-TPU benchmarks, such as the one below:
https://github.com/NobuoTsukamoto/benchmarks/blob/main/tensorflow/lite/efficentdet/efficientdet.md
Yet none of them are suitable for twin TPUs, because a model smaller than 8 MB makes no sense for two or more TPUs. EfficientDet-lite3x seems to be a good candidate, yet it cannot be loaded onto a single TPU, and the models smaller than EfficientDet-lite3x are all less than 8 MB.
Do you know of any detection model that is larger than 8 MB and can still be loaded by a single Coral TPU?
Tyouritsugun
On my Ubuntu x64 machine, I found that the inference time is roughly 200 ms; however, on a Raspberry Pi 4B it is as slow as roughly 1600 ms. Why is it so slow on the Raspberry Pi, and is there any way to optimize it?
efficientdet_lite3x_640_ptq_edgetpu.tflite has some operations mapped to the CPU, and the difference in CPU power is what makes the huge difference. Please check this comment for more details: https://github.com/google-coral/edgetpu/issues/554#issuecomment-1064205154
Do you know of any detection model that is larger than 8 MB and can still be loaded by a single Coral TPU?
You can run any model larger than 8 MB with a single Coral TPU; it's just that you can load only up to ~8 MB of parameters onto a single TPU. You would need a model that is larger than 8 MB and has all operations mapped to the Edge TPU except TFLite_Detection_PostProcess (since it can't be mapped to the Edge TPU). I don't think we have any pre-trained models that satisfy the scenario you are looking for. Please try the object detection retraining tutorials with different model sizes. Thanks! https://github.com/google-coral/tutorials
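To make the op-mapping point above concrete, here is a small sketch (not from this thread) that walks a compiled model's nodes and reports which ones were fused into the Edge TPU custom op ("edgetpu-custom-op") and which stay on the CPU, such as TFLite_Detection_PostProcess. It assumes the interpreter helpers in libcoral's coral/tflite_utils.h; the edgetpu_compiler log reports the same mapping at compile time.

```cpp
// Sketch: list which ops of a compiled *_edgetpu.tflite run on the Edge TPU
// (fused into the "edgetpu-custom-op" node) and which remain on the CPU.
#include <cstdio>
#include <cstring>
#include <memory>

#include "coral/tflite_utils.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/model.h"

int main(int argc, char* argv[]) {
  if (argc != 2) {
    std::fprintf(stderr, "Usage: %s <model_edgetpu.tflite>\n", argv[0]);
    return 1;
  }
  auto model = tflite::FlatBufferModel::BuildFromFile(argv[1]);
  auto tpu_context = coral::GetEdgeTpuContextOrDie();
  auto interpreter = coral::MakeEdgeTpuInterpreterOrDie(*model, tpu_context.get());

  for (size_t i = 0; i < interpreter->nodes_size(); ++i) {
    const auto* node_and_reg =
        interpreter->node_and_registration(static_cast<int>(i));
    const TfLiteRegistration& reg = node_and_reg->second;
    if (reg.custom_name != nullptr) {
      const bool on_tpu = std::strcmp(reg.custom_name, "edgetpu-custom-op") == 0;
      std::printf("node %zu: %s -> %s\n", i, reg.custom_name,
                  on_tpu ? "Edge TPU" : "CPU");
    } else {
      std::printf("node %zu: builtin op %d -> CPU\n", i, reg.builtin_code);
    }
  }
  return 0;
}
```

On EfficientDet-Lite models this typically shows one large edgetpu-custom-op plus TFLite_Detection_PostProcess left on the CPU, which is the part a weaker CPU such as the Raspberry Pi's slows down.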
Hi,
Thank you for your reply.
I will try your suggestion; meanwhile, I am trying YOLOv5 to see if there is any difference.
Tyouritsugun
Hi,
I am trying to parse the pipelined detection results using efficientdet_lite2_448_ptq.tflite. I cannot find any hint around line 211 of the example below, which looks like it just gets the results and then disposes of them without parsing them:
https://github.com/google-coral/libcoral/blob/master/coral/examples/model_pipelining.cc
The definition of coral::PipelineTensor does not help, because I do not know the underlying data structure of Buffer* buffer. I can figure out how to parse the result of a single-TPU inference by referring to the code below (after line 104):
https://github.com/google-coral/demo-manufacturing/blob/main/src/inference_wrapper.cc
Can you please give a hint or some reference code showing how to parse the pipelined detection results?
Thank you in advance Tyouritsugun
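For what it's worth, here is one possible sketch of parsing the pipelined detection outputs. It assumes the default tensor allocator (so Buffer::ptr() returns a host-visible pointer), the common TFLite_Detection_PostProcess output order of boxes, class ids, scores, count, and the span-based coral::GetDetectionResults() overload from coral/detection/adapter.h; the helper name ParsePipelineDetections is made up for illustration, and in real code the output tensors should be matched by name against the last segment's interpreter rather than by position.

```cpp
// Sketch: convert the PipelineTensors popped from PipelinedModelRunner into
// coral::Object detections. TFLite_Detection_PostProcess emits float32
// outputs, so no dequantization is needed even though the model is quantized.
#include <cstddef>
#include <vector>

#include "absl/types/span.h"
#include "coral/detection/adapter.h"
#include "coral/pipeline/common.h"

std::vector<coral::Object> ParsePipelineDetections(
    const std::vector<coral::PipelineTensor>& outputs,
    float score_threshold, size_t top_k) {
  // Assumes the default allocator, where buffer->ptr() is host accessible.
  auto as_floats = [](const coral::PipelineTensor& t) {
    return absl::MakeConstSpan(reinterpret_cast<const float*>(t.buffer->ptr()),
                               t.bytes / sizeof(float));
  };
  // Assumed output order: [boxes, class ids, scores, count]; verify against
  // the last segment's interpreter outputs (match by tensor name).
  const auto bboxes = as_floats(outputs[0]);  // [N, 4]: ymin, xmin, ymax, xmax
  const auto ids = as_floats(outputs[1]);     // [N]
  const auto scores = as_floats(outputs[2]);  // [N]
  const size_t count = static_cast<size_t>(as_floats(outputs[3])[0]);

  return coral::GetDetectionResults(bboxes, ids, scores, count,
                                    score_threshold, top_k);
}
```

The returned coral::Object values carry an id, a score, and a bbox normalized to [0, 1], the same structure the single-TPU detection adapter produces; remember to release the popped tensors with coral::FreePipelineTensors() afterwards, as the model_pipelining.cc example does.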