WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

Does it run on coral TPU #52

Open fabian57fabian opened 2 years ago

fabian57fabian commented 2 years ago

Which model can be run on the Google Coral TPU Accelerator? Is there a code snippet to transform it into model_quantized.tflite?

hardikdava commented 1 year ago

Is there any update on this issue? It would be interesting to see the performance on the Edge TPU.

xrbeattx commented 1 year ago

Bump

xrbeattx commented 1 year ago

From what I understand, there are two issues here: 1) the Edge TPU only works with .tflite files/models, and 2) the libraries needed to run this require Python 3.9 and I cannot for the life of me get it to update to that version. I found a Stack Overflow question about this but no one has answered, which leads me to believe that you can't for some reason. I honestly don't know why some versions of Ubuntu/Debian/Mendel don't support certain versions of Python, or vice versa. I really want this to work, but TBH I think I am wasting my time.

Baael commented 1 year ago

Just convert it to tflite. Here is an example of how to do that: https://medium.com/geekculture/converting-yolo-v7-to-tensorflow-lite-for-mobile-deployment-ebc1103e8d1e

keesschollaart81 commented 1 year ago

@Baael Are you able to convert it to edgetpu?

hardikdava commented 1 year ago

@keesschollaart81 @Baael I have tried the mentioned workflow. I was able to convert the model into tflite, but not into the int8 quantized model that the Coral Edge TPU needs. I would still prefer to follow the workflow used in YOLOv5, i.e. rebuild the full network using TensorFlow layers, which seems to be the correct way since many of the operations are not supported by the Edge TPU.
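
As a rough illustration of that approach, here is a hedged sketch of what the basic models.common.Conv block (Conv2d + BatchNorm + SiLU) could look like rebuilt with Keras layers; the class name and defaults are placeholders, not code from this repo:

import tensorflow as tf

class TFConvSketch(tf.keras.layers.Layer):
    # Sketch of a Keras equivalent of models.common.Conv (Conv2D + BN + SiLU).
    def __init__(self, c_out, k=1, s=1):
        super().__init__()
        # Note: Keras 'same' padding only matches PyTorch's explicit padding
        # for stride 1; stride-2 convs need explicit padding to match exactly.
        self.conv = tf.keras.layers.Conv2D(c_out, k, s, padding="same", use_bias=False)
        self.bn = tf.keras.layers.BatchNormalization()

    def call(self, x):
        return tf.nn.silu(self.bn(self.conv(x)))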

keesschollaart81 commented 1 year ago

Clear. What was the report/output of the edgetpu_compiler? Like how many of the operations are running on the CPU vs TPU?

hardikdava commented 1 year ago

We need to perform full integer quantization with the tflite converter on the TensorFlow saved_model and save an int8 tflite model. Then we have to compile the model with edgetpu_compiler, which generates the compiled network for the Edge TPU. But I get this error when performing quantization:

RuntimeError: tensorflow/lite/kernels/conv.cc:357 input_channel % filter_input_channel != 0 (1 != 0)Node number 2 (CONV_2D) failed to prepare.

I think this error is due to a channel mismatch. If you know how to solve this error, please let me know. @keesschollaart81
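
For reference, a minimal sketch of the full-integer quantization step described above (the saved_model path, 416x416 input size, and calib_images/ calibration folder are assumptions for illustration; the resulting .tflite would then be passed to edgetpu_compiler):

import glob

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Feed a few hundred preprocessed images in the layout the model expects.
    for path in glob.glob("calib_images/*.jpg")[:300]:
        img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, (416, 416)) / 255.0
        yield [np.expand_dims(img.numpy().astype(np.float32), axis=0)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops so edgetpu_compiler can map them to the TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())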

drachu commented 1 year ago

Any updates? I'm searching the internet for a solution. I was able to convert the model to tflite too, but int8 quantization fails every time.

sph1n3x commented 1 year ago

I have been running some tests for the past few weeks and played around with different input sizes (640, 512, 448, 416) due to limitations of the Coral Edge TPU. The standard YOLOv7 model is rather large and runs into compilation timeouts, at least for object detection @ 640. I have mostly been testing it for instance segmentation, though. You might be lucky if you choose smaller input sizes for training (YOLOv7); I have no time to look further into it atm :(

I can confirm that export to edgetpu works with YOLOv7Tiny (640, 512, 448, 416) and YOLOv5s / YOLOv5m (640, 512, 448, 416) without running into any compilation timeouts and subgraph issues.

My only problem here is that reparameterization for segmentation is currently not available, so I have no choice but to use the YOLOv5 head (instead of YOLOR), which results in a loss of 1-2% (in terms of precision, recall, mAP).

Maybe @WongKinYiu @AlexeyAB have an idea for the reparameterization of the segmentation model in the u7 branch?

drachu commented 1 year ago

@sph1n3x What was your export process like for YOLOv7Tiny? Did you use the https://medium.com/geekculture/converting-yolo-v7-to-tensorflow-lite-for-mobile-deployment-ebc1103e8d1e workflow? If yes, which parameters did you use in tf.lite.TFLiteConverter for quantization? Good work with your tests! :)

sph1n3x commented 1 year ago

@drachu I actually used a modified export.py script from the u7 branch with changes from the main and u5 branches. It wasn't straightforward, as some functions and classes were missing on the TensorFlow side and had to be implemented. I can, however, provide the changes :+1:

drachu commented 1 year ago

@sph1n3x It would be great!

sph1n3x commented 1 year ago

@drachu Sorry for the late reply! I have been running some extended tests for benchmarking purposes, but the results are not exciting at all. Although the network structure of YOLOv7 Tiny was designed with edge devices in mind, it is not optimized for export with the edge compiler. Many operations still run on the CPU (~30%). I have also tried many delegation options without any luck :(

YOLOv5 models do not have these issues. Their recent update (v6.2) incorporates optimizations for edge devices, which is why almost all operations run on the Edge TPU (<5% on the CPU) at quite a feasible speed.

If you still want to use YOLOv7 models, I recommend looking at edge devices such as the Jetson. You will probably get very good results and speed with TensorRT. You can also consider running the tiny model on a CPU (without the Edge TPU): I can easily achieve ~10 FPS on a Ryzen 3 4300U without any optimizations using the tiny model @ 640, or up to 30 FPS @ 416.

drachu commented 1 year ago

Thanks for the answer, tips and your research work @sph1n3x!

triptec commented 1 year ago

@sph1n3x I've been trying to just get a tflite with uint8; currently I'm not so worried about performance, I just want it to run with the same infra as my YOLOv5 models. Could you share the code you used for the conversion and detection with the result? I have been able to export to ONNX and then convert it into a tflite with uint8 using https://github.com/MPolaris/onnx2tflite, but I can't make sense of the output from the resulting model. Same thing when I used https://github.com/PINTO0309/onnx2tf: I can't make sense of the output.
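
In case it helps with making sense of the raw tensors, here is a small hedged sketch of dumping and dequantizing the output of a uint8 .tflite model with tf.lite.Interpreter (the file name and the single-output assumption are placeholders):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_uint8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("input:", inp["shape"], inp["dtype"], "output:", out["shape"], out["dtype"])

# Run one dummy frame and dequantize: real_value = scale * (q - zero_point)
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
q = interpreter.get_tensor(out["index"]).astype(np.float32)
scale, zero_point = out["quantization"]
pred = (q - zero_point) * scale
# For a YOLO-style head the shape is typically [1, num_boxes, 4 + 1 + num_classes],
# e.g. (1, 10647, 8) for 3 classes at 416x416, as in the export log later in this thread.
print(pred.shape, pred.min(), pred.max())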

35grain commented 1 year ago

For anyone still looking for a solution - I was able to convert the YOLOv7 tiny model to an Edge TPU compatible tflite format with a resolution of 640 via the openvino2tensorflow converter. Almost all operations were mapped to the Edge TPU (see output below), while with onnx2tf it was the other way around. Unfortunately, that's where the fun ends, as I am now faced with an error when running the model on my Raspberry Pi 4: F driver/usb/usb_driver.cc:857] transfer on tag 1 failed. Abort. Deadline exceeded: USB transfer error 2 [LibUsbDataOutCallback]. I tend to believe that this isn't caused by insufficient power delivery from the Pi either, as it should be able to output 1200 mA total across all ports and I have nothing else plugged in.

Edge TPU Compiler version 16.0.384591198
Searching for valid delegate with step 1
Try to compile segment with 330 ops
Started a compilation timeout timer of 3600 seconds.

Model compiled successfully in 40666 ms.

Input model: saved_model/model_full_integer_quant.tflite
Input size: 6.01MiB
Output model: saved_model/model_full_integer_quant_edgetpu.tflite
Output size: 6.42MiB
On-chip memory used for caching model parameters: 5.90MiB
On-chip memory remaining for caching model parameters: 659.50KiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 2
Total number of operations: 330
Operation log: saved_model/model_full_integer_quant_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 321
Number of operations that will run on CPU: 9

Operator                       Count      Status

ADD                            55         Mapped to Edge TPU
MAX_POOL_2D                    6          Mapped to Edge TPU
CONV_2D                        58         Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR        2          Mapped to Edge TPU
CONCATENATION                  14         Mapped to Edge TPU
RELU                           55         Mapped to Edge TPU
MUL                            55         Mapped to Edge TPU
MIRROR_PAD                     3          Operation not supported
RESHAPE                        3          Tensor has unsupported rank (up to 3 innermost dimensions mapped)
TRANSPOSE                      3          Tensor has unsupported rank (up to 3 innermost dimensions mapped)
PAD                            21         Mapped to Edge TPU
MINIMUM                        55         Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded! 

Update: Using a resolution of 512 resulted in a different error: KeyError: 'output_0'. But at least it looks like it is trying to run it now? Second update: I was able to reproduce this on a Windows machine too; perhaps the model output is simply too big.
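
For completeness, running the compiled model on the Coral usually goes through the libedgetpu delegate; a minimal hedged sketch (model path taken from the compiler output above, tflite_runtime assumed as the runtime on the Pi):

import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="saved_model/model_full_integer_quant_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()  # the USB transfer error above is raised around this call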

hardikdava commented 1 year ago

Hello @35grain, what was your onnx2tf command? And which model were you able to convert?

35grain commented 1 year ago

Hello @35grain, what was your onnx2tf command? And which model were you able to convert?

@hardikdava I used the following workflow with openvino2tensorflow: YOLOv7-tiny custom trained model > ONNX > OpenVINO > tflite int8. Here's a code snippet you could modify for your use (I ran it in Colab):

pip install -r requirements.txt # for using the export command from YOLOv7 repo
pip install openvino-dev
pip install openvino2tensorflow
pip install onnx onnxsim # onnxsim for simplifying the model using the export command (optional)

python export.py --weights '/path/to/model.pt' --simplify
mo --input_model '/path/to/model.onnx'
openvino2tensorflow --model_path '/path/to/model.xml' --output_edgetpu

Let me know if you have any luck with getting it running. Currently only YOLOv5n and v5n6 are working for me (though with lower accuracy) while YOLOv7 is in the described state and YOLOv8 has its own issues with exporting. I really just need something usable for my project.

hardikdava commented 1 year ago

Hello, I was able to export yolov7-tiny.pt to the Edge TPU, but there are limitations on the number of classes and image size. Still, yolov7-tiny exported to the Edge TPU and all the ops compiled successfully. I committed my changes in PR #1672, made against the u5 branch. Please find the complete export log below.

TensorFlow SavedModel: starting export with tensorflow 2.12.0...

                 from  n    params  module                                  arguments                     
2023-04-24 18:23:29.086379: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
  0                -1  1       928  models.common.Conv                      [3, 32, 3, 2]                 
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1      2112  models.common.Conv                      [64, 32, 1, 1]                
  3                -2  1      2112  models.common.Conv                      [64, 32, 1, 1]                
  4                -1  1      9280  models.common.Conv                      [32, 32, 3, 1]                
  5                -1  1      9280  models.common.Conv                      [32, 32, 3, 1]                
  6  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
  7                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
  8                -1  1         0  models.common.MP                        []                            
  9                -1  1      4224  models.common.Conv                      [64, 64, 1, 1]                
 10                -2  1      4224  models.common.Conv                      [64, 64, 1, 1]                
 11                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 12                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 13  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  models.common.MP                        []                            
 16                -1  1     16640  models.common.Conv                      [128, 128, 1, 1]              
 17                -2  1     16640  models.common.Conv                      [128, 128, 1, 1]              
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 1]              
 19                -1  1    147712  models.common.Conv                      [128, 128, 3, 1]              
 20  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 21                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 22                -1  1         0  models.common.MP                        []                            
 23                -1  1     66048  models.common.Conv                      [256, 256, 1, 1]              
 24                -2  1     66048  models.common.Conv                      [256, 256, 1, 1]              
 25                -1  1    590336  models.common.Conv                      [256, 256, 3, 1]              
 26                -1  1    590336  models.common.Conv                      [256, 256, 3, 1]              
 27  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 28                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
 29                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 30                -2  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 31                -1  1         0  models.common.SP                        [5]                           
 32                -2  1         0  models.common.SP                        [9]                           
 33                -3  1         0  models.common.SP                        [13]                          
 34  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 35                -1  1    262656  models.common.Conv                      [1024, 256, 1, 1]             
 36          [-1, -7]  1         0  models.common.Concat                    [1]                           
 37                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 38                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 39                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 40                21  1     65792  models.common.Conv                      [512, 128, 1, 1]              
 41          [-1, -2]  1         0  models.common.Concat                    [1]                           
 42                -1  1     16512  models.common.Conv                      [256, 64, 1, 1]               
 43                -2  1     16512  models.common.Conv                      [256, 64, 1, 1]               
 44                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 45                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 46  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 47                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 48                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 49                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 50                14  1     16512  models.common.Conv                      [256, 64, 1, 1]               
 51          [-1, -2]  1         0  models.common.Concat                    [1]                           
 52                -1  1      4160  models.common.Conv                      [128, 32, 1, 1]               
 53                -2  1      4160  models.common.Conv                      [128, 32, 1, 1]               
 54                -1  1      9280  models.common.Conv                      [32, 32, 3, 1]                
 55                -1  1      9280  models.common.Conv                      [32, 32, 3, 1]                
 56  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 57                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]               
 58                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
 59          [-1, 47]  1         0  models.common.Concat                    [1]                           
 60                -1  1     16512  models.common.Conv                      [256, 64, 1, 1]               
 61                -2  1     16512  models.common.Conv                      [256, 64, 1, 1]               
 62                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 63                -1  1     36992  models.common.Conv                      [64, 64, 3, 1]                
 64  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 65                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 66                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
 67          [-1, 37]  1         0  models.common.Concat                    [1]                           
 68                -1  1     65792  models.common.Conv                      [512, 128, 1, 1]              
 69                -2  1     65792  models.common.Conv                      [512, 128, 1, 1]              
 70                -1  1    147712  models.common.Conv                      [128, 128, 3, 1]              
 71                -1  1    147712  models.common.Conv                      [128, 128, 3, 1]              
 72  [-1, -2, -3, -4]  1         0  models.common.Concat                    [1]                           
 73                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 74                57  1    147712  models.common.Conv                      [128, 128, 3, 1]              
 75                65  1    590336  models.common.Conv                      [256, 256, 3, 1]              
 76                73  1   2360320  models.common.Conv                      [512, 512, 3, 1]              
 77      [74, 75, 76]  1     21576  models.yolo.Detect                      [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512], [416, 416]]
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(1, 416, 416, 3)]   0           []                               

 tf_conv (TFConv)               (1, 208, 208, 32)    896         ['input_1[0][0]']                

 tf_conv_1 (TFConv)             (1, 104, 104, 64)    18496       ['tf_conv[0][0]']                

 tf_conv_3 (TFConv)             (1, 104, 104, 32)    2080        ['tf_conv_1[0][0]']              

 tf_conv_4 (TFConv)             (1, 104, 104, 32)    9248        ['tf_conv_3[0][0]']              

 tf_conv_5 (TFConv)             (1, 104, 104, 32)    9248        ['tf_conv_4[0][0]']              

 tf_conv_2 (TFConv)             (1, 104, 104, 32)    2080        ['tf_conv_1[0][0]']              

 tf_concat (TFConcat)           (1, 104, 104, 128)   0           ['tf_conv_5[0][0]',              
                                                                  'tf_conv_4[0][0]',              
                                                                  'tf_conv_3[0][0]',              
                                                                  'tf_conv_2[0][0]']              

 tf_conv_6 (TFConv)             (1, 104, 104, 64)    8256        ['tf_concat[0][0]']              

 tfmp (TFMP)                    (1, 52, 52, 64)      0           ['tf_conv_6[0][0]']              

 tf_conv_8 (TFConv)             (1, 52, 52, 64)      4160        ['tfmp[0][0]']                   

 tf_conv_9 (TFConv)             (1, 52, 52, 64)      36928       ['tf_conv_8[0][0]']              

 tf_conv_10 (TFConv)            (1, 52, 52, 64)      36928       ['tf_conv_9[0][0]']              

 tf_conv_7 (TFConv)             (1, 52, 52, 64)      4160        ['tfmp[0][0]']                   

 tf_concat_1 (TFConcat)         (1, 52, 52, 256)     0           ['tf_conv_10[0][0]',             
                                                                  'tf_conv_9[0][0]',              
                                                                  'tf_conv_8[0][0]',              
                                                                  'tf_conv_7[0][0]']              

 tf_conv_11 (TFConv)            (1, 52, 52, 128)     32896       ['tf_concat_1[0][0]']            

 tfmp_1 (TFMP)                  (1, 26, 26, 128)     0           ['tf_conv_11[0][0]']             

 tf_conv_13 (TFConv)            (1, 26, 26, 128)     16512       ['tfmp_1[0][0]']                 

 tf_conv_14 (TFConv)            (1, 26, 26, 128)     147584      ['tf_conv_13[0][0]']             

 tf_conv_15 (TFConv)            (1, 26, 26, 128)     147584      ['tf_conv_14[0][0]']             

 tf_conv_12 (TFConv)            (1, 26, 26, 128)     16512       ['tfmp_1[0][0]']                 

 tf_concat_2 (TFConcat)         (1, 26, 26, 512)     0           ['tf_conv_15[0][0]',             
                                                                  'tf_conv_14[0][0]',             
                                                                  'tf_conv_13[0][0]',             
                                                                  'tf_conv_12[0][0]']             

 tf_conv_16 (TFConv)            (1, 26, 26, 256)     131328      ['tf_concat_2[0][0]']            

 tfmp_2 (TFMP)                  (1, 13, 13, 256)     0           ['tf_conv_16[0][0]']             

 tf_conv_18 (TFConv)            (1, 13, 13, 256)     65792       ['tfmp_2[0][0]']                 

 tf_conv_19 (TFConv)            (1, 13, 13, 256)     590080      ['tf_conv_18[0][0]']             

 tf_conv_20 (TFConv)            (1, 13, 13, 256)     590080      ['tf_conv_19[0][0]']             

 tf_conv_17 (TFConv)            (1, 13, 13, 256)     65792       ['tfmp_2[0][0]']                 

 tf_concat_3 (TFConcat)         (1, 13, 13, 1024)    0           ['tf_conv_20[0][0]',             
                                                                  'tf_conv_19[0][0]',             
                                                                  'tf_conv_18[0][0]',             
                                                                  'tf_conv_17[0][0]']             

 tf_conv_21 (TFConv)            (1, 13, 13, 512)     524800      ['tf_concat_3[0][0]']            

 tf_conv_23 (TFConv)            (1, 13, 13, 256)     131328      ['tf_conv_21[0][0]']             

 tfsp_2 (TFSP)                  (1, 13, 13, 256)     0           ['tf_conv_23[0][0]']             

 tfsp_1 (TFSP)                  (1, 13, 13, 256)     0           ['tf_conv_23[0][0]']             

 tfsp (TFSP)                    (1, 13, 13, 256)     0           ['tf_conv_23[0][0]']             

 tf_concat_4 (TFConcat)         (1, 13, 13, 1024)    0           ['tfsp_2[0][0]',                 
                                                                  'tfsp_1[0][0]',                 
                                                                  'tfsp[0][0]',                   
                                                                  'tf_conv_23[0][0]']             

 tf_conv_24 (TFConv)            (1, 13, 13, 256)     262400      ['tf_concat_4[0][0]']            

 tf_conv_22 (TFConv)            (1, 13, 13, 256)     131328      ['tf_conv_21[0][0]']             

 tf_concat_5 (TFConcat)         (1, 13, 13, 512)     0           ['tf_conv_24[0][0]',             
                                                                  'tf_conv_22[0][0]']             

 tf_conv_25 (TFConv)            (1, 13, 13, 256)     131328      ['tf_concat_5[0][0]']            

 tf_conv_26 (TFConv)            (1, 13, 13, 128)     32896       ['tf_conv_25[0][0]']             

 tf_conv_27 (TFConv)            (1, 26, 26, 128)     32896       ['tf_conv_16[0][0]']             

 tf_upsample (TFUpsample)       (1, 26, 26, 128)     0           ['tf_conv_26[0][0]']             

 tf_concat_6 (TFConcat)         (1, 26, 26, 256)     0           ['tf_conv_27[0][0]',             
                                                                  'tf_upsample[0][0]']            

 tf_conv_29 (TFConv)            (1, 26, 26, 64)      16448       ['tf_concat_6[0][0]']            

 tf_conv_30 (TFConv)            (1, 26, 26, 64)      36928       ['tf_conv_29[0][0]']             

 tf_conv_31 (TFConv)            (1, 26, 26, 64)      36928       ['tf_conv_30[0][0]']             

 tf_conv_28 (TFConv)            (1, 26, 26, 64)      16448       ['tf_concat_6[0][0]']            

 tf_concat_7 (TFConcat)         (1, 26, 26, 256)     0           ['tf_conv_31[0][0]',             
                                                                  'tf_conv_30[0][0]',             
                                                                  'tf_conv_29[0][0]',             
                                                                  'tf_conv_28[0][0]']             

 tf_conv_32 (TFConv)            (1, 26, 26, 128)     32896       ['tf_concat_7[0][0]']            

 tf_conv_33 (TFConv)            (1, 26, 26, 64)      8256        ['tf_conv_32[0][0]']             

 tf_conv_34 (TFConv)            (1, 52, 52, 64)      8256        ['tf_conv_11[0][0]']             

 tf_upsample_1 (TFUpsample)     (1, 52, 52, 64)      0           ['tf_conv_33[0][0]']             

 tf_concat_8 (TFConcat)         (1, 52, 52, 128)     0           ['tf_conv_34[0][0]',             
                                                                  'tf_upsample_1[0][0]']          

 tf_conv_36 (TFConv)            (1, 52, 52, 32)      4128        ['tf_concat_8[0][0]']            

 tf_conv_37 (TFConv)            (1, 52, 52, 32)      9248        ['tf_conv_36[0][0]']             

 tf_conv_38 (TFConv)            (1, 52, 52, 32)      9248        ['tf_conv_37[0][0]']             

 tf_conv_35 (TFConv)            (1, 52, 52, 32)      4128        ['tf_concat_8[0][0]']            

 tf_concat_9 (TFConcat)         (1, 52, 52, 128)     0           ['tf_conv_38[0][0]',             
                                                                  'tf_conv_37[0][0]',             
                                                                  'tf_conv_36[0][0]',             
                                                                  'tf_conv_35[0][0]']             

 tf_conv_39 (TFConv)            (1, 52, 52, 64)      8256        ['tf_concat_9[0][0]']            

 tf_conv_40 (TFConv)            (1, 26, 26, 128)     73856       ['tf_conv_39[0][0]']             

 tf_concat_10 (TFConcat)        (1, 26, 26, 256)     0           ['tf_conv_40[0][0]',             
                                                                  'tf_conv_32[0][0]']             

 tf_conv_42 (TFConv)            (1, 26, 26, 64)      16448       ['tf_concat_10[0][0]']           

 tf_conv_43 (TFConv)            (1, 26, 26, 64)      36928       ['tf_conv_42[0][0]']             

 tf_conv_44 (TFConv)            (1, 26, 26, 64)      36928       ['tf_conv_43[0][0]']             

 tf_conv_41 (TFConv)            (1, 26, 26, 64)      16448       ['tf_concat_10[0][0]']           

 tf_concat_11 (TFConcat)        (1, 26, 26, 256)     0           ['tf_conv_44[0][0]',             
                                                                  'tf_conv_43[0][0]',             
                                                                  'tf_conv_42[0][0]',             
                                                                  'tf_conv_41[0][0]']             

 tf_conv_45 (TFConv)            (1, 26, 26, 128)     32896       ['tf_concat_11[0][0]']           

 tf_conv_46 (TFConv)            (1, 13, 13, 256)     295168      ['tf_conv_45[0][0]']             

 tf_concat_12 (TFConcat)        (1, 13, 13, 512)     0           ['tf_conv_46[0][0]',             
                                                                  'tf_conv_25[0][0]']             

 tf_conv_48 (TFConv)            (1, 13, 13, 128)     65664       ['tf_concat_12[0][0]']           

 tf_conv_49 (TFConv)            (1, 13, 13, 128)     147584      ['tf_conv_48[0][0]']             

 tf_conv_50 (TFConv)            (1, 13, 13, 128)     147584      ['tf_conv_49[0][0]']             

 tf_conv_47 (TFConv)            (1, 13, 13, 128)     65664       ['tf_concat_12[0][0]']           

 tf_concat_13 (TFConcat)        (1, 13, 13, 512)     0           ['tf_conv_50[0][0]',             
                                                                  'tf_conv_49[0][0]',             
                                                                  'tf_conv_48[0][0]',             
                                                                  'tf_conv_47[0][0]']             

 tf_conv_51 (TFConv)            (1, 13, 13, 256)     131328      ['tf_concat_13[0][0]']           

 tf_conv_52 (TFConv)            (1, 52, 52, 128)     73856       ['tf_conv_39[0][0]']             

 tf_conv_53 (TFConv)            (1, 26, 26, 256)     295168      ['tf_conv_45[0][0]']             

 tf_conv_54 (TFConv)            (1, 13, 13, 512)     1180160     ['tf_conv_51[0][0]']             

 tf_detect (TFDetect)           ((1, 10647, 8),      21576       ['tf_conv_52[0][0]',             
                                )                                 'tf_conv_53[0][0]',             
                                                                  'tf_conv_54[0][0]']             

==================================================================================================
Total params: 6,012,040
Trainable params: 0
Non-trainable params: 6,012,040
__________________________________________________________________________________________________
TensorFlow SavedModel: export success ✅ 8.6s, saved as runs/train/exp8/weights/best_saved_model (23.1 MB)

TensorFlow Lite: starting export with tensorflow 2.12.0...
WARNING:absl:Found untraced functions such as conv2d_3_layer_call_fn, conv2d_3_layer_call_and_return_conditional_losses, _jit_compiled_convolution_op, conv2d_4_layer_call_fn, conv2d_4_layer_call_and_return_conditional_losses while saving (showing 5 of 172). These functions will not be directly callable after loading.
fully_quantize: 0, inference_type: 6, input_inference_type: UINT8, output_inference_type: UINT8
TensorFlow Lite: export success ✅ 105.0s, saved as runs/train/exp8/weights/best-int8.tflite (6.0 MB)

Edge TPU: starting export with Edge TPU compiler 16.0.384591198...
Edge TPU Compiler version 16.0.384591198
Searching for valid delegate with step 10
Try to compile segment with 246 ops
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 3133 ms.

Input model: runs/train/exp8/weights/best-int8.tflite
Input size: 6.00MiB
Output model: runs/train/exp8/weights/best-int8_edgetpu.tflite
Output size: 6.47MiB
On-chip memory used for caching model parameters: 5.90MiB
On-chip memory remaining for caching model parameters: 1.16MiB
Off-chip memory used for streaming uncached model parameters: 41.75KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 246
Operation log: runs/train/exp8/weights/best-int8_edgetpu.log

Operator                       Count      Status

ADD                            3          Mapped to Edge TPU
RESHAPE                        6          Mapped to Edge TPU
MAX_POOL_2D                    4          Mapped to Edge TPU
CONCATENATION                  18         Mapped to Edge TPU
PAD                            4          Mapped to Edge TPU
LOGISTIC                       64         Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR        2          Mapped to Edge TPU
QUANTIZE                       5          Mapped to Edge TPU
MUL                            73         Mapped to Edge TPU
STRIDED_SLICE                  9          Mapped to Edge TPU
CONV_2D                        58         Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded! 
Edge TPU: export success ✅ 3.4s, saved as runs/train/exp8/weights/best-int8_edgetpu.tflite (6.5 MB)

35grain commented 1 year ago

Hello, I was able to export yolov7-tiny.pt to the Edge TPU, but there are limitations on the number of classes and image size. Still, yolov7-tiny exported to the Edge TPU and all the ops compiled successfully. I committed my changes in PR #1672, made against the u5 branch.

Yeah, it exported for me too but did you try running it as well?

hardikdava commented 1 year ago

Hello @35grain, the int8 tflite is detecting fine. It also has the same issue of an accuracy drop after quantization.

35grain commented 1 year ago

The pull request appears to be for a really old branch...