Open fabian57fabian opened 2 years ago
Is there any update on this issue? It will be interesting to see the performance on edgetpu.
Bump
From what I understand, there are two issues here: 1) the Edge TPU only works with .tflite models, and 2) the libraries needed to run this require Python 3.9, and I cannot for the life of me get the board to update to it. I found a Stack Overflow question about this, but no one has answered it, which leads me to believe that it can't be done for some reason. I honestly don't know why some versions of Ubuntu/Debian/Mendel don't support certain versions of Python, or vice versa. I really want this to work, but TBH I think I am wasting my time.
Just convert it to tflite. Here is an example of how to do that: https://medium.com/geekculture/converting-yolo-v7-to-tensorflow-lite-for-mobile-deployment-ebc1103e8d1e
@Baael Are you able to convert it to edgetpu?
@keesschollaart81 @Baael I have tried the mentioned workflow. I was able to convert the model to tflite, but not to the int8-quantized model that the Coral Edge TPU needs. I would still prefer to follow the YOLOv5 workflow, i.e. rebuild the full network from TensorFlow layers, which seems to be the correct way since many of the operations are not supported by the Edge TPU.
Clear. What was the report/output of the edgetpu_compiler? Like how many of the operations are running on the CPU vs TPU?
We need to perform full integer quantization with the TFLite converter on the TensorFlow saved_model and save an int8 tflite model. Then we have to compile that model with the Edge TPU compiler, which generates the compiled network for the Edge TPU. But I got this error while performing quantization:
RuntimeError: tensorflow/lite/kernels/conv.cc:357 input_channel % filter_input_channel != 0 (1 != 0)Node number 2 (CONV_2D) failed to prepare.
I think this error is due to a channel mismatch. If you know how to solve it, please let me know. @keesschollaart81
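One way to read the error above: the failing check in conv.cc compares the last dimension of the input tensor with the last dimension of the filter, because TFLite expects NHWC inputs. Below is a minimal sketch of that arithmetic. It assumes (this is a guess, not something confirmed in this thread) that a 640x640 NCHW tensor reached a layer that expected NHWC. The helper name `channel_remainder` and the filter shape are illustrative only.

```python
def channel_remainder(input_shape, filter_shape):
    """Return input_channel % filter_input_channel, reading shapes TFLite-style.

    The channel count is taken from the LAST dimension of both shapes
    (NHWC for the input, [out, kh, kw, in] for the filter).
    """
    input_channel = input_shape[-1]
    filter_input_channel = filter_shape[-1]
    return input_channel % filter_input_channel


# Hypothetical first convolution: 32 filters, 3x3 kernel, RGB input.
first_conv_filter = (32, 3, 3, 3)

nhwc = channel_remainder((1, 640, 640, 3), first_conv_filter)  # channels last
nchw = channel_remainder((1, 3, 640, 640), first_conv_filter)  # channels first
print(nhwc, nchw)
```

The channels-first case leaves a remainder of 1, which matches the `(1 != 0)` in the message, so checking the layout of the calibration images and of the exported graph may be a reasonable first step.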
Any updates? I'm searching the internet for a solution. I was able to convert the model to tflite too, but int8 quantization fails every time.
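For reference, the full integer quantization step discussed above is commonly set up with `tf.lite.TFLiteConverter` roughly as follows. This is a sketch, not the script anyone in this thread used: the function names, the paths, the 640 input size and the random calibration data are all placeholders, and real calibration should use preprocessed images from the training set.

```python
def calibration_shape(img_size, channels=3):
    """NHWC shape of one calibration batch, the layout TFLite expects."""
    return (1, img_size, img_size, channels)


def quantize_saved_model(saved_model_dir, output_path, img_size=640, num_samples=100):
    """Convert a SavedModel to a full-integer TFLite file with uint8 input/output."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Placeholder data: replace with real, preprocessed images.
        for _ in range(num_samples):
            batch = np.random.rand(*calibration_shape(img_size))
            yield [batch.astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Raise an error if an op has no int8 implementation instead of
    # silently leaving it in float.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    with open(output_path, "wb") as f:
        f.write(converter.convert())
```

The resulting file would then be passed to `edgetpu_compiler`. If conversion fails at this stage, the representative dataset (shape, layout, dtype) is usually the first thing worth checking.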
I have been running some tests for the past few weeks and played around with different input sizes (640, 512, 448, 416) due to limitations of the Coral EdgeTPU. The YOLOv7 standard model is rather large and runs into compilation timeouts at least for object detection @ 640. I have been testing it for instance segmentation mostly though. You might be lucky if you choose smaller input sizes for training (YOLOv7), I have no time to look further into it atm :(
I can confirm that export to edgetpu works with YOLOv7Tiny (640, 512, 448, 416) and YOLOv5s / YOLOv5m (640, 512, 448, 416) without running into any compilation timeouts and subgraph issues.
My only problem here is that reparameterization for segmentation is currently not available, so I have no choice but use the YOLOv5 head (instead of YOLOR) which results in a loss of 1-2% (in terms of precision, recall, mAP).
Maybe @WongKinYiu @AlexeyAB have an idea for the reparameterization of the segmentation model in the u7 branch?
@sph1n3x What was your export process like for YOLOv7Tiny? Did you use the https://medium.com/geekculture/converting-yolo-v7-to-tensorflow-lite-for-mobile-deployment-ebc1103e8d1e workflow? If yes, which parameters did you use in tf.lite.TFLiteConverter for quantization? Good work with your tests! :)
@drachu I actually used a modified export.py script from the u7 branch with changes from the main and u5 branch. It wasn't straightforward as some functions and classes were missing in TensorFlow which had to be implemented. I can, however, provide the changes :+1:
@sph1n3x It would be great!
@drachu Sorry for the late reply! I have been running some extended tests for benchmarking purposes, but the results are not exciting at all. Although the network structure of YOLOv7 Tiny was designed with edge devices in mind, it is not optimized for exportation with the edge compiler. Many operations are still run on the CPU (~ 30%). I have also tried many delegation options without any luck :(
YOLOv5 models do not have these issues. Their recent update (v6.2) incorporates optimizations for edge devices, which is why almost all operations run on the Edge TPU (<5% on the CPU) at quite feasible speed.
If you still want to use YOLOv7 models, I recommend looking at some edge devices such as the Jetson. You will probably get very good results and speed with TensorRT. You can also consider running the tiny model on a CPU (without the Edge TPU). I can easily achieve ~10 FPS on a Ryzen 3 4300U without any optimizations using the tiny model @ 640 or up to 30 fps @ 416.
Thanks for the answer, tips and your research work @sph1n3x!
@sph1n3x I've been trying to just get a tflite with uint8. Currently I'm not so worried about performance; I just want it to run with the same infra as my yolov5 models. Could you share the code you used for the conversion and detection, along with the result? I have been able to export to onnx and then convert it into a uint8 tflite with https://github.com/MPolaris/onnx2tflite, but I can't make sense of the output from the resulting model. Same thing when I used https://github.com/PINTO0309/onnx2tf: I can't make sense of the output.
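A common reason the raw output of a uint8 model looks meaningless is that it is still quantized. TFLite stores a scale and a zero point per tensor, and the real value is scale * (q - zero_point). Below is a minimal sketch of that step. The scale and zero point are made-up values; in practice they would be read from the `quantization` entry of the interpreter's `get_output_details()`.

```python
def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Made-up parameters for illustration: scale 1/256, zero point 0.
raw = [0, 128, 255]
real = dequantize(raw, scale=1 / 256, zero_point=0)
print(real)
```

Whether the dequantized boxes are in pixels or normalized to [0, 1] depends on the export path, so that is worth checking as well.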
For anyone still looking for a solution: I was able to convert the YOLOv7 tiny model to an Edge TPU compatible tflite format at a resolution of 640 via the openvino2tensorflow converter. Almost all operations were mapped to the Edge TPU (see output below), while with onnx2tf it was the other way round. Unfortunately, that's where the fun ends, as I am now faced with an error when running the model on my Raspberry Pi 4:
F driver/usb/usb_driver.cc:857] transfer on tag 1 failed. Abort. Deadline exceeded: USB transfer error 2 [LibUsbDataOutCallback]
I tend to believe that this isn't caused by insufficient power delivery from the Pi either, as it should be able to output 1200mA total across all ports and I have nothing else plugged in.
Edge TPU Compiler version 16.0.384591198
Searching for valid delegate with step 1
Try to compile segment with 330 ops
Started a compilation timeout timer of 3600 seconds.
Model compiled successfully in 40666 ms.
Input model: saved_model/model_full_integer_quant.tflite
Input size: 6.01MiB
Output model: saved_model/model_full_integer_quant_edgetpu.tflite
Output size: 6.42MiB
On-chip memory used for caching model parameters: 5.90MiB
On-chip memory remaining for caching model parameters: 659.50KiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 2
Total number of operations: 330
Operation log: saved_model/model_full_integer_quant_edgetpu.log
Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 321
Number of operations that will run on CPU: 9
Operator Count Status
ADD 55 Mapped to Edge TPU
MAX_POOL_2D 6 Mapped to Edge TPU
CONV_2D 58 Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR 2 Mapped to Edge TPU
CONCATENATION 14 Mapped to Edge TPU
RELU 55 Mapped to Edge TPU
MUL 55 Mapped to Edge TPU
MIRROR_PAD 3 Operation not supported
RESHAPE 3 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
TRANSPOSE 3 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
PAD 21 Mapped to Edge TPU
MINIMUM 55 Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded!
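To answer the CPU vs TPU question for a log like the one above, the operator table can be tallied with a few lines of Python. This is only a sketch: the parsing assumes the whitespace-separated "Operator Count Status" layout shown here, and `tally_ops` is a hypothetical helper name.

```python
import re

# Operator table copied from the compiler output above.
LOG = """\
ADD 55 Mapped to Edge TPU
MAX_POOL_2D 6 Mapped to Edge TPU
CONV_2D 58 Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR 2 Mapped to Edge TPU
CONCATENATION 14 Mapped to Edge TPU
RELU 55 Mapped to Edge TPU
MUL 55 Mapped to Edge TPU
MIRROR_PAD 3 Operation not supported
RESHAPE 3 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
TRANSPOSE 3 Tensor has unsupported rank (up to 3 innermost dimensions mapped)
PAD 21 Mapped to Edge TPU
MINIMUM 55 Mapped to Edge TPU
"""


def tally_ops(log_text):
    """Return (ops mapped to the Edge TPU, ops left on the CPU)."""
    tpu = cpu = 0
    for line in log_text.splitlines():
        match = re.match(r"^([A-Z0-9_]+)\s+(\d+)\s+(.+)$", line.strip())
        if not match:
            continue  # skip anything that is not an operator row
        count, status = int(match.group(2)), match.group(3)
        if status == "Mapped to Edge TPU":
            tpu += count
        else:
            cpu += count
    return tpu, cpu


tpu_ops, cpu_ops = tally_ops(LOG)
cpu_share = 100 * cpu_ops / (tpu_ops + cpu_ops)
print(f"TPU: {tpu_ops}, CPU: {cpu_ops} ({cpu_share:.1f}% on CPU)")
```

For this table the tally is 321 operations on the Edge TPU and 9 on the CPU, which agrees with the compiler's own summary. The unmapped ops (MIRROR_PAD, RESHAPE, TRANSPOSE) are the ones to look at when trying to get the model fully onto the TPU.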
Update: Using a resolution of 512 resulted in a different error: KeyError: 'output_0'. But at least it looks like it is trying to run it now?
Update update: I was able to reproduce this on a Windows machine too, perhaps the model output is simply too big.
Hello @35grain, what was your onnx2tf command? And which model were you able to convert?
@hardikdava I used the following workflow with openvino2tensorflow: YOLOv7-tiny custom trained model > ONNX > OpenVINO > tflite int8. Here's a code snippet you could modify for your use (I ran it in Colab):
pip install -r requirements.txt # for using the export command from YOLOv7 repo
pip install openvino-dev
pip install openvino2tensorflow
pip install onnx onnxsim # onnxsim for simplifying the model using the export command (optional)
python export.py --weights '/path/to/model.pt' --simplify
mo --input_model '/path/to/model.onnx'
openvino2tensorflow --model_path '/path/to/model.xml' --output_edgetpu
Let me know if you have any luck with getting it running. Currently only YOLOv5n and v5n6 are working for me (though with lower accuracy) while YOLOv7 is in the described state and YOLOv8 has its own issues with exporting. I really just need something usable for my project.
Hello, I was able to export yolov7-tiny.pt to edgetpu. There are limitations on the number of classes and the image size, but yolov7-tiny exported as edgetpu and all the ops compiled successfully. I made my commit in PR #1672, which is made against branch u5. Please find the logs for the complete export of the model below.
TensorFlow SavedModel: starting export with tensorflow 2.12.0...
from n params module arguments
2023-04-24 18:23:29.086379: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
0 -1 1 928 models.common.Conv [3, 32, 3, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 2112 models.common.Conv [64, 32, 1, 1]
3 -2 1 2112 models.common.Conv [64, 32, 1, 1]
4 -1 1 9280 models.common.Conv [32, 32, 3, 1]
5 -1 1 9280 models.common.Conv [32, 32, 3, 1]
6 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
7 -1 1 8320 models.common.Conv [128, 64, 1, 1]
8 -1 1 0 models.common.MP []
9 -1 1 4224 models.common.Conv [64, 64, 1, 1]
10 -2 1 4224 models.common.Conv [64, 64, 1, 1]
11 -1 1 36992 models.common.Conv [64, 64, 3, 1]
12 -1 1 36992 models.common.Conv [64, 64, 3, 1]
13 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 models.common.MP []
16 -1 1 16640 models.common.Conv [128, 128, 1, 1]
17 -2 1 16640 models.common.Conv [128, 128, 1, 1]
18 -1 1 147712 models.common.Conv [128, 128, 3, 1]
19 -1 1 147712 models.common.Conv [128, 128, 3, 1]
20 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
21 -1 1 131584 models.common.Conv [512, 256, 1, 1]
22 -1 1 0 models.common.MP []
23 -1 1 66048 models.common.Conv [256, 256, 1, 1]
24 -2 1 66048 models.common.Conv [256, 256, 1, 1]
25 -1 1 590336 models.common.Conv [256, 256, 3, 1]
26 -1 1 590336 models.common.Conv [256, 256, 3, 1]
27 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
28 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
29 -1 1 131584 models.common.Conv [512, 256, 1, 1]
30 -2 1 131584 models.common.Conv [512, 256, 1, 1]
31 -1 1 0 models.common.SP [5]
32 -2 1 0 models.common.SP [9]
33 -3 1 0 models.common.SP [13]
34 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
35 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
36 [-1, -7] 1 0 models.common.Concat [1]
37 -1 1 131584 models.common.Conv [512, 256, 1, 1]
38 -1 1 33024 models.common.Conv [256, 128, 1, 1]
39 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
40 21 1 65792 models.common.Conv [512, 128, 1, 1]
41 [-1, -2] 1 0 models.common.Concat [1]
42 -1 1 16512 models.common.Conv [256, 64, 1, 1]
43 -2 1 16512 models.common.Conv [256, 64, 1, 1]
44 -1 1 36992 models.common.Conv [64, 64, 3, 1]
45 -1 1 36992 models.common.Conv [64, 64, 3, 1]
46 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
47 -1 1 33024 models.common.Conv [256, 128, 1, 1]
48 -1 1 8320 models.common.Conv [128, 64, 1, 1]
49 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
50 14 1 16512 models.common.Conv [256, 64, 1, 1]
51 [-1, -2] 1 0 models.common.Concat [1]
52 -1 1 4160 models.common.Conv [128, 32, 1, 1]
53 -2 1 4160 models.common.Conv [128, 32, 1, 1]
54 -1 1 9280 models.common.Conv [32, 32, 3, 1]
55 -1 1 9280 models.common.Conv [32, 32, 3, 1]
56 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
57 -1 1 8320 models.common.Conv [128, 64, 1, 1]
58 -1 1 73984 models.common.Conv [64, 128, 3, 2]
59 [-1, 47] 1 0 models.common.Concat [1]
60 -1 1 16512 models.common.Conv [256, 64, 1, 1]
61 -2 1 16512 models.common.Conv [256, 64, 1, 1]
62 -1 1 36992 models.common.Conv [64, 64, 3, 1]
63 -1 1 36992 models.common.Conv [64, 64, 3, 1]
64 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
65 -1 1 33024 models.common.Conv [256, 128, 1, 1]
66 -1 1 295424 models.common.Conv [128, 256, 3, 2]
67 [-1, 37] 1 0 models.common.Concat [1]
68 -1 1 65792 models.common.Conv [512, 128, 1, 1]
69 -2 1 65792 models.common.Conv [512, 128, 1, 1]
70 -1 1 147712 models.common.Conv [128, 128, 3, 1]
71 -1 1 147712 models.common.Conv [128, 128, 3, 1]
72 [-1, -2, -3, -4] 1 0 models.common.Concat [1]
73 -1 1 131584 models.common.Conv [512, 256, 1, 1]
74 57 1 147712 models.common.Conv [128, 128, 3, 1]
75 65 1 590336 models.common.Conv [256, 256, 3, 1]
76 73 1 2360320 models.common.Conv [512, 512, 3, 1]
77 [74, 75, 76] 1 21576 models.yolo.Detect [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512], [416, 416]]
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(1, 416, 416, 3)] 0 []
tf_conv (TFConv) (1, 208, 208, 32) 896 ['input_1[0][0]']
tf_conv_1 (TFConv) (1, 104, 104, 64) 18496 ['tf_conv[0][0]']
tf_conv_3 (TFConv) (1, 104, 104, 32) 2080 ['tf_conv_1[0][0]']
tf_conv_4 (TFConv) (1, 104, 104, 32) 9248 ['tf_conv_3[0][0]']
tf_conv_5 (TFConv) (1, 104, 104, 32) 9248 ['tf_conv_4[0][0]']
tf_conv_2 (TFConv) (1, 104, 104, 32) 2080 ['tf_conv_1[0][0]']
tf_concat (TFConcat) (1, 104, 104, 128) 0 ['tf_conv_5[0][0]',
'tf_conv_4[0][0]',
'tf_conv_3[0][0]',
'tf_conv_2[0][0]']
tf_conv_6 (TFConv) (1, 104, 104, 64) 8256 ['tf_concat[0][0]']
tfmp (TFMP) (1, 52, 52, 64) 0 ['tf_conv_6[0][0]']
tf_conv_8 (TFConv) (1, 52, 52, 64) 4160 ['tfmp[0][0]']
tf_conv_9 (TFConv) (1, 52, 52, 64) 36928 ['tf_conv_8[0][0]']
tf_conv_10 (TFConv) (1, 52, 52, 64) 36928 ['tf_conv_9[0][0]']
tf_conv_7 (TFConv) (1, 52, 52, 64) 4160 ['tfmp[0][0]']
tf_concat_1 (TFConcat) (1, 52, 52, 256) 0 ['tf_conv_10[0][0]',
'tf_conv_9[0][0]',
'tf_conv_8[0][0]',
'tf_conv_7[0][0]']
tf_conv_11 (TFConv) (1, 52, 52, 128) 32896 ['tf_concat_1[0][0]']
tfmp_1 (TFMP) (1, 26, 26, 128) 0 ['tf_conv_11[0][0]']
tf_conv_13 (TFConv) (1, 26, 26, 128) 16512 ['tfmp_1[0][0]']
tf_conv_14 (TFConv) (1, 26, 26, 128) 147584 ['tf_conv_13[0][0]']
tf_conv_15 (TFConv) (1, 26, 26, 128) 147584 ['tf_conv_14[0][0]']
tf_conv_12 (TFConv) (1, 26, 26, 128) 16512 ['tfmp_1[0][0]']
tf_concat_2 (TFConcat) (1, 26, 26, 512) 0 ['tf_conv_15[0][0]',
'tf_conv_14[0][0]',
'tf_conv_13[0][0]',
'tf_conv_12[0][0]']
tf_conv_16 (TFConv) (1, 26, 26, 256) 131328 ['tf_concat_2[0][0]']
tfmp_2 (TFMP) (1, 13, 13, 256) 0 ['tf_conv_16[0][0]']
tf_conv_18 (TFConv) (1, 13, 13, 256) 65792 ['tfmp_2[0][0]']
tf_conv_19 (TFConv) (1, 13, 13, 256) 590080 ['tf_conv_18[0][0]']
tf_conv_20 (TFConv) (1, 13, 13, 256) 590080 ['tf_conv_19[0][0]']
tf_conv_17 (TFConv) (1, 13, 13, 256) 65792 ['tfmp_2[0][0]']
tf_concat_3 (TFConcat) (1, 13, 13, 1024) 0 ['tf_conv_20[0][0]',
'tf_conv_19[0][0]',
'tf_conv_18[0][0]',
'tf_conv_17[0][0]']
tf_conv_21 (TFConv) (1, 13, 13, 512) 524800 ['tf_concat_3[0][0]']
tf_conv_23 (TFConv) (1, 13, 13, 256) 131328 ['tf_conv_21[0][0]']
tfsp_2 (TFSP) (1, 13, 13, 256) 0 ['tf_conv_23[0][0]']
tfsp_1 (TFSP) (1, 13, 13, 256) 0 ['tf_conv_23[0][0]']
tfsp (TFSP) (1, 13, 13, 256) 0 ['tf_conv_23[0][0]']
tf_concat_4 (TFConcat) (1, 13, 13, 1024) 0 ['tfsp_2[0][0]',
'tfsp_1[0][0]',
'tfsp[0][0]',
'tf_conv_23[0][0]']
tf_conv_24 (TFConv) (1, 13, 13, 256) 262400 ['tf_concat_4[0][0]']
tf_conv_22 (TFConv) (1, 13, 13, 256) 131328 ['tf_conv_21[0][0]']
tf_concat_5 (TFConcat) (1, 13, 13, 512) 0 ['tf_conv_24[0][0]',
'tf_conv_22[0][0]']
tf_conv_25 (TFConv) (1, 13, 13, 256) 131328 ['tf_concat_5[0][0]']
tf_conv_26 (TFConv) (1, 13, 13, 128) 32896 ['tf_conv_25[0][0]']
tf_conv_27 (TFConv) (1, 26, 26, 128) 32896 ['tf_conv_16[0][0]']
tf_upsample (TFUpsample) (1, 26, 26, 128) 0 ['tf_conv_26[0][0]']
tf_concat_6 (TFConcat) (1, 26, 26, 256) 0 ['tf_conv_27[0][0]',
'tf_upsample[0][0]']
tf_conv_29 (TFConv) (1, 26, 26, 64) 16448 ['tf_concat_6[0][0]']
tf_conv_30 (TFConv) (1, 26, 26, 64) 36928 ['tf_conv_29[0][0]']
tf_conv_31 (TFConv) (1, 26, 26, 64) 36928 ['tf_conv_30[0][0]']
tf_conv_28 (TFConv) (1, 26, 26, 64) 16448 ['tf_concat_6[0][0]']
tf_concat_7 (TFConcat) (1, 26, 26, 256) 0 ['tf_conv_31[0][0]',
'tf_conv_30[0][0]',
'tf_conv_29[0][0]',
'tf_conv_28[0][0]']
tf_conv_32 (TFConv) (1, 26, 26, 128) 32896 ['tf_concat_7[0][0]']
tf_conv_33 (TFConv) (1, 26, 26, 64) 8256 ['tf_conv_32[0][0]']
tf_conv_34 (TFConv) (1, 52, 52, 64) 8256 ['tf_conv_11[0][0]']
tf_upsample_1 (TFUpsample) (1, 52, 52, 64) 0 ['tf_conv_33[0][0]']
tf_concat_8 (TFConcat) (1, 52, 52, 128) 0 ['tf_conv_34[0][0]',
'tf_upsample_1[0][0]']
tf_conv_36 (TFConv) (1, 52, 52, 32) 4128 ['tf_concat_8[0][0]']
tf_conv_37 (TFConv) (1, 52, 52, 32) 9248 ['tf_conv_36[0][0]']
tf_conv_38 (TFConv) (1, 52, 52, 32) 9248 ['tf_conv_37[0][0]']
tf_conv_35 (TFConv) (1, 52, 52, 32) 4128 ['tf_concat_8[0][0]']
tf_concat_9 (TFConcat) (1, 52, 52, 128) 0 ['tf_conv_38[0][0]',
'tf_conv_37[0][0]',
'tf_conv_36[0][0]',
'tf_conv_35[0][0]']
tf_conv_39 (TFConv) (1, 52, 52, 64) 8256 ['tf_concat_9[0][0]']
tf_conv_40 (TFConv) (1, 26, 26, 128) 73856 ['tf_conv_39[0][0]']
tf_concat_10 (TFConcat) (1, 26, 26, 256) 0 ['tf_conv_40[0][0]',
'tf_conv_32[0][0]']
tf_conv_42 (TFConv) (1, 26, 26, 64) 16448 ['tf_concat_10[0][0]']
tf_conv_43 (TFConv) (1, 26, 26, 64) 36928 ['tf_conv_42[0][0]']
tf_conv_44 (TFConv) (1, 26, 26, 64) 36928 ['tf_conv_43[0][0]']
tf_conv_41 (TFConv) (1, 26, 26, 64) 16448 ['tf_concat_10[0][0]']
tf_concat_11 (TFConcat) (1, 26, 26, 256) 0 ['tf_conv_44[0][0]',
'tf_conv_43[0][0]',
'tf_conv_42[0][0]',
'tf_conv_41[0][0]']
tf_conv_45 (TFConv) (1, 26, 26, 128) 32896 ['tf_concat_11[0][0]']
tf_conv_46 (TFConv) (1, 13, 13, 256) 295168 ['tf_conv_45[0][0]']
tf_concat_12 (TFConcat) (1, 13, 13, 512) 0 ['tf_conv_46[0][0]',
'tf_conv_25[0][0]']
tf_conv_48 (TFConv) (1, 13, 13, 128) 65664 ['tf_concat_12[0][0]']
tf_conv_49 (TFConv) (1, 13, 13, 128) 147584 ['tf_conv_48[0][0]']
tf_conv_50 (TFConv) (1, 13, 13, 128) 147584 ['tf_conv_49[0][0]']
tf_conv_47 (TFConv) (1, 13, 13, 128) 65664 ['tf_concat_12[0][0]']
tf_concat_13 (TFConcat) (1, 13, 13, 512) 0 ['tf_conv_50[0][0]',
'tf_conv_49[0][0]',
'tf_conv_48[0][0]',
'tf_conv_47[0][0]']
tf_conv_51 (TFConv) (1, 13, 13, 256) 131328 ['tf_concat_13[0][0]']
tf_conv_52 (TFConv) (1, 52, 52, 128) 73856 ['tf_conv_39[0][0]']
tf_conv_53 (TFConv) (1, 26, 26, 256) 295168 ['tf_conv_45[0][0]']
tf_conv_54 (TFConv) (1, 13, 13, 512) 1180160 ['tf_conv_51[0][0]']
tf_detect (TFDetect) ((1, 10647, 8), 21576 ['tf_conv_52[0][0]',
) 'tf_conv_53[0][0]',
'tf_conv_54[0][0]']
==================================================================================================
Total params: 6,012,040
Trainable params: 0
Non-trainable params: 6,012,040
__________________________________________________________________________________________________
TensorFlow SavedModel: export success ✅ 8.6s, saved as runs/train/exp8/weights/best_saved_model (23.1 MB)
TensorFlow Lite: starting export with tensorflow 2.12.0...
WARNING:absl:Found untraced functions such as conv2d_3_layer_call_fn, conv2d_3_layer_call_and_return_conditional_losses, _jit_compiled_convolution_op, conv2d_4_layer_call_fn, conv2d_4_layer_call_and_return_conditional_losses while saving (showing 5 of 172). These functions will not be directly callable after loading.
fully_quantize: 0, inference_type: 6, input_inference_type: UINT8, output_inference_type: UINT8
TensorFlow Lite: export success ✅ 105.0s, saved as runs/train/exp8/weights/best-int8.tflite (6.0 MB)
Edge TPU: starting export with Edge TPU compiler 16.0.384591198...
Edge TPU Compiler version 16.0.384591198
Searching for valid delegate with step 10
Try to compile segment with 246 ops
Started a compilation timeout timer of 180 seconds.
Model compiled successfully in 3133 ms.
Input model: runs/train/exp8/weights/best-int8.tflite
Input size: 6.00MiB
Output model: runs/train/exp8/weights/best-int8_edgetpu.tflite
Output size: 6.47MiB
On-chip memory used for caching model parameters: 5.90MiB
On-chip memory remaining for caching model parameters: 1.16MiB
Off-chip memory used for streaming uncached model parameters: 41.75KiB
Number of Edge TPU subgraphs: 1
Total number of operations: 246
Operation log: runs/train/exp8/weights/best-int8_edgetpu.log
Operator Count Status
ADD 3 Mapped to Edge TPU
RESHAPE 6 Mapped to Edge TPU
MAX_POOL_2D 4 Mapped to Edge TPU
CONCATENATION 18 Mapped to Edge TPU
PAD 4 Mapped to Edge TPU
LOGISTIC 64 Mapped to Edge TPU
RESIZE_NEAREST_NEIGHBOR 2 Mapped to Edge TPU
QUANTIZE 5 Mapped to Edge TPU
MUL 73 Mapped to Edge TPU
STRIDED_SLICE 9 Mapped to Edge TPU
CONV_2D 58 Mapped to Edge TPU
Compilation child process completed within timeout period.
Compilation succeeded!
Edge TPU: export success ✅ 3.4s, saved as runs/train/exp8/weights/best-int8_edgetpu.tflite (6.5 MB)
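The detection head's output shape in the summary above, (1, 10647, 8), can be checked by hand, which also shows why image size and class count drive the size of the output tensor. Below is a small sketch. It assumes the strides 8, 16 and 32 implied by the 52, 26 and 13 grids in the log, and the 3 anchors per scale and 3 classes listed in the Detect arguments. `num_predictions` is a hypothetical helper.

```python
def num_predictions(img_size, strides=(8, 16, 32), anchors_per_scale=3):
    """Number of candidate boxes: one per anchor per grid cell per scale."""
    return sum(anchors_per_scale * (img_size // s) ** 2 for s in strides)


num_classes = 3  # first argument of models.yolo.Detect in the log
values_per_prediction = 5 + num_classes  # x, y, w, h, objectness + class scores

print(num_predictions(416), values_per_prediction)
print(num_predictions(640))
```

At 416 this reproduces the 10647 x 8 output from the log. At 640 the same formula gives 25200 rows, about 2.4 times as many, which may be part of why larger inputs are harder to fit, though nothing in this thread confirms that.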
Yeah, it exported for me too but did you try running it as well?
Hello @35grain, the int8 tflite model is detecting fine. It also has the same issue of an accuracy drop after quantization.
The pull request appears to be for a really old branch..
Which models can be run on the Google Coral TPU Accelerator? Is there a code snippet for converting a model into model_quantized.tflite?