Boulaouaney opened 4 months ago
Hi @Boulaouaney, thanks for your work. So, if I understand correctly, the current YOLOv10 implementation doesn't support TFLite conversion, and YOLOv8 is still faster using TFLite?
Hi @AndreaBrg
Currently, using the repo as is, exporting to TF models (via onnx2tf) fails. If you export to ONNX first and then run onnx2tf manually without the -nuo flag, the export works.
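Here is a small sketch of the second step, assuming the ONNX file was already produced by the repo's export function. `onnx2tf_command` and `run_onnx2tf` are hypothetical helpers. They only use the `-i`, `-o`, `-nuo`, and `--non_verbose` flags that appear in this thread. As far as I know, `-nuo` is onnx2tf's short form of `--not_use_onnxsim`.

```python
import shlex
import subprocess


def onnx2tf_command(onnx_path: str, output_dir: str,
                    not_use_onnxsim: bool = False,
                    non_verbose: bool = False) -> list:
    """Build the onnx2tf CLI call as an argument list (avoids shell quoting issues)."""
    cmd = ["onnx2tf", "-i", onnx_path, "-o", output_dir]
    if not_use_onnxsim:
        # Per this thread, leave this off for the full end2end YOLOv10 export.
        cmd.append("-nuo")
    if non_verbose:
        cmd.append("--non_verbose")
    return cmd


def run_onnx2tf(onnx_path: str, output_dir: str, **options) -> None:
    """Print and run the conversion; raises CalledProcessError if onnx2tf fails."""
    cmd = onnx2tf_command(onnx_path, output_dir, **options)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Example (requires onnx2tf on PATH):
# run_onnx2tf("yolov10n.onnx", "yolov10n_saved_model")
```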
However, when tested on mobile with the GPU delegate, it is much slower than v5 or v8 because the model graph is not executed entirely by the GPU delegate. Instead, the graph is split and most of the operations run on the CPU. When I tested yolov10n_float16.tflite with the TF Benchmark Tool, it displayed the following message: 126 operations will run on the GPU, and the remaining 381 operations will run on the CPU.
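In numbers, that message means only about a quarter of the graph is delegated to the GPU. The sketch below parses the message. It assumes the log line has exactly the wording quoted above, and `gpu_coverage` is a hypothetical helper.

```python
import re

# Matches the GPU delegate summary line quoted in this thread.
_DELEGATE_SPLIT = re.compile(
    r"(\d+) operations will run on the GPU, "
    r"and the remaining (\d+) operations will run on the CPU"
)


def gpu_coverage(log_text: str):
    """Return (gpu_ops, cpu_ops, gpu_fraction) from a benchmark log, or None if absent."""
    m = _DELEGATE_SPLIT.search(log_text)
    if m is None:
        return None
    gpu_ops, cpu_ops = int(m.group(1)), int(m.group(2))
    return gpu_ops, cpu_ops, gpu_ops / (gpu_ops + cpu_ops)


msg = "126 operations will run on the GPU, and the remaining 381 operations will run on the CPU."
gpu, cpu, frac = gpu_coverage(msg)
print(f"{gpu}/{gpu + cpu} ops on GPU ({frac:.1%})")  # 126/507 ops on GPU (24.9%)
```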
I will include full screenshots of the results from testing yolov10n vs yolov5n for reference.
I suspect that one of the culprits is v10postprocess(), which is included in the end2end ONNX export. I tried to work around it by excluding it, but the results were the same (it's possible I didn't do it correctly):
https://github.com/THU-MIG/yolov10/blob/e6d80f3fa6d1278a2b3987863df5aca53a3366d4/ultralytics/utils/ops.py#L851-L864
Also, the classic YOLOv8 Detect head had some workarounds in place when exporting to TFLite to avoid Flex ops:
https://github.com/THU-MIG/yolov10/blob/e6d80f3fa6d1278a2b3987863df5aca53a3366d4/ultralytics/nn/modules/head.py#L53
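As far as I can tell, the linked line is where the Ultralytics `Detect` head replaces `x_cat.split((self.reg_max * 4, self.nc), 1)` with plain slicing when exporting to TF formats, to avoid a `FlexSplitV` op. The NumPy illustration below shows that both forms produce identical tensors, so the swap only changes which op ends up in the exported graph. The sizes `reg_max=16`, `nc=80`, and 8400 anchors (a 640x640 input) are assumed example values.

```python
import numpy as np

# Assumed example sizes: reg_max=16 DFL bins, nc=80 classes, 8400 anchors (640x640 input).
reg_max, nc, anchors = 16, 80, 8400
rng = np.random.default_rng(0)
x_cat = rng.standard_normal((1, 4 * reg_max + nc, anchors)).astype(np.float32)

# Split form (torch equivalent: x_cat.split((reg_max * 4, nc), 1)), which can become SplitV.
box_split, cls_split = np.split(x_cat, [4 * reg_max], axis=1)

# Workaround form: explicit slices along the channel axis.
box_slice = x_cat[:, : 4 * reg_max]
cls_slice = x_cat[:, 4 * reg_max :]

# Same values either way; only the op recorded in an exported graph differs.
assert np.array_equal(box_split, box_slice)
assert np.array_equal(cls_split, cls_slice)
print(box_slice.shape, cls_slice.shape)  # (1, 64, 8400) (1, 80, 8400)
```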
@Boulaouaney Thanks for the detailed reply!
Hi @Boulaouaney, I have tried exporting from ONNX to TFLite using:
!onnx2tf -i "yolov10s.onnx" -o "yolov10s_exported"
However, I am getting the following error:
ValueError: Exception encountered when calling layer "tf.tile_36" (type TFOpLambda).
Shape must be rank 3 but is rank 1 for '{{node tf.tile_36/Tile}} = Tile[T=DT_INT64, Tmultiples=DT_INT64](Placeholder, tf.tile_36/Tile/multiples)' with input shapes: [1,300,1], [1].
Call arguments received by layer "tf.tile_36" (type TFOpLambda):
• input=tf.Tensor(shape=(1, 300, 1), dtype=int64)
• multiples=array([1])
• name='/model.23/Tile'
Could you provide some guidance in the export process to tflite?
UPDATE: I specified the opset when exporting to ONNX, and now I no longer get the error mentioned above:
model.export(format="onnx", imgsz=(1024, 1024), opset=17, simplify=True)
Hey @Boulaouaney,
Could you provide code for converting an ONNX model to TFLite? When I use this code:
import onnx
from onnx_tf.backend import prepare

onnx_model = onnx.load("path/to/onnx/model")
tf_rep = prepare(onnx_model)
tf_rep.export_graph("path/to/save/model")
I get this error:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/tensorflow_probability/python/layers/distribution_layer.py in
AttributeError: module 'keras._tf_keras.keras' has no attribute 'internal'
Could you share your conversion code?
A possible workaround is to exclude the post-processing layers from the exported model. In head.py, remove the post-processing step:
if not self.export:
    return {"one2many": one2many, "one2one": one2one}
else:
    assert self.max_det != -1
    # Post-processing removed so the exported graph ends at the raw one2one output
    #boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
    #return torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1).to(boxes.dtype)], dim=-1)
    return one2one
Export the ONNX model, then use onnx2tf to convert it to TFLite:
onnx2tf -i ./best_yolov10_no_post.onnx -o ./best_saved_model -nuo --non_verbose
You then need to implement the post-processing in your TFLite inference code.
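Without post-processing, the exported graph stops at the raw one2one output. Here is a minimal NumPy sketch of the missing step. It ports the top-k logic of `ops.v10postprocess` as I read it: two top-k passes and no NMS. The helpers `v10_postprocess` and `to_detections` are hypothetical, and the input layout is an assumption. onnx2tf may change the output's axis order, so check the TFLite output shape and transpose to `(batch, anchors, 4 + nc)` if needed. The code only selects boxes, so their format is whatever the head produced.

```python
import numpy as np


def v10_postprocess(preds: np.ndarray, max_det: int, nc: int):
    """NumPy sketch of YOLOv10's NMS-free top-k selection.

    preds: (batch, anchors, 4 + nc), box values first, then class scores.
    Returns boxes (batch, k, 4), scores (batch, k), labels (batch, k) with k <= max_det.
    """
    assert preds.shape[-1] == 4 + nc, "last axis must be 4 box values + nc class scores"
    boxes, scores = preds[..., :4], preds[..., 4:]

    # Pass 1: keep the max_det anchors whose best class score is highest.
    best = scores.max(axis=-1)                                      # (batch, anchors)
    keep = np.argsort(-best, axis=-1, kind="stable")[:, :max_det]  # (batch, max_det)
    boxes = np.take_along_axis(boxes, keep[..., None], axis=1)
    scores = np.take_along_axis(scores, keep[..., None], axis=1)

    # Pass 2: rank every (anchor, class) pair among the kept anchors.
    flat = scores.reshape(scores.shape[0], -1)                      # (batch, max_det * nc)
    order = np.argsort(-flat, axis=-1, kind="stable")[:, :max_det]
    top_scores = np.take_along_axis(flat, order, axis=-1)
    labels = order % nc         # class id of each selected pair
    anchor_idx = order // nc    # position within the kept anchors
    boxes = np.take_along_axis(boxes, anchor_idx[..., None], axis=1)
    return boxes, top_scores, labels


def to_detections(boxes, scores, labels):
    """Pack results as (batch, k, 6) rows of [box(4), score, label], like the removed torch.cat."""
    return np.concatenate(
        [boxes, scores[..., None], labels[..., None].astype(boxes.dtype)], axis=-1
    )


# Hypothetical usage with a TFLite output `raw` of shape (1, 4 + nc, anchors):
# boxes, scores, labels = v10_postprocess(raw.transpose(0, 2, 1), max_det=300, nc=80)
# dets = to_detections(boxes, scores, labels)[0]
# dets = dets[dets[:, 4] > 0.25]  # the confidence threshold is an arbitrary example
```

`max_det=300` in the usage comment matches the 300-row shapes in the Tile error earlier in the thread. Stable sorting only affects the order of tied scores.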
Regarding the tf.tile_36 Tile error reported above (which went away after specifying opset=17 for the ONNX export): I was previously using onnx2tf 1.17.5 and had this issue. I just tested with onnx2tf 1.22.6, and it works without error, so I think it has been fixed in a recent onnx2tf release.
Hey @rabion1234! Sorry for the late reply.
I see you are trying to use onnx-tf. I wouldn't recommend it, as it is no longer maintained and is planned for deprecation. Instead, you could use onnx2tf after exporting to ONNX with the export function provided in the repo. Also, as other comments have mentioned (and as I should have mentioned at the start), make sure to specify at least opset=17 when exporting to ONNX (I used 18), and use a recent version of onnx2tf.
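The check below is a minimal pre-flight step, assuming you want to fail fast on an old onnx2tf before converting. The 1.22.6 threshold is just the version reported working in this thread (1.17.5 was reported failing), not an official minimum. `parse_version` and `onnx2tf_is_recent` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Versions reported in this thread: 1.17.5 hit the Tile error, 1.22.6 worked.
KNOWN_GOOD_ONNX2TF = "1.22.6"


def parse_version(v: str) -> tuple:
    """Simplistic parse: '1.22.6' -> (1, 22, 6); stops at the first non-numeric suffix."""
    parts = []
    for piece in v.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '6rc1': keep 6, ignore the suffix
            break
    return tuple(parts)


def onnx2tf_is_recent(minimum: str = KNOWN_GOOD_ONNX2TF) -> bool:
    """True if the installed onnx2tf is at least `minimum`; False if it isn't installed."""
    try:
        installed = version("onnx2tf")
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


print("onnx2tf recent enough:", onnx2tf_is_recent())
```

For anything beyond a quick check, `packaging.version.Version` is the more robust comparison.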
An alternative that skips ONNX is nobuco, which exports directly from PyTorch to TFLite.
Is YOLOv10 TFLite still not supported by the GPU and NNAPI delegates? If so, it seems like there's no value in using YOLOv10 on Android devices. Could ONNX Runtime be a good alternative?
Thank you for sharing the wonderful experiments and results.
I'm wondering whether there are any plans for stronger TFLite support.
From searching, I see that other people are already interested in this (at least one, in #141).
If needed, I am willing to open a PR for this. I have already converted the YOLOv10n model to TFLite using onnx2tf, nobuco, and Google ai-edge-torch, with varying degrees of success.
However, after testing my converted models with the TensorFlow Benchmark Tool, I noticed poor YOLOv10 performance on GPU and NNAPI (tested on a Pixel 8 Pro Edge TPU): a large portion of the resulting graph isn't supported by the GPU and NNAPI delegates and is instead passed on to the XNNPACK delegate. Although I have experience deploying models on edge devices, there is probably someone more experienced, or with more time to dedicate, who could look into ways to solve this issue.
By comparison, YOLOv8 and YOLOv5 by Ultralytics can be deployed and run much faster on mobile devices with TFLite because they are fully supported by the GPU and NNAPI delegates.
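To reproduce this kind of comparison, the TFLite `benchmark_model` tool can be run once per delegate. The sketch below builds the `adb` invocations. It assumes the prebuilt Android `benchmark_model` binary and the `.tflite` file have already been pushed to `/data/local/tmp`, which is a conventional location and not something from this thread. `benchmark_commands` is a hypothetical helper. The GPU run should print the GPU/CPU operation split quoted earlier in this thread.

```python
import shlex

DEVICE_DIR = "/data/local/tmp"  # assumed location of the binary and model on the device


def benchmark_commands(model_file: str, binary: str = f"{DEVICE_DIR}/benchmark_model") -> dict:
    """Return one adb command (as an argument list) per delegate configuration."""
    graph = f"{DEVICE_DIR}/{model_file}"
    delegates = {
        "gpu": ["--use_gpu=true"],
        "nnapi": ["--use_nnapi=true"],
        "xnnpack": ["--use_xnnpack=true"],
    }
    return {
        name: ["adb", "shell", binary, f"--graph={graph}", *flags]
        for name, flags in delegates.items()
    }


for name, cmd in benchmark_commands("yolov10n_float16.tflite").items():
    print(f"{name}: {shlex.join(cmd)}")
```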
For now, I can open a PR that adds the ability to convert YOLOv10 models to TFLite, without much optimization.