THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

TFLite Support #161

Open · Boulaouaney opened this issue 4 months ago

Boulaouaney commented 4 months ago

Wondering if there are any plans for stronger TFLite support.

From searching, I see that other people are already interested in this (at least one person in #141).

If necessary, I am willing to open a PR about this. I have already converted the YOLOv10n model to TFLite using onnx2tf, nobuco, and Google ai-edge-torch, with varying degrees of success.

However, after testing my converted models with the TensorFlow Benchmark Tool, I noticed poor performance of YOLOv10 on GPU and NNAPI (tested on a Pixel 8 Pro Edge TPU): a big portion of the resulting graph isn't supported by the GPU and NNAPI delegates and is instead passed on to the XNNPACK delegate. Although I have some experience deploying models on edge devices, I think someone more experienced and/or able to dedicate more time should look into ways to solve this issue.

In comparison, YOLOv8 and YOLOv5 by Ultralytics can be deployed and run much faster on mobile devices (using TFLite) because they fully support the GPU and NNAPI delegates.

For now, I can open a PR that adds functionality to convert YOLOv10 models to TFLite, without much optimization.

AndreaBrg commented 4 months ago

Hi @Boulaouaney, thanks for your work. So, if I understand correctly, the current YOLOv10 implementation doesn't support TFLite conversion, and YOLOv8 is still faster using TFLite?

Boulaouaney commented 4 months ago

Hi @AndreaBrg

Currently, using the repo as is, exporting to TF models fails (it goes through onnx2tf). If you export to ONNX first and then manually run onnx2tf without the -nuo flag, the export works.
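For reference, the manual route looks roughly like this (model path, image size, and output directory are just examples; the YOLOv10 import follows this repo's README):

    from ultralytics import YOLOv10

    # export to ONNX with the export function provided in the repo
    model = YOLOv10("yolov10n.pt")
    model.export(format="onnx", imgsz=640, opset=17, simplify=True)

then convert the resulting .onnx file from the command line (note: no -nuo flag):

    onnx2tf -i yolov10n.onnx -o yolov10n_saved_model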

However, when tested on mobile with the GPU delegate, it is much slower than v5 or v8 because the model graph isn't executed entirely by the GPU delegate; instead, it is split, and most of the operations run on the CPU. When I tested yolov10n_float16.tflite with the TF Benchmark Tool, it displayed the following message: "126 operations will run on the GPU, and the remaining 381 operations will run on the CPU." I will include full screenshots of the results from testing yolov10n vs yolov5n for reference.
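For anyone who wants to reproduce this, the TFLite benchmark tool can be run on-device roughly like this (paths are examples; the prebuilt benchmark_model binary comes from the TensorFlow releases). Swap --use_gpu=true for --use_nnapi=true to test the NNAPI delegate:

    adb push yolov10n_float16.tflite /data/local/tmp/
    adb shell /data/local/tmp/benchmark_model \
      --graph=/data/local/tmp/yolov10n_float16.tflite \
      --use_gpu=true --num_runs=50 --enable_op_profiling=true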

I suspect that one of the culprits is v10postprocess(), which is included in the end-to-end ONNX export. I tried to get around it by not including it, but the results are still the same (it's possible I didn't do it correctly): https://github.com/THU-MIG/yolov10/blob/e6d80f3fa6d1278a2b3987863df5aca53a3366d4/ultralytics/utils/ops.py#L851-L864

Also, the classic YOLOv8 Detect head had some workarounds in place for TFLite export to avoid Flex Ops: https://github.com/THU-MIG/yolov10/blob/e6d80f3fa6d1278a2b3987863df5aca53a3366d4/ultralytics/nn/modules/head.py#L53

YOLOv10n-float16.tflite on GPU:

[screenshot: TFLite Benchmark Tool output for yolov10n]

YOLOv5n-float16.tflite on GPU:

[screenshot: TFLite Benchmark Tool output for yolov5n]

AndreaBrg commented 4 months ago

@Boulaouaney Thanks for the detailed reply!

robertomancebom commented 3 months ago

Hi @Boulaouaney, I have tried exporting from ONNX to TFLite using:

!onnx2tf -i "yolov10s.onnx" -o "yolov10s_exported"

However, I am getting the following error:

ValueError: Exception encountered when calling layer "tf.tile_36" (type TFOpLambda).

Shape must be rank 3 but is rank 1 for '{{node tf.tile_36/Tile}} = Tile[T=DT_INT64, Tmultiples=DT_INT64](Placeholder, tf.tile_36/Tile/multiples)' with input shapes: [1,300,1], [1].

Call arguments received by layer "tf.tile_36" (type TFOpLambda):
  • input=tf.Tensor(shape=(1, 300, 1), dtype=int64)
  • multiples=array([1])
  • name='/model.23/Tile'

Could you provide some guidance in the export process to tflite?

UPDATE: While exporting to ONNX I specified the opset, and now I no longer get the errors mentioned above:

model.export(format="onnx", imgsz=(1024, 1024), opset=17, simplify=True)

rabion1234 commented 3 months ago

hey @Boulaouaney,

Could you provide your code for converting the ONNX model to TFLite? When I use the following code:

    import onnx
    from onnx_tf.backend import prepare

    # Load the ONNX model
    onnx_model = onnx.load("path/to/onnx/model")

    # Convert the ONNX model to TensorFlow
    tf_rep = prepare(onnx_model)

    # Export the model as a TensorFlow saved model
    tf_rep.export_graph("path/to/save/model")

I get the following error:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    in <cell line: 2>()
          1 import onnx
    ----> 2 from onnx_tf.backend import prepare
          3
          4 # Load the ONNX model
          5 onnx_model = onnx.load("/content/drive/MyDrive/CamScanner/yolov10/runs/detect/train/weights/best.onnx")

    15 frames
    /usr/local/lib/python3.10/dist-packages/tensorflow_probability/python/layers/distribution_layer.py in
         66
         67
    ---> 68 tf.keras.__internal__.utils.register_symbolic_tensor_type(dtc._TensorCoercible)  # pylint: disable=protected-access
         69
         70

    AttributeError: module 'keras._tf_keras.keras' has no attribute '__internal__'

Could you share your conversion code?

bluesy7585 commented 3 months ago

A possible workaround is to exclude the post-processing layers from the exported model. In head.py, remove the post-processing:

            if not self.export:
                return {"one2many": one2many, "one2one": one2one}
            else:
                assert(self.max_det != -1)
                #boxes, scores, labels = ops.v10postprocess(one2one.permute(0, 2, 1), self.max_det, self.nc)
                #return torch.cat([boxes, scores.unsqueeze(-1), labels.unsqueeze(-1).to(boxes.dtype)], dim=-1)
                return one2one

Export the ONNX model, then use onnx2tf to convert it to TFLite:

onnx2tf -i ./best_yolov10_no_post.onnx -o ./best_saved_model -nuo --non_verbose

You need to implement the post-processing in your TFLite inference code.
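For reference, here is a minimal NumPy sketch of that post-processing, assuming the exported model's output has shape (1, num_anchors, 4 + nc) with xywh boxes followed by class scores (check your exported model's actual output layout, since onnx2tf may transpose it). Unlike ops.v10postprocess, it keeps only the best class per anchor:

    import numpy as np

    def yolov10_postprocess(preds, max_det=300, conf_thres=0.25):
        # preds: (1, num_anchors, 4 + nc) -> xywh boxes + per-class scores
        boxes, scores = preds[0, :, :4], preds[0, :, 4:]
        # keep the top max_det anchors ranked by their best class score
        best = scores.max(axis=-1)
        keep = np.argsort(-best)[:max_det]
        boxes, scores = boxes[keep], scores[keep]
        labels = scores.argmax(axis=-1)
        confs = scores[np.arange(len(labels)), labels]
        # drop low-confidence detections (no NMS needed for the one2one head)
        mask = confs > conf_thres
        return boxes[mask], confs[mask], labels[mask]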

bluesy7585 commented 3 months ago

(Quoting @robertomancebom's comment above about the "tf.tile_36" ValueError and the opset=17 update.)

I was using onnx2tf 1.17.5 previously and had this issue. I just tested with onnx2tf 1.22.6 and it works without error, so I think this issue is fixed in recent onnx2tf releases.

Boulaouaney commented 3 months ago

Hey @rabion1234! Sorry for the late reply.

I see you are trying to use onnx-tf. I wouldn't recommend it, as it is no longer maintained and is planned for deprecation. Instead, you could use onnx2tf after exporting to ONNX with the export function provided in the repo. Also, as other comments have mentioned (and I should have mentioned at the start), make sure to specify at least opset=17 (I used 18) when exporting to ONNX, and use a more recent version of onnx2tf.

An alternative that skips ONNX entirely is nobuco, which exports directly from PyTorch to TFLite.
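For the nobuco route, a rough sketch (names follow nobuco's README as far as I recall; the underlying nn.Module attribute and input size are assumptions, so double-check against the current nobuco docs):

    import torch
    import nobuco
    from nobuco import ChannelOrder
    import tensorflow as tf
    from ultralytics import YOLOv10

    # underlying nn.Module of the wrapper (assumption: .model holds it)
    pt_model = YOLOv10("yolov10n.pt").model.eval()
    dummy = torch.zeros(1, 3, 640, 640)

    # trace PyTorch -> Keras
    keras_model = nobuco.pytorch_to_keras(
        pt_model,
        args=[dummy],
        inputs_channel_order=ChannelOrder.TENSORFLOW,
    )

    # Keras -> TFLite
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    with open("yolov10n.tflite", "wb") as f:
        f.write(converter.convert())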

blackCmd commented 3 weeks ago

Is the YOLOv10 .tflite still not supported by the GPU and NNAPI delegates? If so, there seems to be little value in using YOLOv10 on Android devices. Could ONNX Runtime be a good alternative?

Thank you for sharing the wonderful experiments and results.