@kyakuno Which model(s) (nano, tiny, ...) would you like to export? And for the representative dataset, which dataset should I use (a few hundred images from the COCO dataset, or something else)?
Please export tiny. Please use COCO2017 for the representative dataset.
@zhaochow Can you write tutorial here how to export tflite from original repository?
Yes, sure, let me write a small summary of it.
Overview: ONNX -> TensorFlow -> TensorFlow Lite
The ONNX models for YOLOX can be found here: https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/ONNXRuntime
Dependencies:
- onnx-tf
- tensorflow
- protobuf
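These can typically be installed with pip (the package names above are the PyPI names; you may need to pin versions depending on which TensorFlow release onnx-tf supports):

```
pip install onnx-tf tensorflow protobuf
```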
To convert, run `onnx-tf convert -i /path/to/input.onnx -o /path/to/output`. The output will be in the SavedModel format (a directory), e.g. `onnx-tf convert -i yolox_tiny.onnx -o yolox_tiny`.
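If you prefer doing this step from Python instead of the CLI, onnx-tf also exposes the conversion through its backend API; a minimal sketch (same input/output paths as the command above):

```python
import onnx
from onnx_tf.backend import prepare

# Load the ONNX model and build a TensorFlow representation of it
onnx_model = onnx.load('yolox_tiny.onnx')
tf_rep = prepare(onnx_model)

# Export as a SavedModel directory (same result as the CLI command above)
tf_rep.export_graph('yolox_tiny')
```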
Dependencies:
- tensorflow
A small representative dataset is necessary for full integer quantization. Usually, 100~500 images from the training or validation set are enough.
The code below is an example that converts the SavedModel `yolox_tiny` to `yolox_tiny.tflite`. It uses 300 images from the COCO 2017 validation set (300 file paths were previously randomly selected and written to a csv file).
```python
import os

import cv2
import numpy as np
import pandas as pd
import tensorflow as tf

tf_model_path = 'yolox_tiny'


def representative_data_gen():
    data_dir = 'coco/'
    samples_paths = pd.read_csv(os.path.join(data_dir, 'val2017_300samples.csv'), squeeze=True)
    samples_paths = [os.path.join(data_dir, 'val2017', x) for x in samples_paths]

    # Preprocessing
    # https://github.com/Megvii-BaseDetection/YOLOX/blob/main/demo/ONNXRuntime/onnx_inference.py
    def preproc(img, input_size, swap=(2, 0, 1)):
        if len(img.shape) == 3:
            padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
        else:
            padded_img = np.ones(input_size, dtype=np.uint8) * 114

        r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
        resized_img = cv2.resize(
            img,
            (int(img.shape[1] * r), int(img.shape[0] * r)),
            interpolation=cv2.INTER_LINEAR,
        ).astype(np.uint8)
        padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img

        padded_img = padded_img.transpose(swap)
        padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
        return padded_img, r

    samples = np.asarray([
        preproc(cv2.imread(x), (416, 416))[0] for x in samples_paths
    ])
    for input_value in tf.data.Dataset.from_tensor_slices(samples).batch(1):
        yield [input_value]


converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model_quant = converter.convert()

tflite_path = f'{tf_model_path}_full_integer_quant.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model_quant)
```
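As a quick sanity check (optional, not part of the conversion itself), you can load the file written above with the TFLite interpreter and confirm that the input and output tensors really are uint8:

```python
import tensorflow as tf

# Load the quantized model and inspect its I/O tensor types
interpreter = tf.lite.Interpreter(model_path='yolox_tiny_full_integer_quant.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
print('input :', input_details['shape'], input_details['dtype'])   # expected: uint8
print('output:', output_details['shape'], output_details['dtype'])  # expected: uint8
```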
@zhaochow Thanks!
Hello,
can you share YOLOX results using TensorFlow Lite?
@AouatifZ You can run it with the official TensorFlow Lite runtime using the commands below.

```
cd yolox
python3 yolox.py --tflite
```

https://github.com/axinc-ai/ailia-models-tflite/tree/main/object_detection/yolox
And you can find the converted model here: https://netron.app/?url=https://storage.googleapis.com/ailia-models-tflite/yolox/yolox_tiny_full_integer_quant.opt.tflite
@kyakuno Thanks a lot for your help
@kyakuno Please can you share your inference-time results using integer quantization for YOLOX?
In my case, the optimization with integer-only quantization gives bad results.
In general, quantization does not improve inference speed for TensorFlow Lite CPU and GPU inference. Quantization works well when using an NPU.
The GEMM of the conv becomes int8, but a multiplication is required to convert the int32 accumulator produced by the GEMM back to int8. Also, add requires rescaling of the two int8 input tensors. This cost is high.
Quantization reduces memory consumption to 1/4, but the inference-speed advantage is small without dedicated instructions.
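To illustrate the point (an illustrative sketch only, not TFLite's actual kernel, and the scale values are made up): each int8 conv/GEMM accumulates into int32, and getting back to int8 costs an extra multiply plus rounding and clamping per output element.

```python
import numpy as np

def requantize(acc_int32, input_scale, weight_scale, output_scale, zero_point=0):
    # Rescale the int32 accumulator of an int8 GEMM back into int8 output range.
    # The scale factors here are hypothetical values chosen for illustration.
    multiplier = (input_scale * weight_scale) / output_scale
    out = np.round(acc_int32 * multiplier) + zero_point
    return np.clip(out, -128, 127).astype(np.int8)

acc = np.array([12345, -6789], dtype=np.int32)  # int32 results of an int8 GEMM
print(requantize(acc, input_scale=0.02, weight_scale=0.01, output_scale=0.05))
```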
So the TensorFlow Lite optimization for YOLOX does not improve the inference time, does it?
Is there any other optimization method or technique to improve the inference time?
Thanks in advance
It is difficult to run YOLOX inference fast with TensorFlow. Using NNAPI doesn't give much of a performance gain. Based on ONNX, we also sell an SDK for high-speed inference of YOLOX: https://axinc.jp/en/solutions/ailia_sdk.html We have also released an app that can actually run YOLOX inference and evaluate its performance, so please try it here: https://play.google.com/store/apps/details?id=jp.axinc.ailia_ai_showcase&pli=1
Thanks for your response and your helpful information
Hello @kyakuno,
Please, I have another question.
I applied TensorFlow Lite optimization to a TensorFlow/Keras model and got good inference-time results, but not with the TensorFlow Lite optimization of the YOLOX PyTorch model.
I don't understand exactly why.
Thanks in advance
If you open the model files with Netron, you can see the difference in the graphs. https://netron.app/
https://github.com/Megvii-BaseDetection/YOLOX/issues/318