axinc-ai / ailia-models-tflite

Quantized version of model library

ADD yolox #5

Closed kyakuno closed 2 years ago

kyakuno commented 2 years ago

https://github.com/Megvii-BaseDetection/YOLOX/issues/318

zhaochow commented 2 years ago

@kyakuno Which model(s) (nano, tiny,...) would you like to export? And for the representative dataset, what dataset should I use (a few hundreds images from COCO dataset or something else)?

kyakuno commented 2 years ago

Please export tiny. Please use COCO2017 for representative dataset.

kyakuno commented 2 years ago

@zhaochow Can you write a tutorial here on how to export TFLite from the original repository?

zhaochow commented 2 years ago

> @zhaochow Can you write a tutorial here on how to export TFLite from the original repository?

Yes, sure, let me write a small summary of it.

zhaochow commented 2 years ago

How to export YOLOX to TFLite

Overview: ONNX -> TensorFlow -> TensorFlow Lite

The ONNX models for YOLOX can be found here: https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/ONNXRuntime

ONNX -> TensorFlow

Dependencies: onnx-tf (the onnx-tensorflow package), which in turn requires onnx and tensorflow.

To convert, run onnx-tf convert -i /path/to/input.onnx -o /path/to/output. The output will be in the SavedModel format (a directory), e.g. onnx-tf convert -i yolox_tiny.onnx -o yolox_tiny.
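
For reference, the same conversion can also be done from Python with the onnx-tensorflow backend API (a minimal sketch; the file names are assumptions):

import onnx
from onnx_tf.backend import prepare

# Load the exported ONNX model (file name is an assumption)
onnx_model = onnx.load('yolox_tiny.onnx')

# Build the TensorFlow representation and export it as a SavedModel directory
tf_rep = prepare(onnx_model)
tf_rep.export_graph('yolox_tiny')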

TensorFlow -> TensorFlow Lite

Dependencies: tensorflow, plus opencv-python, numpy and pandas for the representative dataset script below.

A small representative dataset is necessary for full integer quantization. Usually, 100~500 images from the training or validation set are enough.

The code below is an example that converts the SavedModel yolox_tiny to yolox_tiny_full_integer_quant.tflite. It uses 300 images from the COCO 2017 validation set (300 file paths were previously selected at random and written to a CSV file).

import os

import cv2
import numpy as np
import pandas as pd

import tensorflow as tf

tf_model_path = 'yolox_tiny'

def representative_data_gen():
    data_dir = 'coco/'
    # Single-column CSV of image file names (.squeeze('columns') replaces the
    # squeeze=True keyword argument removed in newer pandas versions)
    samples_paths = pd.read_csv(os.path.join(data_dir, 'val2017_300samples.csv')).squeeze('columns')
    samples_paths = [os.path.join(data_dir, 'val2017', x) for x in samples_paths]

    # Preprocessing
    # https://github.com/Megvii-BaseDetection/YOLOX/blob/main/demo/ONNXRuntime/onnx_inference.py
    def preproc(img, input_size, swap=(2, 0, 1)):
        if len(img.shape) == 3:
            padded_img = np.ones((input_size[0], input_size[1], 3), dtype=np.uint8) * 114
        else:
            padded_img = np.ones(input_size, dtype=np.uint8) * 114

        r = min(input_size[0] / img.shape[0], input_size[1] / img.shape[1])
        resized_img = cv2.resize(
            img,
            (int(img.shape[1] * r), int(img.shape[0] * r)),
            interpolation=cv2.INTER_LINEAR,
        ).astype(np.uint8)
        padded_img[: int(img.shape[0] * r), : int(img.shape[1] * r)] = resized_img

        padded_img = padded_img.transpose(swap)
        padded_img = np.ascontiguousarray(padded_img, dtype=np.float32)
        return padded_img, r

    # Preprocess every sample image (CHW float32, 416x416)
    samples = np.asarray([
        preproc(cv2.imread(x), (416, 416))[0] for x in samples_paths
    ])

    # Yield the samples one at a time as the representative dataset
    for input_value in tf.data.Dataset.from_tensor_slices(samples).batch(1):
        yield [input_value]

converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model_quant = converter.convert()

tflite_path = f'{tf_model_path}_full_integer_quant.tflite'
with open(tflite_path, 'wb') as f:
    f.write(tflite_model_quant)
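
To sanity-check the quantized model, a minimal sketch like the one below can run it once with the TFLite interpreter (the test image path is an assumption, and preproc from the script above is assumed to be available at module level):

import cv2
import numpy as np
import tensorflow as tf

# Load the quantized model and allocate its tensors
interpreter = tf.lite.Interpreter(model_path='yolox_tiny_full_integer_quant.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Preprocess a test image with the same preproc() as in the script above
img, ratio = preproc(cv2.imread('test.jpg'), (416, 416))

# The converter was configured with uint8 input, so quantize the float input manually
scale, zero_point = input_details['quantization']
img_uint8 = np.clip(np.round(img / scale + zero_point), 0, 255).astype(np.uint8)

interpreter.set_tensor(input_details['index'], img_uint8[np.newaxis, ...])
interpreter.invoke()
output = interpreter.get_tensor(output_details['index'])
print(output.shape)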
kyakuno commented 2 years ago

@zhaochow Thanks!

kyakuno commented 2 years ago

9

AouatifZ commented 1 year ago

Hello,

Can you share YOLOX results using TensorFlow Lite?

kyakuno commented 1 year ago

@AouatifZ You can run it with the official TensorFlow Lite runtime using the command below.

cd yolox
python3 yolox.py --tflite

https://github.com/axinc-ai/ailia-models-tflite/tree/main/object_detection/yolox

And you can find the converted model here: https://netron.app/?url=https://storage.googleapis.com/ailia-models-tflite/yolox/yolox_tiny_full_integer_quant.opt.tflite

AouatifZ commented 1 year ago

@kyakuno Thanks a lot for your help

AouatifZ commented 1 year ago

@kyakuno Please, can you share your inference time results using integer quantization for YOLOX?

In my case, integer-only quantization gives poor results.

kyakuno commented 1 year ago

In general, quantization does not improve inference speed for TensorFlow Lite CPU and GPU inference. Quantization works well when using an NPU.

kyakuno commented 1 year ago

The GEMM inside conv becomes int8, but an extra multiplication (requantization) is needed to convert the int32 accumulator produced by the GEMM back to int8. Also, add requires rescaling two int8 tensors to a common scale. This cost is high.

By quantizing, the memory consumption is reduced to 1/4. However, the advantage of inference speed is small without dedicated instructions.
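
A minimal numpy sketch of the requantization step described above, with purely hypothetical scales, just to illustrate the extra per-element arithmetic:

import numpy as np

# int32 accumulators produced by an int8 GEMM (hypothetical values)
acc_int32 = np.array([12345, -6789, 250000], dtype=np.int32)

# Hypothetical quantization parameters of input, weights and output
input_scale, weight_scale = 0.02, 0.005
output_scale, output_zero_point = 0.1, 0

# Each accumulator must be rescaled and clamped back to int8: an extra
# multiply/round/clip per element on top of the int8 GEMM itself
multiplier = input_scale * weight_scale / output_scale
out_int8 = np.clip(np.round(acc_int32 * multiplier) + output_zero_point,
                   -128, 127).astype(np.int8)
print(out_int8)  # [ 12  -7 127]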

AouatifZ commented 1 year ago

Then TensorFlow Lite optimization of YOLOX does not improve the inference time, does it?

So is there any other optimization method or technique to improve the inference time?

Thanks in advance.

kyakuno commented 1 year ago

It is difficult to run YOLOX fast with TensorFlow Lite, and using NNAPI doesn't give much of a performance gain. Based on ONNX, we also sell an SDK for high-speed inference of YOLOX: https://axinc.jp/en/solutions/ailia_sdk.html

We have also released an app that can actually run YOLOX and evaluate its performance, so please try it here: https://play.google.com/store/apps/details?id=jp.axinc.ailia_ai_showcase&pli=1

AouatifZ commented 1 year ago

Thanks for your response and your helpful information

AouatifZ commented 1 year ago

Hello,

@kyakuno

Please, I have another question.

I applied TensorFlow Lite optimization to a model built with TensorFlow/Keras and got good inference time results, but not for the YOLOX PyTorch model optimized with TensorFlow Lite.

I don't understand exactly why.

Thanks in advance

kyakuno commented 1 year ago

If you open the model files with Netron, you can see the difference in the graphs: https://netron.app/
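
Besides Netron, a minimal sketch for printing a model's op-level structure from Python, assuming a recent TensorFlow version that provides the experimental analyzer and using a hypothetical file name:

import tensorflow as tf

# Dump the op-by-op structure of the quantized model as text so the two graphs
# can be compared (requires tf.lite.experimental.Analyzer, available in TF 2.9+)
tf.lite.experimental.Analyzer.analyze(model_path='yolox_tiny_full_integer_quant.tflite')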