PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

Bloated Full Integer Quant TFLite file #724

Closed: Y-T-G closed this issue 1 day ago

Y-T-G commented 1 week ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.22.3

onnx version number

1.17.0

onnxruntime version number

1.20.0

onnxsim (onnx_simplifier) version number

Not used

tensorflow version number

2.16.2

Download URL for ONNX

https://file.io/OSJbqqc9w78Q

Parameter Replacement JSON

Not used.

Description

  1. Purpose: onnx2tf integration in ultralytics.
  2. What: The full-integer INT8 files are bloated, over 3 times larger than the original FP16 model. This happens when 4 images are used for calibration; with 128 images it does not occur. Some users have reported that it also occurs with 300 images.

     yolo export format=tflite model=yolo11s.pt int8=True data=coco8.yaml # bloated

     ls -sh yolo11s_saved_model/yolo11s_full_integer_quant.tflite
     109M yolo11s_saved_model/yolo11s_full_integer_quant.tflite

     yolo export format=tflite model=yolo11s.pt int8=True data=coco128.yaml # not bloated

     ls -sh yolo11s_saved_model_coco128/yolo11s_full_integer_quant.tflite
     12M yolo11s_saved_model_coco128/yolo11s_full_integer_quant.tflite
  3. How: I tried both the per-channel and per-tensor quant_type settings, but the result is the same.
  4. Why: The large file size defeats the purpose of quantization.
  5. Resources: The corresponding code for quantization is here.
        keras_model = onnx2tf.convert(
            input_onnx_file_path=f_onnx,
            output_folder_path=str(f),
            not_use_onnxsim=True,
            verbosity="error",  # note INT8-FP16 activation bug https://github.com/ultralytics/ultralytics/issues/15873
            output_integer_quantized_tflite=self.args.int8,
            quant_type="per-tensor",  # "per-tensor" (faster) or "per-channel" (slower but more accurate)
            custom_input_op_name_np_data_path=np_data,
            disable_group_convolution=True,  # for end-to-end model compatibility
            enable_batchmatmul_unfold=True,  # for end-to-end model compatibility
        )

    You can reproduce it by installing ultralytics and using the export command I posted in (2.) above.
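As a quick sanity check when reproducing, the bloat can be quantified by comparing the quantized file against the float32 export. The helper below is purely illustrative (not part of ultralytics or onnx2tf); the paths in the usage comment are the ones from the listings above:

```python
import os

def quant_size_ratio(quant_path: str, float_path: str) -> float:
    """Size of the quantized model relative to the float model; a healthy
    full-integer INT8 model should land well below 1.0 (roughly 1/4)."""
    return os.path.getsize(quant_path) / os.path.getsize(float_path)

# Hypothetical usage with the paths reported above:
# quant_size_ratio("yolo11s_saved_model/yolo11s_full_integer_quant.tflite",
#                  "yolo11s_saved_model/yolo11s_float32.tflite")
```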

github-actions[bot] commented 1 day ago

If there is no activity within the next two days, this issue will be closed automatically.

PINTO0309 commented 1 day ago
pip show onnx2tf
Name: onnx2tf
Version: 1.26.2

make_calib.py:

import numpy as np

img_datas = []
for _ in range(4):
    img_datas.append(np.ones([1,640,640,3], dtype=np.float32) / 255.0)
calib_datas = np.vstack(img_datas)
print(f'calib_datas.shape: {calib_datas.shape}')
np.save(file='calibdata.npy', arr=calib_datas)
loaded_data = np.load('calibdata.npy')
print(f'loaded_data.shape: {loaded_data.shape}')

python make_calib.py
calib_datas.shape: (4, 640, 640, 3)
loaded_data.shape: (4, 640, 640, 3)

onnx2tf \
-i yolo11s.onnx \
-cotof \
-oiqt \
-cind "images" "calibdata.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"
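For context, the last two -cind arguments are the per-channel mean and std that onnx2tf applies to the calibration data during quantization calibration, i.e. (x - mean) / std (here the ImageNet statistics). A minimal numpy sketch of that normalization, using the same dummy batch as make_calib.py:

```python
import numpy as np

# Values copied from the -cind arguments above (ImageNet statistics)
mean = np.array([[[[0.485, 0.456, 0.406]]]], dtype=np.float32)  # shape (1, 1, 1, 3)
std = np.array([[[[0.229, 0.224, 0.225]]]], dtype=np.float32)

# Same dummy calibration batch as make_calib.py
calib = np.ones([4, 640, 640, 3], dtype=np.float32) / 255.0

# Per-channel normalization, broadcast over the N, H, W axes
normalized = (calib - mean) / std
print(f'normalized.shape: {normalized.shape}')
```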

[image]

Y-T-G commented 1 day ago

Thanks @PINTO0309

Y-T-G commented 1 day ago

@PINTO0309

I tried the latest version, but the yolo11s_full_integer_quant and yolo11s_integer_quant files are still bloated if -ebu (enable_batchmatmul_unfold) is used.

onnx2tf \
-i "yolo11s.onnx" \
-cotof \
-oiqt \
-ebu -cind "images" "yolo11s_saved_model/tmp_tflite_int8_calibration_images.npy" "[[[[0, 0, 0]]]]" "[[[[255, 255, 255]]]]"
ls -sh saved_model/
total 335M
4.0K assets
4.0K fingerprint.pb
 37M saved_model.pb
4.0K variables
 11M yolo11s_dynamic_range_quant.tflite
 19M yolo11s_float16.tflite
 37M yolo11s_float32.tflite
107M yolo11s_full_integer_quant.tflite
9.8M yolo11s_full_integer_quant_with_int16_act.tflite
107M yolo11s_integer_quant.tflite
9.8M yolo11s_integer_quant_with_int16_act.tflite
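As an aside, the mean/std pair passed to -cind here ("[[[[0, 0, 0]]]]" and "[[[[255, 255, 255]]]]") reduces, under the same (x - mean) / std convention, to a plain x / 255 scaling. A quick numpy check (the random batch is just a stand-in for the calibration .npy):

```python
import numpy as np

mean = np.zeros((1, 1, 1, 3), dtype=np.float32)        # "[[[[0, 0, 0]]]]"
std = np.full((1, 1, 1, 3), 255.0, dtype=np.float32)   # "[[[[255, 255, 255]]]]"

# Dummy uint8-range image batch (stand-in for the calibration data)
x = np.random.default_rng(0).uniform(0, 255, size=(2, 8, 8, 3)).astype(np.float32)

normalized = (x - mean) / std
assert np.allclose(normalized, x / 255.0)  # identical to simple 0..1 scaling
```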
pip show onnx2tf

Name: onnx2tf
Version: 1.26.2
PINTO0309 commented 1 day ago

[image]