PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

Bloated Full Integer Quant TFLite file #724

Closed: Y-T-G closed this issue 1 day ago

Y-T-G commented 1 week ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.22.3

onnx version number

1.17.0

onnxruntime version number

1.20.0

onnxsim (onnx_simplifier) version number

Not used

tensorflow version number

2.16.2

Download URL for ONNX

https://file.io/OSJbqqc9w78Q

Parameter Replacement JSON

Not used.

Description

  1. Purpose: onnx2tf integration in ultralytics.
  2. What: The full-integer INT8 files are bloated, over 3 times larger than the original FP16 model. This happens when 4 images are used for calibration; with 128 images it does not occur. Some users have reported that it also occurs with 300 images.

     yolo export format=tflite model=yolo11s.pt int8=True data=coco8.yaml # bloated

     ls -sh yolo11s_saved_model/yolo11s_full_integer_quant.tflite
     109M yolo11s_saved_model/yolo11s_full_integer_quant.tflite

     yolo export format=tflite model=yolo11s.pt int8=True data=coco128.yaml # not bloated

     ls -sh yolo11s_saved_model_coco128/yolo11s_full_integer_quant.tflite
     12M yolo11s_saved_model_coco128/yolo11s_full_integer_quant.tflite
  3. How: I tried both the per-channel and per-tensor quant_type settings, but the result is the same.
  4. Why: The large file size defeats the purpose of quantization.
  5. Resources: The corresponding code for quantization is here.
        keras_model = onnx2tf.convert(
            input_onnx_file_path=f_onnx,
            output_folder_path=str(f),
            not_use_onnxsim=True,
            verbosity="error",  # note INT8-FP16 activation bug https://github.com/ultralytics/ultralytics/issues/15873
            output_integer_quantized_tflite=self.args.int8,
            quant_type="per-tensor",  # "per-tensor" (faster) or "per-channel" (slower but more accurate)
            custom_input_op_name_np_data_path=np_data,
            disable_group_convolution=True,  # for end-to-end model compatibility
            enable_batchmatmul_unfold=True,  # for end-to-end model compatibility
        )

    You can reproduce it by installing ultralytics and using the export command I posted in (2.) above.
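As a quick sanity check when reproducing, the bloat can be quantified by comparing the quantized file against the float32 export. The helper below is purely illustrative (not part of ultralytics or onnx2tf); the paths in the usage comment are the ones from the listings above:

```python
import os

def quant_size_ratio(quant_path: str, float_path: str) -> float:
    """Size of the quantized model relative to the float model; a healthy
    full-integer INT8 model should land well below 1.0 (roughly 1/4)."""
    return os.path.getsize(quant_path) / os.path.getsize(float_path)

# Hypothetical usage with the paths reported above:
# quant_size_ratio("yolo11s_saved_model/yolo11s_full_integer_quant.tflite",
#                  "yolo11s_saved_model/yolo11s_float32.tflite")
```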

github-actions[bot] commented 1 day ago

If there is no activity within the next two days, this issue will be closed automatically.

PINTO0309 commented 1 day ago
pip show onnx2tf
Name: onnx2tf
Version: 1.26.2

make_calib.py:

import numpy as np

img_datas = []
for _ in range(4):
    img_datas.append(np.ones([1,640,640,3], dtype=np.float32) / 255.0)
calib_datas = np.vstack(img_datas)
print(f'calib_datas.shape: {calib_datas.shape}')
np.save(file='calibdata.npy', arr=calib_datas)
loaded_data = np.load('calibdata.npy')
print(f'loaded_data.shape: {loaded_data.shape}')

python make_calib.py
calib_datas.shape: (4, 640, 640, 3)
loaded_data.shape: (4, 640, 640, 3)

onnx2tf \
-i yolo11s.onnx \
-cotof \
-oiqt \
-cind "images" "calibdata.npy" "[[[[0.485,0.456,0.406]]]]" "[[[[0.229,0.224,0.225]]]]"
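For context, the last two -cind arguments are the per-channel mean and std that onnx2tf applies to the calibration data during quantization calibration, i.e. (x - mean) / std (here the ImageNet statistics). A minimal numpy sketch of that normalization, using the same dummy batch as make_calib.py:

```python
import numpy as np

# Values copied from the -cind arguments above (ImageNet statistics)
mean = np.array([[[[0.485, 0.456, 0.406]]]], dtype=np.float32)  # shape (1, 1, 1, 3)
std = np.array([[[[0.229, 0.224, 0.225]]]], dtype=np.float32)

# Same dummy calibration batch as make_calib.py
calib = np.ones([4, 640, 640, 3], dtype=np.float32) / 255.0

# Per-channel normalization, broadcast over the N, H, W axes
normalized = (calib - mean) / std
print(f'normalized.shape: {normalized.shape}')
```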

[image]

Y-T-G commented 1 day ago

Thanks @PINTO0309

Y-T-G commented 1 day ago

@PINTO0309

I tried the latest version, but the yolo11s_full_integer_quant and yolo11s_integer_quant files are still bloated if -ebu (enable_batchmatmul_unfold) is used.

onnx2tf \
-i "yolo11s.onnx" \
-cotof \
-oiqt \
-ebu -cind "images" "yolo11s_saved_model/tmp_tflite_int8_calibration_images.npy" "[[[[0, 0, 0]]]]" "[[[[255, 255, 255]]]]"
ls -sh saved_model/
total 335M
4.0K assets
4.0K fingerprint.pb
 37M saved_model.pb
4.0K variables
 11M yolo11s_dynamic_range_quant.tflite
 19M yolo11s_float16.tflite
 37M yolo11s_float32.tflite
107M yolo11s_full_integer_quant.tflite
9.8M yolo11s_full_integer_quant_with_int16_act.tflite
107M yolo11s_integer_quant.tflite
9.8M yolo11s_integer_quant_with_int16_act.tflite
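As an aside, the mean/std pair passed to -cind here ("[[[[0, 0, 0]]]]" and "[[[[255, 255, 255]]]]") reduces, under the same (x - mean) / std convention, to a plain x / 255 scaling. A quick numpy check (the random batch is just a stand-in for the calibration .npy):

```python
import numpy as np

mean = np.zeros((1, 1, 1, 3), dtype=np.float32)        # "[[[[0, 0, 0]]]]"
std = np.full((1, 1, 1, 3), 255.0, dtype=np.float32)   # "[[[[255, 255, 255]]]]"

# Dummy uint8-range image batch (stand-in for the calibration data)
x = np.random.default_rng(0).uniform(0, 255, size=(2, 8, 8, 3)).astype(np.float32)

normalized = (x - mean) / std
assert np.allclose(normalized, x / 255.0)  # identical to simple 0..1 scaling
```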
pip show onnx2tf

Name: onnx2tf
Version: 1.26.2
PINTO0309 commented 1 day ago

[image]