PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

RTMDet int8 quantization #644

Closed · ramonhollands closed this issue 3 months ago

ramonhollands commented 3 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.22.0

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.16.1

Download URL for ONNX

end2end_onnxsim.onnx.zip

Parameter Replacement JSON

None

Description

I want to convert the ONNX file (RTMDet) to TFLite. I already removed the post-processing part and made sure the last Concat operates on values in the same scale (I normalized the bounding box coordinates by dividing xywh by the output width and height).

The output of the quantized tflite file still differs from the float32 tflite output, so I want to debug the complete graph.

My idea is to perform the following three steps at multiple spots in the graph (a comparison sketch follows the list):

  1. Use -onimc to truncate the graph at a certain point.
  2. Convert the ONNX model to float32 and int8 TFLite.
  3. Compare the outputs.
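
For reference, a minimal sketch of steps 2 and 3, assuming the two TFLite files produced by onnx2tf are named end2end_float32.tflite and end2end_full_integer_quant.tflite in the saved_model directory (those file names and the input shape are assumptions; adjust to your model). It compares the first output only:

```python
import numpy as np
import tensorflow as tf

def run_tflite(path: str, x: np.ndarray) -> np.ndarray:
    interp = tf.lite.Interpreter(model_path=path)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    # int8 models expect quantized inputs: q = x / scale + zero_point
    if inp["dtype"] == np.int8:
        scale, zero_point = inp["quantization"]
        x = np.round(x / scale + zero_point).astype(np.int8)
    interp.set_tensor(inp["index"], x)
    interp.invoke()
    out = interp.get_output_details()[0]
    y = interp.get_tensor(out["index"])
    # Dequantize int8 outputs back to float for comparison
    if out["dtype"] == np.int8:
        scale, zero_point = out["quantization"]
        y = (y.astype(np.float32) - zero_point) * scale
    return y

x = np.random.rand(1, 320, 320, 3).astype(np.float32)  # NHWC dummy input (shape assumed)
y32 = run_tflite("saved_model/end2end_float32.tflite", x)
y8 = run_tflite("saved_model/end2end_full_integer_quant.tflite", x)
print("max abs error:", np.abs(y32 - y8).max())
```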

However, on converting (truncating) I ran into the following issue with `onnx2tf -i end2end.onnx -osd -onimc /backbone/stage1/stage1.1/Concat`:

```
Model optimizing complete!

Automatic generation of each OP name started ========================================
Automatic generation of each OP name complete!
INFO: Finish!

Model loaded ========================================================================

Model conversion started ============================================================
Traceback (most recent call last):
  File "/usr/local/bin/onnx2tf", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/onnx2tf/onnx2tf.py", line 2380, in main
    model = convert(
  File "/usr/local/lib/python3.10/dist-packages/onnx2tf/onnx2tf.py", line 924, in convert
    batch_size = inputs[0].shape[0]
IndexError: list index out of range
```

My questions:

  1. Is this the right approach to debug?
  2. Do you know why the error is raised?
  3. Do you have a hint about which ops are hurting quantization?

Thanks so much for your help!

PINTO0309 commented 3 months ago

Accuracy degradation after INT8 quantization and correctness of the conversion to the Float32 model are separate problems and need to be debugged separately.

The issue of accuracy degradation after INT8 quantization is not related to onnx2tf.

  1. You can tell whether the Float32 tflite is accurate simply by converting with the -cotof option:
onnx2tf -i end2end_onnxsim.onnx -cotof

INFO: onnx_output_name: /bbox_head/reg_convs.2.1/conv/Conv_output_0 tf_output_name: tf.math.add_93/Add:0 shape: (1, 96, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/cls_convs.2.1/activate/Sigmoid_output_0 tf_output_name: tf.math.sigmoid_74/Sigmoid:0 shape: (1, 96, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/reg_convs.2.1/activate/Sigmoid_output_0 tf_output_name: tf.math.sigmoid_75/Sigmoid:0 shape: (1, 96, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/cls_convs.2.1/activate/Mul_output_0 tf_output_name: tf.math.multiply_329/Mul:0 shape: (1, 96, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/reg_convs.2.1/activate/Mul_output_0 tf_output_name: tf.math.multiply_333/Mul:0 shape: (1, 96, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/rtm_cls.2/Conv_output_0 tf_output_name: tf.math.add_94/Add:0 shape: (1, 1, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/rtm_reg.2/Conv_output_0 tf_output_name: tf.math.add_95/Add:0 shape: (1, 4, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /bbox_head/Mul_2_output_0 tf_output_name: tf.math.multiply_335/Mul:0 shape: (1, 4, 10, 10) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Transpose_2_output_0 tf_output_name: tf.math.add_94/Add:0 shape: (1, 10, 10, 1) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Reshape_8_output_0 tf_output_name: tf.reshape_14/Reshape:0 shape: (1, 100, 1) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Transpose_5_output_0 tf_output_name: tf.math.multiply_335/Mul:0 shape: (1, 10, 10, 4) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Reshape_11_output_0 tf_output_name: tf.reshape_17/Reshape:0 shape: (1, 100, 4) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Concat_3_output_0 tf_output_name: tf.concat_13/concat:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: labels tf_output_name: tf.math.sigmoid_76/Sigmoid:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Matches 
INFO: onnx_output_name: /Concat_4_output_0 tf_output_name: tf.concat_14/concat:0 shape: (1, 2100, 4) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00048828125
INFO: onnx_output_name: /Gather_1_output_0 tf_output_name: tf.compat.v1.gather/GatherV2:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.0003814697265625
INFO: onnx_output_name: /Gather_3_output_0 tf_output_name: tf.compat.v1.gather_1/GatherV2:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.000335693359375
INFO: onnx_output_name: /Gather_4_output_0 tf_output_name: tf.compat.v1.gather_2/GatherV2:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00045013427734375
INFO: onnx_output_name: /Gather_5_output_0 tf_output_name: tf.compat.v1.gather_3/GatherV2:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00048828125
INFO: onnx_output_name: /Sub_output_0 tf_output_name: tf.math.subtract/Sub:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.0003814697265625
INFO: onnx_output_name: /Sub_1_output_0 tf_output_name: tf.math.subtract_1/Sub:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.000335693359375
INFO: onnx_output_name: /Add_output_0 tf_output_name: tf.math.add_96/Add:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.000457763671875
INFO: onnx_output_name: /Add_1_output_0 tf_output_name: tf.math.add_97/Add:0 shape: (1, 2100) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00048828125
INFO: onnx_output_name: /Unsqueeze_6_output_0 tf_output_name: tf.reshape_18/Reshape:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.0003814697265625
INFO: onnx_output_name: /Unsqueeze_7_output_0 tf_output_name: tf.reshape_19/Reshape:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.000335693359375
INFO: onnx_output_name: /Unsqueeze_8_output_0 tf_output_name: tf.reshape_20/Reshape:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.000457763671875
INFO: onnx_output_name: /Unsqueeze_9_output_0 tf_output_name: tf.reshape_21/Reshape:0 shape: (1, 2100, 1) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00048828125
INFO: onnx_output_name: dets tf_output_name: tf.concat_17/concat:0 shape: (1, 2100, 4) dtype: float32 validate_result:  Unmatched  max_abs_error: 0.00048828125

As a result, the Float32 tflite model only incurs an error of about 1e-4. This is usually a tolerable error caused by differences in the runtimes' internal implementations. The model transformation performed by onnx2tf itself shows no problem.


  2. Accuracy degradation during INT8 quantization: avoid heavy use of Swish. The README explains the accuracy-degradation issue in considerable detail; please read it carefully first. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#7-if-the-accuracy-of-the-int8-quantized-model-degrades-significantly


ramonhollands commented 3 months ago

Hi @PINTO0309,

Thanks for your quick response. Reading it again, I could have identified the SiLU/Swish activation myself, sorry for that.

So I should retrain the original PyTorch model with a different activation function (e.g. ReLU), correct?

PINTO0309 commented 3 months ago

> So I should retrain the original PyTorch model with a different activation function (e.g. ReLU), correct?

If I understand correctly, you are correct.
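
A minimal sketch of that swap, assuming the model uses torch.nn.SiLU modules (the helper name is hypothetical); after replacing the activations you would retrain or fine-tune before re-exporting to ONNX:

```python
import torch.nn as nn

def replace_silu_with_relu(module: nn.Module) -> None:
    """Recursively swap every nn.SiLU for nn.ReLU in-place."""
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_silu_with_relu(child)

# usage: replace_silu_with_relu(model), then retrain and re-export to ONNX
```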

ramonhollands commented 3 months ago

Hi @PINTO0309,

One thing:

EdgeYOLO has the same kind of patterns in its ONNX graph and exports fine, without accuracy degradation during INT8 quantization.

(screenshot: EdgeYOLO ONNX graph showing the same activation pattern)

So Swish is not necessarily the problem by itself; it might be some combination of things.

PINTO0309 commented 3 months ago

Then it must be a post-processing Concat or something.

https://github.com/PINTO0309/PINTO_model_zoo/tree/main/426_YOLOX-Body-Head-Hand


ramonhollands commented 3 months ago

Thanks for your reply again!

Why are you sharing this repo and screenshot? It's a different model, right?

PINTO0309 commented 3 months ago

Have you not read the issues linked in the README? It's a pain to explain because the same issue is posted over and over again.

ramonhollands commented 3 months ago

Yes, I did, but I still don't understand your reply. Sorry. I'm willing to add my insights and a more detailed explanation once I understand the issue.

PINTO0309 commented 3 months ago

First, click on this label. It should contain almost every exchange on basic quantization questions like the ones you've been asking.

(screenshot: the Quantization issue label)

The query is as follows.

https://github.com/PINTO0309/onnx2tf/issues?q=label%3AQuantization+is%3Aclosed

Among the most important is the following issue. The issue whose broken link you fixed for me the other day contained quite important information, but for some reason I had to delete it. Sorry about that.

[YOLOX-TI] ERROR: onnx_op_name: /head/ScatterND #269

Within the structure of the model body, destruction of the quantization range by Concat accounts for the majority of problems during INT8 quantization. It can occur anywhere: in pre-processing, post-processing, or the body. Do not combine elements with different meanings into a single tensor. The architecture of the model does not matter; all models, YOLO or not, can have the same problem. A numeric illustration follows.
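
A numeric sketch (not from the thread) of why this happens: with per-tensor quantization, a Concat forces a single scale to cover both inputs, so a wide-range input crushes the resolution of a narrow-range one.

```python
import numpy as np

scores = np.random.rand(100)          # class scores in [0, 1]
boxes = np.random.rand(100) * 640.0   # pixel coordinates in [0, 640]

# After Concat, one affine scale must cover the combined range.
combined = np.concatenate([scores, boxes])
scale = (combined.max() - combined.min()) / 255.0  # ~2.5 units per int8 step

dequant = np.round(combined / scale) * scale
# Every score in [0, 1] rounds to the same int8 bucket and dequantizes
# to 0, so the scores are effectively destroyed:
print("max score error:", np.abs(dequant[:100] - scores).max())  # ~1.0
```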

Since there is no single cause for the significant loss of accuracy in quantized models, I will not attempt to explain all of them here.

ramonhollands commented 3 months ago

@PINTO0309 Thanks! This helps. I found out I used the wrong node name in `onnx2tf -i end2end.onnx -osd -onimc /backbone/stage1/stage1.1/Concat`; I have to pass the output names of that node instead (https://github.com/PINTO0309/onnx2tf/issues/520). See the example below.
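
For instance (the exact tensor name here is hypothetical; check the node's actual output name in Netron, which often carries an `_output_0` suffix):

```
onnx2tf -i end2end.onnx -osd -onimc /backbone/stage1/stage1.1/Concat_output_0
```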


I think it is indeed a Concat problem somewhere. I already made the last Concat use the same scale/meaning, like in the issue you mention, EdgeYOLO, and YOLOv8, but I'll start inspecting the other Concats in the graph.