PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.

[YOLOX-TI] ERROR: onnx_op_name: /head/ScatterND #269

Closed · mikel-brostrom closed this issue 1 year ago

mikel-brostrom commented 1 year ago

Issue Type

Others

onnx2tf version number

1.8.1

onnx version number

1.13.1

tensorflow version number

2.12.0

Download URL for ONNX

yolox_nano_ti_lite_26p1_41p8.zip

Parameter Replacement JSON

{
    "format_version": 1,
    "operations": [
        {
            "op_name": "/head/ScatterND",
            "param_target": "inputs",
            "param_name": "/head/Concat_1_output_0",
            "values": [1,85,52,52]
        }
    ]
}

Description

Hi @PINTO0309. After our lengthy discussion regarding INT8 YOLOX export, I decided to try out TI's version of these models (https://github.com/TexasInstruments/edgeai-yolox/tree/main/pretrained_models). It looked to me that you managed to INT8-export those, so maybe you could provide some hints :smile:. I just downloaded the ONNX version of YOLOX-nano. For this model, the following fails:

onnx2tf -i ./yolox_nano.onnx -o yolox_nano_saved_model

The error I get:

ERROR: input_onnx_file_path: /datadrive/mikel/edgeai-yolox/yolox_nano.onnx
ERROR: onnx_op_name: /head/ScatterND
ERROR: Read this and deal with it. https://github.com/PINTO0309/onnx2tf#parameter-replacement
ERROR: Alternatively, if the input OP has a dynamic dimension, use the -b or -ois option to rewrite it to a static shape and try again.
ERROR: If the input OP of ONNX before conversion is NHWC or an irregular channel arrangement other than NCHW, use the -kt or -kat option.
ERROR: Also, for models that include NonMaxSuppression in the post-processing, try the -onwdt option.
  1. Research
  2. Export error
  3. I tried to overwrite the parameter values with the replacement JSON provided above, with no luck
  4. Project need
  5. The operation that fails can be seen in the image below: Screenshot from 2023-03-24 10-37-02
PINTO0309 commented 1 year ago

I was looking at the table over here. https://github.com/PINTO0309/onnx2tf/issues/269#issuecomment-1488264853 image

INT8 can only hold values in the range 0 to 255 (or -128 to +127). Therefore, if we merge a flow that wants to express values in the range 0 to 1 with a flow that wants to express values in the range 0 to 416, I feel that almost all elements in the one that wants to express the range 0 to 1 will collapse to approximately 0.

Therefore, we cannot rule out the possibility that this is the problem. I believe that wherever an earlier Concat goes to the trouble of merging everything into 85 channels, the same problem may occur. So I have a feeling that if each flow with a significantly different value range is processed as a separate flow, without merging them all the way through, it would work.
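Purely as a toy illustration of the effect I am imagining (made-up numbers, not the actual model), per-tensor quantization of a concatenated [0, 1] / [0, 416] tensor looks roughly like this:

import numpy as np

# Toy example: scores live in [0, 1], boxes (xywh) live in [0, 416]
scores = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
boxes = np.array([0.0, 104.0, 208.0, 312.0, 416.0])
merged = np.concatenate([scores, boxes])

# Per-tensor affine quantization to uint8: a single scale for the whole tensor
scale = merged.max() / 255.0                  # ~1.63
q = np.round(merged / scale).astype(np.uint8)
dequant = q * scale

print(dequant[:5])  # [0. 0. 0. 0. 1.63] -- the score channels collapse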

All of this is only my imagination, as I have not actually tested it by moving it around at hand. image

mikel-brostrom commented 1 year ago

Output looks like this now:

Screenshot from 2023-03-29 13-42-03

PINTO0309 commented 1 year ago

The position of Dequantize has obviously changed.

I am also interested in the quantization range for this area. image

mikel-brostrom commented 1 year ago

In/out quantization from top-left to bottom-right of the operations you pointed at:

quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148
quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148

quantization: -2.3114538192749023 ≤ 0.00010453650611452758 * q ≤ 3.4253478050231934
quantization: 0.00014265520439948887 * q

quantization: -2.2470905780792236 ≤ 0.00011867172725033015 * q ≤ 3.888516426086426
quantization: 0.00014265520439948887 * q

quantization: 0.00014265520439948887 * q
quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148
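
For reference, this is roughly how I read those values out of the quantized model (just a sketch; the .tflite file name is a placeholder):

import tensorflow as tf

# Placeholder file name -- use the actual quantized model produced by onnx2tf
interpreter = tf.lite.Interpreter(model_path="yolox_nano_full_integer_quant.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    scales = detail["quantization_parameters"]["scales"]
    zero_points = detail["quantization_parameters"]["zero_points"]
    if len(scales) == 1:  # per-tensor quantized tensors only
        print(detail["name"], "scale:", scales[0], "zero_point:", zero_points[0])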
PINTO0309 commented 1 year ago

It looks fine to me.

mikel-brostrom commented 1 year ago

Going for a full COCO eval now :rocket:

motokimura commented 1 year ago

Great! 🚀🚀

mikel-brostrom commented 1 year ago

Great that we get this into YOLOv8 as well @motokimura! Thank you both for this joint effort :heart:

| Model | input size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano TFLite FP32 | 416 | 0.261 | 0.418 | 8.7M | N/A |
| YOLOX-TI-nano TFLite INT8 | 416 | 0.242 | 0.408 | 2.4M | 200 |
| YOLOX-TI-nano TFLite INT8 | 416 | 0.243 | 0.408 | 2.4M | 800 |
PINTO0309 commented 1 year ago

congratulations! :+1:

PINTO0309 commented 1 year ago

I will close this issue, since the original problem has been solved and the INT8 quantization issue also seems to have been resolved.

mikel-brostrom commented 1 year ago

Sorry for bothering you again but one thing is still unclear to me. Even when bringing the xy, wh, probs values to [0, 1] and then quantizing the model with a single output:

Screenshot from 2023-03-31 10-42-39

results are much worse than using separate xy, wh, probs outputs like this:

Screenshot from 2023-03-31 10-45-42

From our lengthy discussion I recall this:

Therefore, if we merge a flow that wants to express values in the range 0 to 1 with a flow that wants to express values in the range 0 to 416, I feel that almost all elements in the one that wants to express the range 0 to 1 will collapse to approximately 0.

and this:

In TFLite quantization, activation is quantized in per-tensor manner. That is, the OR distribution of xywh and scores, (min, max) = (0.0, 416.0), is mapped to integer values of (min, max) = (0, 255) after the Concat. As a result, even if the score is 1.0, after quantization it is mapped to: int(1.0 / 416 * 255) = int(0.61) = 0, resulting in all scores being zero!

Which makes total sense to me, especially given the disparity of the value ranges within the same output. But why are the quantization results much worse for the model with a single output, given that all values now share the same range? Does this make sense to you?

| Model | input size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano SINGLE OUTPUT | 416 | 0.064 | 0.240 | 2.4M | 8 |
| YOLOX-TI-nano TFLite XY, WH, PROBS OUTPUT | 416 | 0.242 | 0.408 | 2.4M | 8 |
PINTO0309 commented 1 year ago

There is nothing I can explain in more detail beyond Motoki's explanation, but again, take a good look at the quantization parameters around the final output of the model. I think you can see why the Concat is a bad idea.

All 1.7974882125854492 * (q + 128)

The values diverge when inverse quantization (Dequantize) is performed.
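
As a toy illustration of what that scale means for a score in [0, 1] (my own arithmetic, not tool output):

scale, zero_point = 1.7974882125854492, -128  # the output quantization above

# Quantize and dequantize a score of 1.0 with this per-tensor scale
q = round(1.0 / scale) + zero_point           # -> -127
dequant = scale * (q - zero_point)            # -> ~1.797, not 1.0
print(q, dequant)
# A score of 0.5 quantizes to q = -128 and comes back as 0.0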

onnx2tf -i yolox_nano_no_scatternd.onnx -oiqt -qt per-tensor

image image

Perhaps that is why TI used ScatterND.

motokimura commented 1 year ago

In your inference code posted in this comment,

x[0:4] = x[0:4] * 416 # notice xywh in the model is divided by 416

The first dim of x should be the batch dim, I think.

However, this should decrease the accuracy of the float model as well...

mikel-brostrom commented 1 year ago

Yup, sorry @motokimura, that's a typo. It is

outputs[:, :, 0:4] = outputs[:, :, 0:4] * 416

motokimura commented 1 year ago

I have no idea what is happening in Concat...

As I posted, you may find something if you compare the distributions of the outputs from the float and INT8 models.

motokimura commented 1 year ago

@mikel-brostrom Can you check what happens if you apply clipping to xy and wh before Concat?

if self.int8:
    # normalize xy and wh to [0, 1] so all concatenated channels share a similar range
    xy = torch.div(xy, 416)
    wh = torch.div(wh, 416)
    # clipping: keep outliers from stretching the quantization range
    xy = torch.clamp(xy, min=0, max=1)
    wh = torch.clamp(wh, min=0, max=1)

outputs = torch.cat([xy, wh, outputs[..., 4:]], dim=-1)

Assumption: xy and/or wh may have a few outliers which make the quantization range much wider than we expected. Especially wh can have such outliers because Exp is used as the activation function.
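
A quick way to check that assumption (just a sketch; wh is assumed to be the decoded width/height tensor before the division by 416):

import torch

# wh: decoded widths/heights before normalization -- assumed to be available here
wh_flat = wh.detach().flatten().float()
for p in (0.50, 0.99, 0.999, 1.0):
    print(f"p{p * 100:g}: {torch.quantile(wh_flat, p).item():.2f}")
# If the max (p100) is far above p99.9, a few outliers are stretching
# the quantization range and clipping should help.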

mikel-brostrom commented 1 year ago

Good point @motokimura. Reporting back on Monday 😊

mikel-brostrom commented 1 year ago

Interesting. It actually made it worse...

| Model | input size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano TFLite XY, WH, PROBS OUTPUT | 416 | 0.242 | 0.408 | 2.4M | 8 |
| YOLOX-TI-nano SINGLE OUTPUT | 416 | 0.062 | 0.229 | 2.4M | 8 |
| YOLOX-TI-nano SINGLE OUTPUT (Clamped xywh) | 416 | 0.028 | 0.103 | 2.4M | 8 |
motokimura commented 1 year ago

At this point I have nothing to add beyond this comment about the quantization of Concat and what kind of quantization errors are actually happening inside it. This Concat is not necessary by nature and has no benefit for model quantization, so I think we don't need to go any deeper with this.

All I can say at this point is that tensors with very different value ranges should not be concatenated, especially in post-processing of the model.

Thank you for doing the experiment and sharing your results!

mikel-brostrom commented 1 year ago

This Concat is not necessary by nature and has no benefit for model quantization, so I think we don't need to go any deeper with this.

Agree, let's close this. Enough experimentation on this topic :smile:. Again, thank you both @motokimura, @PINTO0309 for your time and guidance during this quantization journey. I learnt a lot, and hopefully you got something out of the experiment results posted here as well :pray: