mikel-brostrom closed this issue 1 year ago
I was looking at the table over here. https://github.com/PINTO0309/onnx2tf/issues/269#issuecomment-1488264853
INT8 can only hold values in the range 0 to 255 (or -128 to +127). Therefore, if we merge a flow that wants to express values in the range 0 to 1 with a flow that wants to express values in the range 0 to 416, I suspect that almost all elements of the one expressing the 0-to-1 range will collapse to approximately 0.
Therefore, we cannot rule out the possibility that this is the problem. If there is an earlier part that uses `Concat` to merge everything into 85 channels, the same issue may occur there as well. So I have a feeling that if each flow with a significantly different value range were processed as a separate flow, without merging them all the way through, it would work.
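A rough numeric sketch of that intuition (my own illustration, not from the thread, assuming simple asymmetric uint8 quantization with scale = range / 255): a score quantized with its own [0, 1] range keeps sub-percent resolution, while the same score sharing a [0, 416] range gets a quantization step of about 1.63 and cannot be represented at all:

```python
# Hypothetical illustration of per-tensor quantization resolution,
# assuming asymmetric uint8 quantization: q = round(x / scale), x' = q * scale.

def quant_dequant(x, scale):
    q = round(x / scale)
    return q * scale

own_scale    = 1.0   / 255  # scores quantized alone: range [0, 1]
merged_scale = 416.0 / 255  # scores concatenated with xywh: range [0, 416]

score = 0.9
print(quant_dequant(score, own_scale))     # ~0.898, error below 0.002
print(quant_dequant(score, merged_scale))  # ~1.63, the nearest representable value
```

With the merged scale, every value in [0, 1] snaps to either 0 or ~1.63, which is exactly the collapse described above.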
All of this is only my imagination, as I have not actually tested it by moving it around at hand.
Output looks like this now:
The position of `Dequantize` has obviously changed.
I am also interested in the quantization range for this area.
In/out quantization from top-left to bottom-right of the operations you pointed at:
```
quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148
quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148
quantization: -2.3114538192749023 ≤ 0.00010453650611452758 * q ≤ 3.4253478050231934
quantization: 0.00014265520439948887 * q
quantization: -2.2470905780792236 ≤ 0.00011867172725033015 * q ≤ 3.888516426086426
quantization: 0.00014265520439948887 * q
quantization: 0.00014265520439948887 * q
quantization: -3.1056954860687256 ≤ 0.00014265520439948887 * q ≤ 4.674383163452148
```
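A quick sanity check on those parameters (my own back-of-the-envelope calculation, not from the thread): dividing the range limits by the scale recovers the underlying integer grid, and the upper limit lands almost exactly on the int16 maximum, which suggests these tensors use 16-bit activation quantization rather than int8:

```python
# Recover the integer range implied by the reported scale and (min, max).
scale = 0.00014265520439948887
lo, hi = -3.1056954860687256, 4.674383163452148

q_lo, q_hi = round(lo / scale), round(hi / scale)
print(q_lo, q_hi)  # roughly -21771 and 32767 -- fits int16, far outside int8
```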
It looks fine to me.
Going for a full COCO eval now :rocket:
Great! 🚀🚀
Great that we get this into YOLOv8 as well @motokimura! Thank you both for this joint effort :heart:
| Model | size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano TFLite FP32 | 416 | 0.261 | 0.418 | 8.7M | N/A |
| YOLOX-TI-nano TFLite INT8 | 416 | 0.242 | 0.408 | 2.4M | 200 |
| YOLOX-TI-nano TFLite INT8 | 416 | 0.243 | 0.408 | 2.4M | 800 |
congratulations! :+1:
Since the original problem has been solved and the INT8 quantization issue appears to be resolved, I will close this issue.
Sorry for bothering you again, but one thing is still unclear to me. Even when bringing the `xy`, `wh`, `probs` values to [0, 1] and then quantizing the model with a single output:

the results are much worse than when using separate `xy`, `wh`, `probs` outputs like this:
From our lengthy discussion I recall this:
> Therefore, if we merge a flow that wants to express values in the range 0 to 1 with a flow that wants to express values in the range 0 to 416, I suspect that almost all elements of the one expressing the 0-to-1 range will collapse to approximately 0.
and this:
> In TFLite quantization, activations are quantized in a per-tensor manner. That is, the OR'ed distribution of xywh and scores, (min, max) = (0.0, 416.0), is mapped to integer values of (min, max) = (0, 255) after the `Concat`. As a result, even if the score is 1.0, after quantization it is mapped to int(1.0 / 416 * 255) = int(0.61) = 0, resulting in all scores being zero!
Which makes total sense to me, especially given the disparity between the different value ranges within the same output. But why are the quantization results much worse for the model with a single output, given that all values there share the same range? Does this make sense to you?
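The arithmetic in that quoted explanation can be checked directly. A minimal sketch, assuming asymmetric uint8 quantization over the merged [0, 416] range (real TFLite rounds to nearest rather than truncating, but the conclusion is the same):

```python
# Per-tensor uint8 scale for a tensor whose values span [0, 416]:
scale = 416.0 / 255        # ~1.63: the smallest nonzero representable step

# A perfect score of 1.0 after a quantize/dequantize round trip:
q = round(1.0 / scale)     # 1 (the quote's int() truncation would give 0)
dequant = q * scale        # ~1.63
print(scale, q, dequant)   # the entire [0, 1] score range fits inside one step
```

Either way, truncated to 0 or rounded up to ~1.63, the score information is gone, which matches the all-zero scores observed after `Concat`.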
| Model | size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano SINGLE OUTPUT | 416 | 0.064 | 0.240 | 2.4M | 8 |
| YOLOX-TI-nano TFLite XY, WH, PROBS OUTPUT | 416 | 0.242 | 0.408 | 2.4M | 8 |
There is nothing left in the model to explain in more detail beyond Motoki's explanation, but again, take a good look at the quantization parameters around the final output of the model. I think you can see why `Concat` is a bad idea.
All of them are `1.7974882125854492 * (q + 128)`. The values diverge when dequantization (`Dequantize`) is performed.
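To spell that out (my own arithmetic, assuming int8 with zero point -128, as the `1.7974882125854492 * (q + 128)` formula implies): the representable outputs form a grid with a step of ~1.8 over roughly [0, 458], so the whole [0, 1] range of the scores sits between the first two grid points:

```python
scale, zero_point = 1.7974882125854492, -128

# Dequantized values reachable from int8 q in [-128, 127]:
lo = scale * (-128 - zero_point)   # 0.0
hi = scale * ( 127 - zero_point)   # ~458.4
print(lo, hi, scale)  # any score in [0, 1] dequantizes to either 0.0 or ~1.8
```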
```
onnx2tf -i yolox_nano_no_scatternd.onnx -oiqt -qt per-tensor
```
Perhaps that is why TI used `ScatterND`.
In your inference code posted in this comment,

```python
x[0:4] = x[0:4] * 416  # notice xywh in the model is divided by 416
```

The first dim of `x` should be the batch dim, I think.
However, this should decrease the accuracy of float models as well..
Yup, sorry @motokimura, that's a typo. It should be:

```python
outputs[:, :, 0:4] = outputs[:, :, 0:4] * 416
```
I have no idea what is happening in Concat..
As I posted, you may find something if you compare the distributions of the outputs from the float/int8 models.
@mikel-brostrom Can you check what happens if you apply clipping to xy and wh before Concat?
```python
if self.int8:
    xy = torch.div(xy, 416)
    wh = torch.div(wh, 416)
    # clipping
    xy = torch.clamp(xy, min=0, max=1)
    wh = torch.clamp(wh, min=0, max=1)
outputs = torch.cat([xy, wh, outputs[..., 4:]], dim=-1)
```
Assumption: xy and/or wh may have a few outliers which make quantization range much wider than we expected. Especially wh can have such outliers because Exp is used as activation function.
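That outlier hypothesis can be illustrated with a toy calculation (my own sketch, not from the thread, assuming asymmetric uint8 per-tensor quantization): a single large `Exp`-produced outlier stretches the tensor's range and destroys the resolution available to all the ordinary values:

```python
import numpy as np

def quant_dequant(x, qmin=0, qmax=255):
    # Asymmetric uint8 per-tensor quantization over the tensor's own range.
    scale = (x.max() - x.min()) / (qmax - qmin)
    q = np.round((x - x.min()) / scale)
    return q * scale + x.min()

rng = np.random.default_rng(0)
wh = rng.uniform(0.0, 1.0, size=1000)   # typical normalized wh values
wh_outlier = np.append(wh, 50.0)        # one hypothetical Exp-induced outlier

err_clean   = np.abs(quant_dequant(wh) - wh).max()
err_outlier = np.abs(quant_dequant(wh_outlier) - wh_outlier).max()
print(err_clean, err_outlier)  # max error grows from ~0.002 to ~0.1
```

Clipping removes the outlier's influence on the range, which is exactly what the `torch.clamp` experiment above tests.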
Good point @motokimura. Reporting back on Monday 😊
Interesting. It actually made it worse...
| Model | size | mAPval 0.5:0.95 | mAPval 0.5 | model size | calibration images |
|---|---|---|---|---|---|
| YOLOX-TI-nano TFLite XY, WH, PROBS OUTPUT | 416 | 0.242 | 0.408 | 2.4M | 8 |
| YOLOX-TI-nano SINGLE OUTPUT | 416 | 0.062 | 0.229 | 2.4M | 8 |
| YOLOX-TI-nano SINGLE OUTPUT (Clamped xywh) | 416 | 0.028 | 0.103 | 2.4M | 8 |
At this point I have no more ideas beyond this comment about the quantization of `Concat` and what kind of quantization errors are actually happening inside it. This `Concat` is not necessary by nature and has no benefit for model quantization, so I think we don't need to go any deeper into this.
All I can say at this point is that tensors with very different value ranges should not be concatenated, especially in post-processing of the model.
Thank you for doing the experiment and sharing your results!
> This `Concat` is not necessary by nature and has no benefit for model quantization, so I think we don't need to go any deeper into this.
Agree, let's close this. Enough experimentation on this topic :smile:. Again, thank you both @motokimura, @PINTO0309 for your time and guidance during this quantization journey. I learnt a lot; hopefully you got something out of the experiment results posted here as well :pray:
**Issue Type**
Others

**onnx2tf version number**
1.8.1

**onnx version number**
1.13.1

**tensorflow version number**
2.12.0

**Download URL for ONNX**
yolox_nano_ti_lite_26p1_41p8.zip

**Parameter Replacement JSON**

**Description**
Hi @PINTO0309. After our lengthy discussion regarding INT8 YOLOX export, I decided to try out TI's version of these models (https://github.com/TexasInstruments/edgeai-yolox/tree/main/pretrained_models). It looked to me like you managed to INT8-export those, so maybe you could provide some hints :smile:. I just downloaded the ONNX version of YOLOX-nano. For this model, the following fails:
The error I get: