Megvii-BaseDetection / YOLOX

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Apache License 2.0

Problems evaluating INT8 quantized `TFLite` model #1638

Closed mikel-brostrom closed 1 year ago

mikel-brostrom commented 1 year ago

I have managed to generate dynamic_range_quant, full_integer_quant and integer_quant versions of the TFLite model using onnx2tf. However, the post-processing fails for some reason: the confidences are so low that none of the predictions pass through the filtering. Any idea what could be the problem? The float16 and float32 TFLite models work as usual, achieving the results in the table below. Has anybody tried onnx2tf and got the models working?
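
For anyone hitting the same symptom, a minimal sanity check (my sketch, not code from this issue) is to run the full-integer-quant model through `tf.lite.Interpreter` and dequantize the raw output by hand, to rule out a missing scale/zero-point step in the post-processing. The file name and the random input are placeholders:

```python
import numpy as np
import tensorflow as tf

# Hypothetical file name; use the *_full_integer_quant.tflite produced by onnx2tf.
interpreter = tf.lite.Interpreter(model_path="yolox_nano_full_integer_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in for a preprocessed image; quantize it with the scale/zero-point stored in the model.
image = np.random.rand(*inp["shape"]).astype(np.float32)
if inp["dtype"] == np.int8:
    scale, zero_point = inp["quantization"]
    image = (image / scale + zero_point).round().astype(np.int8)

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()

raw = interpreter.get_tensor(out["index"])
# Dequantize the output before decoding boxes / objectness / class scores.
if out["dtype"] == np.int8:
    scale, zero_point = out["quantization"]
    raw = (raw.astype(np.float32) - zero_point) * scale

# Confidences stuck near zero here point to a quantization problem rather than a filtering bug.
print(raw.min(), raw.max())
```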

mikel-brostrom commented 1 year ago

Btw, I see quite a lot of people asking about exported model results, so I thought I could post mine here. I exported them with the decode-in-inference flag in order to minimize model output post-processing. I also had to build a multi-backend class that supports inference for all of the exported models (sketched after the table below), so that the comparison is meaningful: exactly the same evaluation pipeline available in this repo is used for all of them. My results are as follows:

| Model | Size | mAP<sup>val</sup> 0.5:0.95 | mAP<sup>val</sup> 0.5 |
|---|---|---|---|
| YOLOX-nano PyTorch | 416 | 0.256 | 0.411 |
| YOLOX-nano ONNX | 416 | 0.256 | 0.411 |
| YOLOX-nano TFLite FP32 | 416 | 0.256 | 0.411 |
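
A rough sketch of what such a multi-backend class could look like; the class and method names are hypothetical (not the code actually used for the table above), and only the ONNX Runtime and TFLite branches are shown:

```python
import numpy as np


class MultiBackendYOLOX:
    """Dispatch inference to ONNX Runtime or TFLite based on the weights extension."""

    def __init__(self, weights: str):
        self.backend = weights.rsplit(".", 1)[-1].lower()
        if self.backend == "onnx":
            import onnxruntime as ort
            self.session = ort.InferenceSession(weights)
            self.input_name = self.session.get_inputs()[0].name
        elif self.backend == "tflite":
            import tensorflow as tf
            self.interpreter = tf.lite.Interpreter(model_path=weights)
            self.interpreter.allocate_tensors()
        else:
            raise ValueError(f"unsupported backend: {weights}")

    def __call__(self, image: np.ndarray) -> np.ndarray:
        # image: preprocessed (1, 3, H, W) float32 tensor; decode-in-inference output expected.
        if self.backend == "onnx":
            return self.session.run(None, {self.input_name: image})[0]
        inp = self.interpreter.get_input_details()[0]
        out = self.interpreter.get_output_details()[0]
        # onnx2tf emits NHWC models, so transpose the NCHW input.
        self.interpreter.set_tensor(inp["index"], image.transpose(0, 2, 3, 1))
        self.interpreter.invoke()
        return self.interpreter.get_tensor(out["index"])
```
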
mikel-brostrom commented 1 year ago

For anybody interested in why this is the case, we are discussing this here: https://github.com/PINTO0309/onnx2tf/issues/244

mikel-brostrom commented 1 year ago

This seems to be a known critical TF issue. Basically, all quantized models break when exported to TFLite via: PyTorch -- (torch.onnx.export) --> ONNX -- (onnx2tf or onnx-tf) --> TFLite. Not sure if this only happens for models exported through this pipeline or if it is a general problem. Maybe somebody knows?
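
For reference, a condensed sketch of that flow, assuming a YOLOX `model` module in eval mode with decode-in-inference enabled and onnx2tf's Python API; argument names such as `output_integer_quantized_tflite` may differ between onnx2tf versions:

```python
import torch
import onnx2tf  # PINTO0309/onnx2tf

# `model` is assumed to be a YOLOX nn.Module with decode-in-inference enabled.
dummy = torch.zeros(1, 3, 416, 416)
torch.onnx.export(
    model, dummy, "yolox_nano.onnx",
    input_names=["images"], output_names=["output"], opset_version=11,
)

# Emits float32/float16 TFLite files plus the dynamic_range / full_integer / integer
# quantized variants when integer quantization is requested.
onnx2tf.convert(
    input_onnx_file_path="yolox_nano.onnx",
    output_folder_path="saved_model",
    output_integer_quantized_tflite=True,
)
```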

PINTO0309 commented 1 year ago

Since Float32 works fine, it is odd that only the INT8 model would break, given that the Keras model object used in the backend of the tool to generate the INT8 model is the same. YOLOv8 broke in the same way. Thus, I can even presume that it is not an issue with the conversion flow PyTorch -> ONNX -> TFLite.

mikel-brostrom commented 1 year ago

Thanks for the insights @PINTO0309 :smile:

PINTO0309 commented 1 year ago

For the benefit of other engineers, I will also post in this thread the workaround needed to eliminate the accuracy degradation caused by quantization. SiLU (Swish) was found to significantly degrade the accuracy of the model during quantization; as an additional research reference, HardSwish seems to cause significant accuracy degradation during quantization as well. Thus, differences in the conversion route were not related to the accuracy degradation. It seems that we need to significantly rethink the activation function, etc., and redefine another YOLOX-alpha-like model that is not YOLOX to make it work.


It is a matter of model structure. The activation function, the kernel size and stride for pooling, and the kernel size and stride for Conv should be completely revised. See: https://github.com/PINTO0309/onnx2tf/issues/244#issuecomment-1475128445
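
As an illustration of the activation part of that workaround only (my sketch, not the redesign from the linked issue), one way is to recursively swap SiLU modules for a quantization-friendlier activation such as ReLU before export; the pooling/Conv changes are not covered here:

```python
import torch.nn as nn


def replace_silu(module: nn.Module) -> None:
    """Recursively replace nn.SiLU activations with ReLU in-place."""
    for name, child in module.named_children():
        if isinstance(child, nn.SiLU):
            setattr(module, name, nn.ReLU(inplace=True))
        else:
            replace_silu(child)
```

ReLU here is just one common quantization-friendly choice, and the model needs fine-tuning after the swap before re-exporting and quantizing.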

PINTO0309 commented 1 year ago

*(image attachment)*

mikel-brostrom commented 1 year ago

This got solved here: https://github.com/PINTO0309/onnx2tf/issues/269. Closing this down!