PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

Converting quantized onnx with Q/DQ representation to full int8 TFLite model #582

Closed: Hrayo712 closed this issue 9 months ago

Hrayo712 commented 9 months ago

Issue Type

Feature Request

OS

Linux

onnx2tf version number

1.19.11

onnx version number

1.15.0

onnxruntime version number

1.16

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.15.0

Download URL for ONNX

None

Parameter Replacement JSON

None

Description

Thanks for your amazing work!

Feature request:

It would be quite useful to be able to convert quantized ONNX models that use the Q/DQ (QuantizeLinear/DequantizeLinear) representation into full int8 TFLite models. This would, for example, enable converting to TFLite the Q/DQ ONNX models produced by tools like Brevitas, which implements various state-of-the-art quantization methods (both homogeneous and mixed-precision). I think this would be very useful for the community, given that TensorFlow's native quantization support is quite limited.
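For context, the "full int8 TFLite" target that this request refers to is what TensorFlow's own post-training quantization path produces. A minimal sketch of that path, using the standard `tf.lite.TFLiteConverter` API (the model and sample data here are placeholders, not anything from onnx2tf):

```python
# Sketch: standard TFLite full-integer post-training quantization.
# This is the TFLite-side format the request asks onnx2tf to emit
# directly from Q/DQ ONNX models, instead of re-quantizing in TF.
import numpy as np
import tensorflow as tf

def to_full_int8_tflite(keras_model, sample_inputs):
    """Convert a Keras model to a full int8 TFLite flatbuffer (bytes)."""
    def representative_dataset():
        # The converter calibrates activation ranges from these samples.
        for sample in sample_inputs:
            yield [sample.astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Restrict to int8 builtin ops and force int8 I/O -> "full int8".
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```

The key difference in the feature request is that the quantization parameters (scales, zero points) would come from the Q/DQ nodes already present in the ONNX graph, rather than from a representative-dataset calibration pass like the one above.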

Do you think this would be doable? Could you comment on what would be required?

Thanks in advance for your time!

For reference: Brevitas: https://github.com/Xilinx/brevitas

PINTO0309 commented 9 months ago

The differences between the ONNX and TFLite specifications are so great that it would be very difficult for me to cover and implement all the patterns by myself. The real difficulty is not the implementation itself but achieving comprehensive coverage of all model variations.

Therefore, I would like you to provide me with samples of every kind of quantized ONNX file you can think of. I can see how this feature could be useful, but it is very tiring to examine all the patterns alone.

I cannot devote enough working time to doing my own quantization with Brevitas or similar tools just to cover the test patterns.

I have spent a year testing and implementing about 1,000 different model conversions, and the PyTorch and ONNX models created by researchers and engineers around the world are often structured very inefficiently, so there are many special cases to consider when converting models.

By the way, I do not own an FPGA.

github-actions[bot] commented 9 months ago

If there is no activity within the next two days, this issue will be closed automatically.