Not possible at this time.
It would be necessary to implement a very special process that directly generates the TFLite FlatBuffer format, because no such conversion scheme exists in TFLiteConverter. And you are right: there is no op in TFLite that behaves the same way as the ONNX QDQ ops.
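As a rough illustration of what working at that level looks like, here is a minimal sketch that only reads per-tensor quantization parameters from an existing .tflite file, assuming the `tflite` PyPI package (FlatBuffers bindings generated from the TFLite schema) and a placeholder file name; a converter that preserves ONNX QDQ parameters would have to build these same tables itself instead of going through TFLiteConverter:

```python
# Minimal sketch: read quantization parameters at the FlatBuffer level.
# Assumes `pip install tflite` (FlatBuffers bindings generated from the
# TFLite schema); "model.tflite" is a placeholder path.
import tflite

with open("model.tflite", "rb") as f:
    buf = f.read()

model = tflite.Model.GetRootAsModel(buf, 0)
subgraph = model.Subgraphs(0)

for i in range(subgraph.TensorsLength()):
    tensor = subgraph.Tensors(i)
    quant = tensor.Quantization()
    if quant is not None and quant.ScaleLength() > 0:
        print(tensor.Name().decode("utf-8"),
              quant.ScaleAsNumpy(),       # per-tensor or per-channel scales
              quant.ZeroPointAsNumpy())   # matching zero points
```

Writing such a file is the hard part: the tensors, buffers, and QuantizationParameters tables have to be emitted directly with the FlatBuffers object API, which is the "special process" above.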
I understand that this is somewhat different from your intended conversion flow, but you may want to try the following tools.
@PINTO0309 Thank you for the response! I understand that directly generating the FlatBuffer format for TFLite is necessary and that there's no straightforward conversion scheme using the existing TFLiteConverter. Could you please provide a more detailed explanation or any documentation/references on how to implement this "special process" to directly generate the TFLite model from the ONNX model while preserving the quantization parameters? I'm considering exploring this approach and potentially contributing a Pull Request if it works out. Any guidance or insights would be greatly appreciated!
I was also thinking about an alternative approach to this task, built on the onnx2tf Python library. Since the graph is already quantized, the conversion process should focus on transferring the layers and preserving the data types, without any intermediate quantization (a sketch of the parameter-extraction half follows after this comment).

Questions:
Thanks in advance!
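Regarding the alternative approach mentioned above: pulling the QDQ parameters out of the ONNX graph is the easy half, and can be done with the onnx package alone. A minimal sketch, assuming the scales and zero points are stored as initializers (Constant-node inputs are not handled) and using a placeholder file name:

```python
# Minimal sketch: collect scale/zero-point pairs from QDQ nodes.
# "model_qdq.onnx" is a placeholder; initializer-only lookup is an assumption.
import onnx
from onnx import numpy_helper

model = onnx.load("model_qdq.onnx")
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

qdq_params = {}
for node in model.graph.node:
    if node.op_type in ("QuantizeLinear", "DequantizeLinear"):
        scale = inits.get(node.input[1])
        zero_point = inits.get(node.input[2]) if len(node.input) > 2 else None
        qdq_params[node.output[0]] = (scale, zero_point)

for name, (scale, zp) in qdq_params.items():
    print(name, scale, zp)
```

The hard half is attaching these values to the corresponding TFLite tensors, which is where the FlatBuffer-level work described earlier comes in.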
I am too busy to review and reply for a week.
That's fine. Anyway, thanks!
Issue Type
Others
OS
Linux
onnx2tf version number
1.20.0
onnx version number
1.16.1
onnxruntime version number
1.18.1
onnxsim (onnx_simplifier) version number
0.4.33
tensorflow version number
2.17.0
Download URL for ONNX
https://drive.google.com/file/d/1oTmAKn6qh3JTiQ5-ld9d54NAq9qUrsCL/view?usp=sharing
Parameter Replacement JSON
Description
Hello! I need to convert an already quantized ONNX model (using Quantize-Dequantize, or QDQ, nodes) to a TensorFlow Lite (TFLite) model with 8-bit precision. The key requirement is to preserve the existing quantization parameters from the ONNX model and translate them directly into the TFLite model without re-quantizing. As a result, I expect to obtain a TFLite model without QDQ nodes, where the weights, biases, and activations are adjusted according to the scales and zero points in the QuantizeLinear/DequantizeLinear nodes of the ONNX model. Is it possible to achieve this behaviour with onnx2tf? Am I missing something? I would be grateful to receive any help. Thanks in advance!
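For reference, the adjustment described above follows ONNX QuantizeLinear semantics: q = saturate(round(x / scale) + zero_point), with ties rounded to even. A small self-contained check of that arithmetic, with arbitrarily chosen illustrative values:

```python
# Minimal sketch of ONNX QuantizeLinear semantics for int8 (illustrative values).
import numpy as np

def quantize_linear(x, scale, zero_point):
    # np.round rounds ties to even, matching the ONNX spec;
    # results saturate to the int8 range [-128, 127].
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

w_float = np.array([1.23, -0.47, 20.0], dtype=np.float32)
w_int8 = quantize_linear(w_float, scale=np.float32(0.1), zero_point=0)
print(w_int8)  # [ 12  -5 127] -- 20.0 / 0.1 = 200 saturates to 127
```

A converted model can then be checked against these expected values with tf.lite.Interpreter's get_tensor_details(), which reports each tensor's scales and zero points.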