Open · Giuseppe5 opened this issue 9 months ago
What is the reason for deprecation?
Generally, QCDQ is much easier to use given its flexibility, whereas ONNX and Torch QOp impose several constraints on how a layer's input, weights, and output must be quantized in order to work correctly.
Similarly, QCDQ is also much easier to support and work with compared to QOp.
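For reference, a minimal sketch of what the QCDQ path looks like in practice. `export_onnx_qcdq` is the entry point as I understand the current Brevitas API, so treat the exact name and signature as an assumption and check your installed version:

```python
# Minimal sketch of a QCDQ export, assuming the
# brevitas.export.export_onnx_qcdq entry point.
import torch
from brevitas.nn import QuantLinear
from brevitas.export import export_onnx_qcdq

# A single 8-bit weight-quantized layer is enough to see the pattern:
# in the exported graph, the weights appear as a
# QuantizeLinear -> DequantizeLinear pair around a float Gemm/MatMul.
model = QuantLinear(16, 8, bias=True, weight_bit_width=8)
model.eval()

export_onnx_qcdq(model, args=torch.randn(1, 16), export_path='linear_qcdq.onnx')
```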
Thanks @Giuseppe5
Hi @Giuseppe5,
I have tried both the QCDQ and QOp ONNX exports. Indeed, QCDQ provides great flexibility when exporting models to ONNX, whereas the QOp export requires one to satisfy a lot of constraints.
However, for full-integer inference, e.g. generating C code with frameworks such as TVM, QCDQ is problematic: it adds several QuantizeLinear and DequantizeLinear nodes to the ONNX graph, and all of the computation between them essentially happens in floating point.
In this case, where you want to perform full-integer inference, QOp worked quite well, as integer tensors are passed on to the next layer if you set return_quant_tensor=True when defining the quant layer. Furthermore, one can see that in the generated C code the computations are performed on integers, as expected.
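For concreteness, here is the kind of setup I mean. This is a hedged sketch; the layer names and the QuantTensor fields are my reading of the public Brevitas API:

```python
# Sketch of keeping tensors quantized between layers via
# return_quant_tensor=True (QuantIdentity/QuantLinear and the
# QuantTensor metadata fields are the assumed parts here).
import torch
from brevitas.nn import QuantIdentity, QuantLinear

# Quantize the input once, then propagate quantized tensors onward.
inp_quant = QuantIdentity(bit_width=8, return_quant_tensor=True)
fc = QuantLinear(16, 8, bias=False, weight_bit_width=8,
                 return_quant_tensor=True)

x = torch.randn(1, 16)
qx = inp_quant(x)   # QuantTensor carrying scale / zero-point / bit-width
qy = fc(qx)         # output stays a QuantTensor, so integers can flow on
print(type(qy).__name__, qy.scale, qy.bit_width)
```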
Since QOp export will be deprecated, is there any way to perform full-integer inference with the QCDQ export?
Although we will keep the interface for layer-wise export handlers, we will be deprecating support for QOp in favour of QCDQ.