Xilinx / brevitas

Brevitas: neural network quantization in PyTorch
https://xilinx.github.io/brevitas/

Deprecate QOp Export #834

Open Giuseppe5 opened 8 months ago

Giuseppe5 commented 8 months ago

Although we will keep the interface for layer-wise export handlers, we will be deprecating support for QOp in favour of QCDQ.

Barrot commented 8 months ago

What is the reason for deprecation?

Giuseppe5 commented 8 months ago

Generally, QCDQ is much easier to use given its flexibility, whereas ONNX and Torch QOp impose several constraints on how the layer input, weights, and output must be quantized to work correctly.

Similarly, QCDQ is also much easier to support and work with compared to QOp.
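
For reference, a minimal sketch of what a QCDQ export looks like, assuming the `export_onnx_qcdq` helper from `brevitas.export` (names and signatures may differ across Brevitas versions):

```python
# Minimal QCDQ export sketch (illustrative; argument names may vary by version).
import torch
from brevitas.export import export_onnx_qcdq
from brevitas.nn import QuantConv2d, QuantReLU


class SmallQuantNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Quantized layers: weights quantized in the conv, activations in the ReLU.
        self.conv = QuantConv2d(3, 8, kernel_size=3, weight_bit_width=8)
        self.relu = QuantReLU(bit_width=8)

    def forward(self, x):
        return self.relu(self.conv(x))


model = SmallQuantNet().eval()
# QCDQ export represents quantization as QuantizeLinear/Clip/DequantizeLinear
# nodes wrapped around regular float ONNX operators.
export_onnx_qcdq(model, torch.randn(1, 3, 32, 32), export_path='small_qcdq.onnx')
```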

Barrot commented 8 months ago

Thanks @Giuseppe5

prathameshd8 commented 3 months ago

Hi @Giuseppe5,

I have tried both QCDQ and QOp ONNX export. Indeed, QCDQ provides great flexibility for exporting models to ONNX, whereas QOp export requires one to satisfy a lot of constraints.

However, when performing full-integer inference by generating C code with frameworks such as TVM, QCDQ adds several Quantize and Dequantize nodes to the ONNX graph, so essentially all the computation happens in floating point.
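
A quick way to see the pattern described above is to count the node types in the exported graph (the file name here is just an example from the sketch above):

```python
# Inspect an exported model: with QCDQ you typically see QuantizeLinear /
# DequantizeLinear (and Clip) nodes interleaved with float Conv/Gemm/Relu ops.
import onnx
from collections import Counter

graph = onnx.load('small_qcdq.onnx').graph
print(Counter(node.op_type for node in graph.node))
```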

For this full-integer use case, QOp worked quite well: the integer tensors are passed on to the next layer if you set return_quant_tensor=True when defining the QuantLayer. Furthermore, one can see that the generated C code performs the computations on integers, as expected.
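
A minimal sketch of that pattern, assuming QuantIdentity / QuantConv2d from `brevitas.nn`; with return_quant_tensor=True the layer returns a QuantTensor, so the quantized representation (value, scale, zero point, bit width) is propagated to the next layer rather than a plain float tensor. Exact QuantTensor behaviour may differ across Brevitas versions:

```python
# Propagating quantized tensors between layers via return_quant_tensor=True.
import torch
from brevitas.nn import QuantConv2d, QuantIdentity

quant_inp = QuantIdentity(bit_width=8, return_quant_tensor=True)
conv = QuantConv2d(3, 8, kernel_size=3, weight_bit_width=8,
                   bias=False, return_quant_tensor=True)

out = conv(quant_inp(torch.randn(1, 3, 32, 32)))
print(type(out))  # expected to be a QuantTensor carrying scale/zero_point/bit_width
```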

Since QOp export will be deprecated, is there any way to perform full-integer inference with QCDQ export?