OpenPPL / ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Apache License 2.0

How does PPQ perform real quantization and achieve speed up? #570

Open YixuanSeanZhou opened 3 months ago


Question

Looking at the forward call of QConv2D, the PPQ torch executor seems to execute with a fake quantization scheme, where the input and weight go through Q->DQ->Conv (the convolution itself still runs in floating point) rather than Q->INT8_Conv->DQ.
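To make the distinction concrete, here is a minimal pure-Python sketch of the two data flows on a 1-D convolution. It is not PPQ code: every function name below is hypothetical, and it assumes symmetric per-tensor int8 quantization with no bias.

```python
# Illustration only: these helpers are hypothetical and do not come from PPQ.
# Assumes symmetric per-tensor int8 quantization and no bias term.

def quantize(xs, scale):
    """Q: map floats to int8 codes (round, then clamp to [-128, 127])."""
    return [max(-128, min(127, round(x / scale))) for x in xs]

def dequantize(qs, scale):
    """DQ: map integer codes back to floats."""
    return [q * scale for q in qs]

def conv1d(xs, ws):
    """Valid 1-D cross-correlation; works on floats or ints."""
    k = len(ws)
    return [sum(xs[i + j] * ws[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def fake_quant_conv(xs, ws, sx, sw):
    # Q -> DQ -> float conv: values are snapped to the int8 grid,
    # but the arithmetic itself stays in floating point.
    x_fq = dequantize(quantize(xs, sx), sx)
    w_fq = dequantize(quantize(ws, sw), sw)
    return conv1d(x_fq, w_fq)

def real_quant_conv(xs, ws, sx, sw):
    # Q -> integer conv -> DQ: accumulate in integers,
    # then rescale once with the combined scale sx * sw.
    acc = conv1d(quantize(xs, sx), quantize(ws, sw))
    return dequantize(acc, sx * sw)

x = [0.5, -1.0, 0.25, 0.75]
w = [0.1, -0.2, 0.3]
sx, sw = 1.0 / 127, 0.3 / 127  # scales picked from the max magnitudes above

fake = fake_quant_conv(x, w, sx, sw)
real = real_quant_conv(x, w, sx, sw)
print(fake)
print(real)
```

Under these assumptions the two paths agree up to floating-point rounding, so the difference I am asking about is which kernel actually runs, not the numerical result.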

Does PPQ have an implementation in which the Q/DQ nodes are resolved and real quantized (INT8) kernels are invoked? If so, could you please provide a code pointer?

Thanks in advance.