Question
Looking at the forward call of QConv2D, the PPQ torch executor appears to run a fake-quantization scheme, in which the input and weights go through Q->DQ->Conv rather than Q->INT8_Conv->DQ. I wonder whether PPQ has an implementation in which the Q/DQ nodes are resolved and real quantized kernels are invoked. If so, could you please provide a code pointer?
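To make the distinction concrete, here is a minimal pure-Python sketch of the two execution orders I mean. It is a toy only and not PPQ code: every function name, the 1-D convolution, and the symmetric per-tensor scales are assumptions made for illustration.

```python
# Toy illustration only: not PPQ code. All names here are hypothetical.

def quantize(values, scale, qmin=-128, qmax=127):
    """Symmetric per-tensor quantization, clamped to the int8 range."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map integer codes back to floats."""
    return [q * scale for q in q_values]

def conv1d(x, w):
    """Valid 1-D correlation; works on floats or ints alike."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def fake_quant_conv(x, w, s_x, s_w):
    # Q -> DQ -> float Conv: the arithmetic itself stays in floating point.
    x_fq = dequantize(quantize(x, s_x), s_x)
    w_fq = dequantize(quantize(w, s_w), s_w)
    return conv1d(x_fq, w_fq)

def int8_conv(x, w, s_x, s_w):
    # Q -> integer Conv -> DQ: accumulate on integer codes, rescale once.
    acc = conv1d(quantize(x, s_x), quantize(w, s_w))
    return [a * (s_x * s_w) for a in acc]

x = [0.5, -1.0, 0.25, 0.75, -0.5]
w = [0.2, -0.4, 0.1]
s_x, s_w = 1.0 / 127, 0.4 / 127

fake = fake_quant_conv(x, w, s_x, s_w)
real = int8_conv(x, w, s_x, s_w)
max_diff = max(abs(a - b) for a, b in zip(fake, real))
```

In this toy the two paths agree up to floating-point rounding, since the scales factor out of the sum. My question is about which of the two the executor actually runs, that is, whether the convolution is ever performed on integer tensors.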
Thanks in advance.