fastmachinelearning / qonnx

QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
https://qonnx.readthedocs.io/
Apache License 2.0

Preserve weight quantizer while lowering convolutions #132

Closed: maltanar closed this 2 months ago

maltanar commented 2 months ago

The LowerConvsToMatMul transformation previously only worked with Conv nodes whose weight input is a static initializer. This PR extends the transformation to handle the case where the weights are fed by a Quant node, including any transpose/reshape required for the scale factors. See the example below (before/after, from a 4-bit MobileNet-v1).
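To illustrate why the quantizer can be preserved, here is a minimal numpy sketch (not the actual transformation code; all shapes and names are hypothetical) showing that a Conv with Quant-style weights (integer values times a per-output-channel scale) is equivalent to Im2Col followed by MatMul, provided the weights are reshaped to a matrix and the scale tensor is reshaped accordingly so it still broadcasts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for a small conv layer (assumptions, not from the PR)
OC, IC, K = 4, 3, 3                      # output channels, input channels, kernel size
H = W = 5                                # input spatial dims (stride 1, no padding)
Wint = rng.integers(-8, 8, size=(OC, IC, K, K)).astype(np.float32)  # 4-bit integer weights
scale = rng.random((OC, 1, 1, 1)).astype(np.float32)                # per-channel scale

# Simplified Quant semantics: dequantized weight = integer weight * scale
Wq = Wint * scale

x = rng.random((1, IC, H, W)).astype(np.float32)

# Reference: direct convolution
OH = OW = H - K + 1
ref = np.zeros((1, OC, OH, OW), dtype=np.float32)
for oc in range(OC):
    for i in range(OH):
        for j in range(OW):
            ref[0, oc, i, j] = np.sum(x[0, :, i:i+K, j:j+K] * Wq[oc])

# Lowered form: Im2Col + MatMul, keeping the quantizer on the integer weights.
# The integer weights are reshaped to (OC, IC*K*K); the per-channel scale is
# reshaped from (OC, 1, 1, 1) to (OC, 1) so it still broadcasts correctly.
cols = np.zeros((IC * K * K, OH * OW), dtype=np.float32)
for idx, (i, j) in enumerate((i, j) for i in range(OH) for j in range(OW)):
    cols[:, idx] = x[0, :, i:i+K, j:j+K].reshape(-1)
Wq_mat = Wint.reshape(OC, -1) * scale.reshape(OC, 1)   # dequantize after reshape
lowered = (Wq_mat @ cols).reshape(1, OC, OH, OW)

assert np.allclose(ref, lowered, atol=1e-5)
print("conv and lowered matmul agree")
```

The key point is that the per-channel quantization scale only needs a shape adjustment to follow the weights through the lowering, so the Quant node can be kept in the graph rather than folded away.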

(Figures: before/after graph excerpts showing the lowered convolution with the Quant node preserved.)