ModelTC / MQBench

Model Quantization Benchmark
Apache License 2.0

Support standard ONNX quantized operators? #9

Closed Nullkooland closed 2 years ago

Nullkooland commented 2 years ago

Hi, I've been looking for an end-to-end quantization deployment solution. So far I've tried the onnxruntime + TVM stack, but onnxruntime only supports naive PTQ methods. Your work looks really promising; however, can MQBench export the quantized model using standard ONNX quantized ops such as QLinearConv, QuantizeLinear, and DequantizeLinear?
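For context, this is roughly the "naive PTQ" onnxruntime flow mentioned above: a minimal sketch assuming hypothetical model paths and a random-data calibration reader. QuantFormat.QOperator emits QLinearConv-style ops, while QuantFormat.QDQ emits QuantizeLinear/DequantizeLinear pairs.

```python
# Sketch of onnxruntime static PTQ (paths, input name, and shapes are placeholders).
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, quantize_static

class DummyCalibrationReader(CalibrationDataReader):
    """Feeds a few random batches as calibration data (stand-in for a real dataset)."""
    def __init__(self, input_name="input", num_batches=8):
        self.data = iter(
            [{input_name: np.random.randn(1, 3, 224, 224).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self.data, None)

quantize_static(
    "model_fp32.onnx",            # hypothetical FP32 model path
    "model_int8.onnx",            # hypothetical quantized output path
    DummyCalibrationReader(),
    quant_format=QuantFormat.QOperator,  # use QuantFormat.QDQ for Q/DQ-style export
)
```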

See: apache/tvm#8838

Tracin commented 2 years ago

We consider deploying quantized models on TVM important; deployment on TVM with Quantize/Dequantize nodes will be supported very soon. Actually, we have already done this in the PTQ scheme for experiments.

Nullkooland commented 2 years ago

@Tracin I noticed that v0.0.3 adds ONNX QNN op support; however, I cannot export an ONNX QNN model when running the example code in test_quantize_onnxqnn in test_backend.py. The export fails with:

No Op registered for LearnablePerTensorAffine with domain_version of 11

It seems that ONNX does not support the custom fake-quantize op LearnablePerTensorAffine in the exported model. I also noticed there is a function deploy_qparams_tvm that uses ONNXQNNPass to replace fake-quantize ops with standard ONNX QNN ops, but this function is not called during convert_deploy. Is there any documentation explaining how to use these utilities to export a standard ONNX QNN model?

Tracin commented 2 years ago

ONNXQNNPass is registered for the ONNX_QNN backend scheme; if you prepare and convert_deploy with backend=ONNX_QNN, it will be called.
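A rough sketch of that flow, assuming MQBench's prepare_by_platform / convert_deploy API and an ImageNet-style input shape (the calibration data here is random, as a stand-in for a real loader):

```python
# Minimal sketch: prepare, calibrate, and export with the ONNX_QNN backend.
import torch
import torchvision.models as models

from mqbench.prepare_by_platform import prepare_by_platform, BackendType
from mqbench.utils.state import enable_calibration, enable_quantization
from mqbench.convert_deploy import convert_deploy

model = models.resnet18(pretrained=True).eval()

# Insert fake-quantize nodes configured for the ONNX QNN backend.
model = prepare_by_platform(model, BackendType.ONNX_QNN)

# Calibrate observers with a few batches (random tensors as placeholder data).
enable_calibration(model)
for _ in range(8):
    model(torch.randn(1, 3, 224, 224))
enable_quantization(model)

# Because the backend is ONNX_QNN, convert_deploy should invoke ONNXQNNPass,
# so the exported ONNX uses standard QNN ops instead of LearnablePerTensorAffine.
convert_deploy(model, BackendType.ONNX_QNN, {'data': [1, 3, 224, 224]})
```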