Closed: chuong98 closed this issue 6 months ago
After using the generate_fp8_scales function from the diffusers example and exporting with opset 17, it works. For anyone who has the same error:
def generate_fp8_scales(unet):
    # temporary solution due to a known bug in torch.onnx._dynamo_export
    for _, module in unet.named_modules():
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            # re-express the FP8 (E4M3, max 448) amax on the INT8 (max 127) scale
            module.input_quantizer._num_bits = 8
            module.weight_quantizer._num_bits = 8
            module.input_quantizer._amax = (module.input_quantizer._amax * 127) / 448.0
            module.weight_quantizer._amax = (module.weight_quantizer._amax * 127) / 448.0

if args.quant_mode == 'fp8':
    generate_fp8_scales(model_quant)
torch_to_onnx(model_quant, input)
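For completeness, a minimal sketch of what the export call with opset 17 could look like; this implementation of torch_to_onnx, the file name, the input/output names, and the dummy input shape are my assumptions, not the exact code from the diffusers example:

import torch

def torch_to_onnx(model, dummy_input, onnx_path="model_quant.onnx"):
    # Plain torch.onnx.export with opset 17, as mentioned in the workaround above.
    # onnx_path, the tensor names, and constant folding are illustrative choices.
    model.eval()
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        opset_version=17,
        input_names=["input"],
        output_names=["output"],
        do_constant_folding=True,
    )

# Example usage with an assumed 1x3x224x224 input (typical for ResNet18):
# torch_to_onnx(model_quant, torch.randn(1, 3, 224, 224))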
I am testing exporting a ResNet18 model provided by Timm, and I use Docker so the result is reproducible.
With mtq.INT8_SMOOTHQUANT_CFG, I can export the ONNX file from the quantized model successfully. With mtq.FP8_DEFAULT_CFG, exporting the ONNX model has an error. I used the standard torch.onnx.export as in the example.
Both models were quantized successfully, and FP8 has better accuracy than INT8, but it is useless if I can't export the quantized model to ONNX or TensorRT. I can skip the ONNX step if there is a way to export to TRT directly. Thank you.
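For context, a minimal sketch of the quantize-then-export setup described above, assuming NVIDIA modelopt (mtq) and a Timm ResNet18; the random calibration batches, the pretrained=False flag, and the variable names are placeholders, not the original script:

import timm
import torch
import modelopt.torch.quantization as mtq

# Placeholder calibration data: a few random batches standing in for real images.
calib_data = [torch.randn(1, 3, 224, 224) for _ in range(8)]

def calibrate(model):
    # forward_loop passed to mtq.quantize: run calibration batches so the
    # quantizers can collect amax statistics.
    with torch.no_grad():
        for batch in calib_data:
            model(batch)

# INT8 SmoothQuant path: exports to ONNX without issue.
model_int8 = timm.create_model("resnet18", pretrained=False)
model_int8 = mtq.quantize(model_int8, mtq.INT8_SMOOTHQUANT_CFG, forward_loop=calibrate)

# FP8 path from the report: quantization succeeds, but the subsequent
# torch.onnx.export fails until the generate_fp8_scales workaround above is applied.
model_fp8 = timm.create_model("resnet18", pretrained=False)
model_fp8 = mtq.quantize(model_fp8, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate)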