ymgwjk closed this issue 5 months ago.
We will need to see the torch code to be sure, but my guess is that the resulting QDQ placement (i.e., the selection of operations for quantization) is not supported by TensorRT; perhaps quantized conv weights are being shared. Please share the torch and/or ONNX model with us so we can take a look. Alternatively, you can export the torch model to ONNX first and use the modelopt.onnx.quantization tool to do PTQ before deploying with TensorRT, as sketched below, though SmoothQuant is not supported in that workflow. cc @realAsma
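A minimal sketch of that ONNX PTQ path, assuming a classification-style input; the input name, shapes, and file names are placeholders, and argument names may differ slightly across modelopt versions:

```python
import numpy as np
from modelopt.onnx.quantization import quantize

# Representative calibration inputs, keyed by the ONNX model's input name
# ("input" is a placeholder; use your model's actual input name and shape).
calib_data = {"input": np.random.rand(32, 3, 224, 224).astype(np.float32)}

quantize(
    onnx_path="model.onnx",          # FP32 model exported from torch
    calibration_data=calib_data,
    quantize_mode="int8",
    output_path="model.quant.onnx",  # QDQ model to feed to TensorRT
)
```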
I agree with @riyadshairi979. TensorRT does not support INT8_SMOOTHQUANT_CFG for models with Conv layers such as CNNs. INT8_SMOOTHQUANT_CFG works well for LLMs with TensorRT-LLM.
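For a CNN, a config that targets Conv/Linear layers such as mtq.INT8_DEFAULT_CFG is the usual choice instead. A minimal sketch, using a placeholder model and synthetic calibration batches:

```python
import torch
import torchvision
import modelopt.torch.quantization as mtq

# Placeholder CNN and synthetic calibration data; substitute your own.
model = torchvision.models.resnet18().cuda().eval()
calib_batches = [torch.randn(8, 3, 224, 224).cuda() for _ in range(16)]

def forward_loop(m):
    # mtq.quantize runs this to push calibration data through the model
    # so activation ranges can be collected.
    with torch.no_grad():
        for x in calib_batches:
            m(x)

# INT8_DEFAULT_CFG quantizes Conv and Linear layers with standard INT8
# quantizers, which export as QDQ nodes that TensorRT can consume.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```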
I encountered an error when converting my INT8 ONNX model to a TensorRT engine. The quantization and exporting code is:
The TensorRT version is 8.6.1.6, and when I try to convert the ONNX model to a TensorRT engine with this command, I encounter the following error:
It seems that mtq.quantize inserted some quantized layers, but TensorRT cannot find implementations for them. So what is the proper way to quantize a model and convert it to a TensorRT engine?
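For context, this is roughly the kind of export-and-build step being described; the input shape, opset, and file names below are placeholders, and the model is assumed to have already been calibrated with mtq.quantize as above:

```python
import torch

# Export the fake-quantized torch model; the quantizers become
# QuantizeLinear/DequantizeLinear (QDQ) nodes in the ONNX graph.
dummy_input = torch.randn(1, 3, 224, 224).cuda()
torch.onnx.export(
    model,
    dummy_input,
    "model_int8.onnx",
    opset_version=17,   # opset >= 13 is required for per-channel Q/DQ
    input_names=["input"],
    output_names=["output"],
)

# Then build the engine from the QDQ ONNX, e.g.:
#   trtexec --onnx=model_int8.onnx --int8 --saveEngine=model_int8.engine
```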