Yes, you can specify the default NMS from the ONNX opset and export the model with INT8 quantization, which you can then use with onnxruntime (probably with other runtimes as well, but this hasn't been tested). Please check the docs here: https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/models_export.md#advanced-int-8-quantization-options
TLDR:
model.export("output.onnx,
engine=ExportTargetBackend.ONNXRUNTIME,
quantization_mode=ExportQuantizationMode.INT8,
...
)
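For reference, a fuller sketch of that call, assuming a pretrained YOLO-NAS-S model obtained via models.get (the model choice, output file name, and the optional calibration_loader are illustrative; see the linked docs for the exact set of options):

from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.conversion import ExportTargetBackend, ExportQuantizationMode

# Illustrative model; any exportable YOLO-NAS variant works the same way.
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")

model.export(
    "yolo_nas_s_int8.onnx",
    engine=ExportTargetBackend.ONNXRUNTIME,
    quantization_mode=ExportQuantizationMode.INT8,
    # Optionally pass calibration_loader=<torch DataLoader> for calibrated PTQ.
)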
Yes, I tried this, but the model is still too large for browser use (48 MB). Is there another way to reduce the model size?
BTW, when I convert to INT8 using TensorRT, it reduces the size to 14 MB.
OK, I found a solution.
I used ONNX Runtime's quantize_dynamic. The model size was reduced from 48 MB to 13 MB.
There can sometimes be problems importing the result into web browsers, so the weight_type and nodes_to_exclude parameters need to be set. Here is the code that works for me:
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'input.onnx'
model_quant = 'output.onnx'
# Dynamic (weight-only) quantization; the result is written to model_quant.
# QUInt8 weights plus excluding the first conv node avoided browser loading issues.
quantize_dynamic(model_fp32, model_quant,
                 weight_type=QuantType.QUInt8,
                 nodes_to_exclude=['/conv1/Conv'])
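A quick way to sanity-check the quantized model before dropping it into the browser (a minimal sketch using plain onnxruntime in Python; file names follow the snippet above, and the input name, shape, and dtype are read from the session rather than assumed):

import os
import numpy as np
import onnxruntime as ort

# Compare file sizes of the FP32 and the dynamically quantized models.
print("fp32:", os.path.getsize("input.onnx") / 1e6, "MB")
print("int8:", os.path.getsize("output.onnx") / 1e6, "MB")

# Run one dummy inference to make sure the quantized graph still executes.
sess = ort.InferenceSession("output.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
print("model input:", inp.name, inp.shape, inp.type)

# Replace symbolic/dynamic dimensions with 1 and pick a dtype matching the input.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dtype = np.uint8 if "uint8" in inp.type else np.float32
dummy = np.zeros(shape, dtype=dtype)

outputs = sess.run(None, {inp.name: dummy})
print("output shapes:", [o.shape for o in outputs])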
💡 Your Question
Is there an option to export to ONNX with INT8 quantization? I can produce INT8 using TensorRT, but I need an ONNX model so I can use it in a web browser.
Versions
No response