Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

export to int8 onnx #1711

Closed: piotr-sikora-v closed this issue 9 months ago

piotr-sikora-v commented 9 months ago

💡 Your Question

Is there an option to export to ONNX with INT8 quantization? I can make an INT8 model using TensorRT, but I need an ONNX model to use it in a web browser.

Versions

No response

BloodAxe commented 9 months ago

Yes, you can specify the default NMS from the ONNX opset and export the model with INT8 quantization, which you can later use with onnxruntime (probably with other runtimes as well, but this hasn't been tested). Please check the docs here: https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/models_export.md#advanced-int-8-quantization-options

TLDR:

model.export("output.onnx, 
  engine=ExportTargetBackend.ONNXRUNTIME, 
  quantization_mode=ExportQuantizationMode.INT8,
  ...
)
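
For context, here is a fuller sketch of that call end to end. The import paths, the YOLO-NAS-S model choice, and the optional calibration_loader argument follow the linked export docs, but should be checked against the installed super-gradients version:

# Minimal sketch, assuming the import paths below match the installed
# super-gradients version; see models_export.md for the authoritative API.
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.conversion import ExportTargetBackend, ExportQuantizationMode

model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")

model.export(
    "yolo_nas_s_int8.onnx",
    engine=ExportTargetBackend.ONNXRUNTIME,
    quantization_mode=ExportQuantizationMode.INT8,
    # Optionally pass a representative DataLoader for PTQ calibration,
    # as described in the "Advanced INT-8 quantization options" section:
    # calibration_loader=calibration_dataloader,
)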
piotr-sikora-v commented 9 months ago

Yes, I tried this, but the model is still too large for browser usage (48 MB). Is there some other method to reduce the model size?

BTW... when I convert to INT8 using TensorRT, it reduces the size to 14 MB.

piotr-sikora-v commented 9 months ago

OK, I found a solution.

I used onnxruntime's quantize_dynamic. The model size dropped from 48 MB to 13 MB.

There is sometimes a problem with importing it into web browsers, so the weight_type and nodes_to_exclude parameters need to be set. Here is the code that works for me:

from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = 'input.onnx'
model_quant = 'output.onnx'

# Dynamic post-training quantization: weights are stored as 8-bit integers,
# activations stay in float. QUInt8 weights plus excluding the first conv
# avoided the browser import problems mentioned above.
quantize_dynamic(
    model_fp32,
    model_quant,
    weight_type=QuantType.QUInt8,
    nodes_to_exclude=['/conv1/Conv'],
)
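
To sanity-check the quantized file before shipping it to the browser, you can load it with onnxruntime and run a dummy forward pass. This is just a sketch: the 1x3x640x640 float32 input is an assumption for a YOLO-NAS-style export, and the real input name, shape, and dtype should be read from the session:

import numpy as np
import onnxruntime as ort

# Load the dynamically quantized model and inspect its input signature.
session = ort.InferenceSession('output.onnx', providers=['CPUExecutionProvider'])
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Run a dummy forward pass; 1x3x640x640 float32 is an assumption and may
# differ for your export (e.g. uint8 input when preprocessing is baked in).
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print([o.shape for o in outputs])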