Closed — mattcrossmarc closed this issue 7 months ago
Hello @mattcrossmarc. Could you explain the steps that led to your current situation? I will try to reproduce them so I can fully assist you.
Hi @bit-scientist, thanks for your help. I followed the steps on the page linked below, but with a different dataset, also downloaded from Roboflow:
https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/qat_ptq_yolo_nas.md
More details:
Hi @mattcrossmarc, I was able to reproduce the issue and can confirm that the resulting QAT and PTQ models are indeed in float32 format. The issue will be looked into in the future, but unfortunately I can't give an estimate. Thanks for reporting it.
Thanks @bit-scientist for taking the time to confirm. Relatedly, I saw the release of the new export API in SG 3.2, which gets closer to full integer quantization.
Based on my testing, the exporter converts the inputs to uint8 (`uint8[1,3,640,640]`), but the flat output is still float32 (`float32[num_predictions,7]`).
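For anyone post-processing that flat output, here is a minimal sketch of grouping the `[num_predictions, 7]` array into per-image detections. The column layout assumed below (image index, x1, y1, x2, y2, confidence, class id) and the threshold value are assumptions for illustration; verify them against your exporter's documentation.

```python
# Sketch: grouping a flat [num_predictions, 7] float32 detection array
# into per-image detections. The column order (image_index, x1, y1, x2,
# y2, confidence, class_id) is an assumption -- check your export docs.

def group_flat_predictions(flat, conf_threshold=0.25):
    """Group rows of a flat [N, 7] output by image index, dropping
    detections below conf_threshold."""
    detections = {}
    for row in flat:
        image_index = int(row[0])
        x1, y1, x2, y2, confidence, class_id = row[1:]
        if confidence < conf_threshold:
            continue  # skip low-confidence rows
        detections.setdefault(image_index, []).append(
            {"box": (x1, y1, x2, y2), "score": confidence, "class_id": int(class_id)}
        )
    return detections

# Illustrative data, not taken from a real export:
flat_output = [
    [0, 10.0, 20.0, 110.0, 220.0, 0.91, 2],
    [0, 15.0, 25.0, 60.0, 80.0, 0.10, 5],   # below threshold, dropped
    [1, 5.0, 5.0, 50.0, 50.0, 0.75, 0],
]
result = group_flat_predictions(flat_output)
```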
Full integer quantization would be valuable for running YOLO-NAS in real-time edge inference scenarios, where devices have low power budgets and specialized inference hardware (e.g., NPUs).
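To make the uint8-input/float32-output distinction concrete, here is a sketch of the affine (scale/zero-point) mapping that full-integer quantization relies on for uint8 tensors, where `real = scale * (q - zero_point)`. The scale and zero-point values below are illustrative, not taken from any exported model.

```python
# Sketch of affine uint8 quantization: real = scale * (q - zero_point).
# A fully integer-quantized model carries such (scale, zero_point) pairs
# for its input and output tensors; values here are illustrative only.

def quantize(x, scale, zero_point):
    """Map a float to the uint8 range using an affine transform."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to valid uint8 values

def dequantize(q, scale, zero_point):
    """Recover an approximate float from a uint8 value."""
    return scale * (q - zero_point)

scale, zero_point = 1.0 / 255.0, 0  # maps floats in [0, 1] onto [0, 255]
q = quantize(0.5, scale, zero_point)
x = dequantize(q, scale, zero_point)  # close to 0.5, within one step of scale
```

On hardware with uint8 I/O, the host only performs this cheap scale/shift at the boundary, while the NPU runs the integer graph end to end.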
You may want to check these notebooks, which show how to do QAT and how to use TensorRT (TRT) for inference with the quantized model.
💡 Your Question
The YOLO-NAS readme (YOLONAS.md) mentions an INT8 quantized version. I've followed the instructions for YOLO-NAS PTQ and QAT training, but the resulting model still uses float32 inputs and outputs.
How can I produce a model that uses int8 operations and inputs and outputs?
Versions
No response