Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.56k stars, 496 forks

Unable to export finetuned model to ONNX #1597

Closed by ani-mal 8 months ago

ani-mal commented 11 months ago

💡 Your Question

Description

I am unable to export a fine-tuned model to ONNX using the latest export API. I was able to reproduce the simple example described in the documentation. However, when I load a checkpoint and try to export the model to ONNX, I run into the following error:

code:

from super_gradients.training import models
from super_gradients.common.object_names import Models
from super_gradients.conversion import DetectionOutputFormatMode

checkpoint_path = "C:/temp/yolo_nas_ms_coco_training.pth"
num_classes = 80
checkpoint_num_classes = 80
model = models.get(model_name=Models.YOLO_NAS_S, checkpoint_path=checkpoint_path, num_classes=num_classes, checkpoint_num_classes=checkpoint_num_classes)
export_result = model.export("yolo_nas_s_coco_finetuned.onnx", output_predictions_format=DetectionOutputFormatMode.FLAT_FORMAT, input_image_shape=[640, 640], input_image_channels=3)

error:

\super_gradients\training\models\detection_models\yolo_nas\yolo_nas_variants.py", line 112, in get_preprocessing_callback
    raise ModelHasNoPreprocessingParamsException()
super_gradients.module_interfaces.exceptions.ModelHasNoPreprocessingParamsException

I was also running into other issues where the input shape was not automatically inferred from the model, which is why I am providing those additional parameters.

Is there anything different that needs to be done to properly initialize the finetuned model prior to exporting it through the new API?

Relevant packages in the Windows conda env:

# Name                    Version                   Build  Channel
chardet                   5.2.0                    pypi_0    pypi
charset-normalizer        3.3.1                    pypi_0    pypi
cuda-cccl                 12.3.52                       0    nvidia
cuda-cudart               11.7.99                       0    nvidia
cuda-cudart-dev           11.7.99                       0    nvidia
cuda-cupti                11.7.101                      0    nvidia
cuda-libraries            11.7.1                        0    nvidia
cuda-libraries-dev        11.7.1                        0    nvidia
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvrtc-dev            11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-runtime              11.7.1                        0    nvidia
data-gradients            0.2.2                    pypi_0    pypi
numpy                     1.23.0                   pypi_0    pypi
numpy-base                1.26.0           py39h65a83cf_0
onnx                      1.13.0                   pypi_0    pypi
onnx-graphsurgeon         0.3.27                   pypi_0    pypi
onnx-simplifier           0.4.35                   pypi_0    pypi
onnxruntime               1.13.1                   pypi_0    pypi
python                    3.9.18               h1aa4202_0
pytorch                   2.0.1           py3.9_cuda11.7_cudnn8_0    pytorch
pytorch-cuda              11.7                 h16d0643_5    pytorch
pytorch-mutex             1.0                        cuda    pytorch
super-gradients           3.3.1                    pypi_0    pypi
torch                     2.1.0                    pypi_0    pypi
torchaudio                2.0.2                    pypi_0    pypi
torchmetrics              0.8.0                    pypi_0    pypi
torchvision               0.16.0                   pypi_0    pypi

I also used Netron to inspect the .pth file I loaded and compared it with the default example that pulls models.get(Models.YOLO_NAS_S, pretrained_weights="coco"). I am not sure what to look for here; I just wanted to check whether the structure of the weight files was similar.

Versions

No response

ani-mal commented 11 months ago

Additional tests

With the same checkpoint, I am able to use the convert_to_onnx() method to convert to ONNX:

model = models.get(model_name="yolo_nas_s", checkpoint_path=checkpoint_path, num_classes=num_classes)

actual_output_path = convert_to_onnx(
    model=model,
    out_path="yolo_nas.onnx",
    input_shape=[3, 640, 640],
)

However, I am interested in getting the new API to work so that I can leverage the built-in pre-processing and the NMS post-processing.

ani-mal commented 11 months ago

It seems like I am having issues with the pre-processing step. Once I turned that off, I was able to get the NMS post-processing working:

export_result = model.export("yolo_nas_s_coco_training.onnx", input_image_shape=[640, 640], input_image_channels=3, preprocessing=False, nms_threshold=0.5, confidence_threshold=0.5)


BloodAxe commented 11 months ago

The original error indicates that the checkpoint contains no saved preprocessing params. Can you please share which version of SG you are using? If it's not 3.3.1, I suggest training a model for 1 epoch with the new version and trying model.export() again, as it should work.

Can you also do torch.load() on your checkpoint and print the keys? I am especially interested in seeing whether preprocessing_params is present in that checkpoint.
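The check suggested above could look roughly like the sketch below. The key name preprocessing_params is taken from this comment, not verified against the SG checkpoint format, so it is exposed as a parameter; the helper names check_checkpoint_keys and load_checkpoint and the stand-in dictionaries are hypothetical and for illustration only.

```python
from typing import Any, Dict, Iterable, Mapping


def check_checkpoint_keys(
    checkpoint: Mapping[str, Any],
    expected: Iterable[str] = ("preprocessing_params",),  # assumed key name, from the comment above
) -> Dict[str, bool]:
    """Map each expected top-level key to whether the checkpoint contains it."""
    return {key: key in checkpoint for key in expected}


def load_checkpoint(path: str) -> Mapping[str, Any]:
    """Load a checkpoint on CPU. torch is imported lazily so the helper above has no dependencies."""
    import torch

    return torch.load(path, map_location="cpu")


# Stand-ins for real checkpoints (hypothetical contents, for illustration only).
with_metadata = {"net": {}, "preprocessing_params": {}}
bare_state_dict = {"backbone.stem.weight": [0.0]}

print(check_checkpoint_keys(with_metadata))    # {'preprocessing_params': True}
print(check_checkpoint_keys(bare_state_dict))  # {'preprocessing_params': False}
```

With a real file, calling load_checkpoint(checkpoint_path) and printing list(checkpoint.keys()) shows every top-level key, which helps if the metadata is stored under a different name than the one assumed here.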

Indeed, when preprocessing_params is missing from the checkpoint, you cannot bake the preprocessing step (which for YOLO-NAS is just RGB -> BGR and input/255) into the model graph. So exporting with preprocessing=False is the right way to go; just don't forget to change the channel order, resize and/or pad images to 640x640, and divide by 255.
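A minimal NumPy sketch of that manual preprocessing is shown below. Several details are assumptions rather than anything confirmed in this thread: the image is an HxWx3 uint8 array that already fits within 640x640 (resizing is left out), padding goes on the bottom and right with a gray value of 114, and the exported model takes a float32 tensor in NCHW layout. The function name preprocess_for_export is hypothetical.

```python
import numpy as np


def preprocess_for_export(image: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Pad an HxWx3 uint8 image to size x size, reverse channels, scale to [0, 1], return 1xCxHxW float32."""
    height, width, channels = image.shape
    if channels != 3:
        raise ValueError("expected a 3-channel image")
    if height > size or width > size:
        raise ValueError("resize the image to fit within the target size first")

    # Pad on the bottom and right (assumed placement) so the original pixels stay top-left.
    padded = np.full((size, size, 3), pad_value, dtype=np.uint8)
    padded[:height, :width] = image

    swapped = padded[:, :, ::-1]                 # reverse the channel order
    scaled = swapped.astype(np.float32) / 255.0  # divide by 255
    # HWC -> CHW, then add a batch dimension (assumed input layout).
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[np.newaxis])


# Example: a 480x600 image whose first channel is fully saturated.
example = np.zeros((480, 600, 3), dtype=np.uint8)
example[..., 0] = 255
batch = preprocess_for_export(example)
print(batch.shape, batch.dtype)  # (1, 3, 640, 640) float32
```

Whether the channel swap is needed depends on how the image was loaded and on the channel order used during training, so the result is worth checking against the training pipeline.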

ani-mal commented 11 months ago

@BloodAxe thanks for the additional clarification. I just checked the clusters that train the model, and they are indeed using SG 3.2.0. I will update them to 3.3.1 to verify that this fixes the pre-processing step.

Louis-Dupont commented 9 months ago

@ani-mal does it work well after moving to 3.3.1?

ani-mal commented 9 months ago

@Louis-Dupont I was unable to get it working after updating to 3.3.1, so I kept the alternative solution @BloodAxe proposed.

ani-mal commented 8 months ago

@Louis-Dupont closing this issue. The problem was that we were not using the weights saved by SG; instead, we were creating a .pth file from the model object ourselves, so the .pth file lacked the metadata SG expects when loading it.