Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Error with Model Loaded from Checkpoint #1899

Open ichu-apl opened 8 months ago

ichu-apl commented 8 months ago

🐛 Describe the bug

I tried to load from a checkpoint using the file initially downloaded by the models.get() call. It looks like there's a leftover TODO in the repo code that's a reminder to remove the set_dataset_processing_params() requirement for models loaded from checkpoint.

Minimal Example

import cv2

# undo super_gradient's stdout redirect
import sys
stdout = sys.stdout
from super_gradients.training import models
from super_gradients.common.object_names import Models
sys.stdout = stdout

# load model
net = models.get(
    model_name=Models.YOLO_NAS_M,
    num_classes=80,
    checkpoint_path="yolo_nas_m_coco.pth")

# predict on image
img = cv2.imread("test.png")
detections = net.predict(img)

Error Traceback:

Traceback (most recent call last):
  File "C:\Users\chui1\Desktop\PythonScripts\SpacialTransformer\error.py", line 18, in <module>
    detections = net.predict(img)
  File "C:\Users\chui1\Desktop\PythonEnvs\spatial\lib\site-packages\super_gradients\training\models\detection_models\customizable_detector.py", line 291, in predict
    pipeline = self._get_pipeline(
  File "C:\Users\chui1\Desktop\PythonEnvs\spatial\lib\site-packages\super_gradients\training\models\detection_models\customizable_detector.py", line 227, in _get_pipeline
    raise RuntimeError(
RuntimeError: You must set the dataset processing parameters before calling predict.
Please call `model.set_dataset_processing_params(...)` first.

Versions

Collecting environment information...
PyTorch version: 2.2.1+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: (Rev10, Built by MSYS2 project) 12.2.0
Clang version: Could not collect
CMake version: version 3.24.0-rc3
Libc version: N/A

Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce MX450
Nvidia driver version: 511.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2496
DeviceID=CPU0
Family=198
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2496
Name=11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] onnx==1.13.0
[pip3] onnxruntime==1.13.1
[pip3] onnxruntime-gpu==1.17.1
[pip3] onnxsim==0.4.36
[pip3] pytorch-lightning==2.2.1
[pip3] torch==2.2.1
[pip3] torchdata==0.7.1
[pip3] torchmetrics==0.8.0
[pip3] torchtext==0.17.1
[pip3] torchvision==0.17.1
[conda] Could not collect

ichu-apl commented 8 months ago

This can be fixed by just calling the required function manually, but that requirement is undocumented. Either the requirement should be dropped, or the pretrained_weights + checkpoint conflict note should be removed (it doesn't look like it would result in undefined behavior; the code comments seem to expect and account for it).

# import path may differ across SG versions
from super_gradients.training.processing.processing import get_pretrained_processing_params

processing_params = get_pretrained_processing_params(model_type, "coco")
net.set_dataset_processing_params(**processing_params)
ichu-apl commented 8 months ago

Wait, there isn't actually a conflict: running models.get() with both checkpoint_path and pretrained_weights solves the issue. This note in https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/model_factory.py#L225 should be removed: NOTE: Passing pretrained_weights and checkpoint_path is ill-defined and will raise an error.
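A toy sketch of why passing both arguments works (this is illustrative logic, not SG's real model_factory code; the preset dict and function are made up): the weights come from the local checkpoint, while the preprocessing metadata is looked up from the pretrained-weights preset.

```python
# Hypothetical preset table keyed by pretrained_weights name (values invented).
PRESET_PROCESSING = {"coco": {"input_size": 640, "iou": 0.65, "conf": 0.25}}

def build_model(checkpoint_path=None, pretrained_weights=None):
    """Sketch of argument resolution: not the actual SG implementation."""
    model = {"weights_source": None, "processing_params": None}
    if pretrained_weights is not None:
        # pretrained_weights contributes the processing metadata...
        model["weights_source"] = f"download:{pretrained_weights}"
        model["processing_params"] = PRESET_PROCESSING[pretrained_weights]
    if checkpoint_path is not None:
        # ...while the local checkpoint overrides the weights themselves,
        # leaving the metadata in place so predict() can run.
        model["weights_source"] = checkpoint_path
    return model

m = build_model(checkpoint_path="yolo_nas_m_coco.pth", pretrained_weights="coco")
```

Under this reading the two arguments are complementary rather than conflicting, which is why the "ill-defined" note seems stale.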

james-imi commented 6 months ago

@BloodAxe this is the weirdest API I've seen for a predict method. It's not even fine-tuning friendly.

When exporting with preprocessing=False, it doesn't just remove the preprocessing; it also skips NMS, which is a really weird choice since NMS is postprocessing.

ModelHasNoPreprocessingParamsException

Moreover, you get this error asking for preprocessing when the documentation literally doesn't point to anything:

RuntimeError: You must set the dataset processing parameters before calling predict.
Please call `model.set_dataset_processing_params(...)` first.

So...

  1. You cannot disable the preprocessing in order to do the preprocessing yourself.
  2. You have to do this so it doesn't throw the error... which, by the way, doesn't give any results, probably because it loaded things wrongly:
    checkpoint_path=best_weights_path, pretrained_weights='coco'
  3. And you cannot just ask it to predict an image file either????
BloodAxe commented 6 months ago

@james-imi I'm not sure I'm following your complaints.

Indeed, to use model.predict the model must know which image preprocessing steps to apply, so it can perform the same image resizing/padding/normalization as during training. How does the model know that? It tries to pull this meta-information from the model checkpoint. With this information missing, predict() cannot work. I hope it's all clear up to this point.

How does the preprocessing meta-information appear in the checkpoint? During model training. It is extracted from the validation dataset's transforms and saved as additional metadata next to the model weights. model.set_dataset_processing_params(...) is part of the internal API and does exactly that.

Under normal circumstances this works under the hood, and one does not need any additional actions apart from the normal model train / models.get() steps.
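The mechanism described above can be sketched as follows (a toy illustration, not SG's actual checkpoint schema; the dict keys and values here are invented):

```python
# Hypothetical checkpoint layout: processing metadata rides along next to
# the weights, so predict() can restore the training-time preprocessing.
checkpoint = {
    "net": {"layer1.weight": [0.1, 0.2]},      # model weights (stand-in)
    "processing_params": {                      # metadata saved during training
        "image_processor": "resize_640_pad_normalize",
        "iou": 0.65,
        "conf": 0.25,
    },
}

def load_processing_params(ckpt):
    """predict() can only work if this metadata is present in the checkpoint."""
    params = ckpt.get("processing_params")
    if params is None:
        raise RuntimeError(
            "You must set the dataset processing parameters before calling predict."
        )
    return params

params = load_processing_params(checkpoint)
```

A checkpoint saved without that metadata block is exactly the situation that triggers the RuntimeError in the original report.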

Throwing around random pieces of code rarely helps to address your issues. It is important to provide as much information as possible, including the SG version you are using and a code snippet that we can use to reproduce the issue.

You can check best practices for training a model, exporting it, and using predict() in our example notebooks:

https://github.com/Deci-AI/super-gradients/blob/master/notebooks/yolo_nas_custom_dataset_fine_tuning_with_qat.ipynb

https://github.com/Deci-AI/super-gradients/blob/master/notebooks/YoloNAS_Pose_Fine_Tuning_Animals_Pose_Dataset.ipynb

james-imi commented 6 months ago

> It tries to pull this meta-information from the model checkpoint. With this information missing, predict() cannot work. I hope it's all clear up to this point.

I'm pretty sure the earlier complaint by the other user is the SAME point. Fine-tuned models do not have this information if you are using a custom dataset, hence predict() cannot pull it in.

> During model training. It is extracted from the validation dataset's transforms and saved as additional metadata next to the model weights. model.set_dataset_processing_params(...) is part of the internal API and does exactly that.

Yes, and that was the same point. With custom datasets, it is not getting SAVED as additional metadata, hence the issue was raised. Maybe there's a BUG in the code?

With that in mind, your documentation says that using preprocessing=False also removes NMS, which is very confusing since NMS is a postprocessing technique. So I am not sure whether preprocessing=False means it removes both the preprocessing step (resizing, etc.) and the NMS when exporting to ONNX.
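For context on why this naming is confusing: NMS operates purely on the model's raw box outputs, i.e. it is a postprocessing step independent of any image preprocessing. A minimal self-contained sketch of greedy NMS (not SG's implementation):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.65):
    """Keep highest-scoring boxes; drop lower-scoring boxes that overlap them."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

Since this runs entirely on decoded detections, a flag named preprocessing=False that also drops it conflates two unrelated stages of the pipeline.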