Closed adilraja closed 3 months ago
Hello, let us see what error you got while loading the model from the disk. Also, consider sharing your environment details, such as OS, package versions, etc.
Hi all, I get this error when I try to do torch.load('yolo_nas_l.pt') from the local disk. I wonder if you can help me with this.
AttributeError: 'YoloNASBottleneck' object has no attribute 'drop_path'
Best regards, Dr. Muhammad Adil Raja Postdoctoral researcher Regulated Software Research Centre (RSRC) Dundalk Institute of Technology (DkIT) Ireland
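The thread never establishes the cause of this AttributeError. One common way such an error arises, offered here only as an assumption, is that a file holding a whole pickled model stores the class reference and instance attributes but not the code, so loading it against a different library version can leave attributes missing. The sketch below reproduces that mechanism with a hypothetical stand-in class (`Bottleneck`, `scale`, and `forward_v2` are illustrative names, not super_gradients code) and no PyTorch dependency.

```python
class Bottleneck:
    """Stand-in for a library layer as it looked when the model was saved."""

    def __init__(self):
        self.scale = 2.0

    def forward(self, x):
        return x * self.scale


# Saving a whole object stores its class reference and instance attributes, not the code.
saved_state = dict(vars(Bottleneck()))


def forward_v2(self, x):
    # A later (hypothetical) library version reads an attribute its new __init__ would set.
    return x * self.scale * (1.0 - self.drop_path)


Bottleneck.forward = forward_v2  # simulate upgrading the library between save and load

# Mimic default unpickling: allocate without calling __init__, then restore the attributes.
restored = Bottleneck.__new__(Bottleneck)
restored.__dict__.update(saved_state)

try:
    restored.forward(3.0)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happened, saving and loading only the weights (a state dict) into a freshly constructed model avoids the problem, because the current version's `__init__` runs and sets every attribute.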
Can you please provide the exact code you are using to load the model from disk?
It looks like you are doing something wrong, as loading checkpoints from file is indeed supported and described here: https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.model_factory.get. Check the checkpoint_path argument. Note that you need to pass both num_classes and checkpoint_path simultaneously.
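A minimal sketch of that advice, assuming the `models.get` signature described at the linked docs page (`model_name`, `num_classes`, `checkpoint_path`). The helper names `build_get_kwargs` and `load_local_model` are hypothetical, and super_gradients is imported lazily so the validation step can run without it installed.

```python
from pathlib import Path


def build_get_kwargs(checkpoint_path, num_classes):
    """Validate the inputs and return keyword arguments for models.get()."""
    path = Path(checkpoint_path)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    if num_classes <= 0:
        raise ValueError("num_classes must be a positive integer")
    # Both arguments are passed together, as the comment above advises.
    return {"num_classes": num_classes, "checkpoint_path": str(path)}


def load_local_model(model_name, checkpoint_path, num_classes):
    """Build the architecture and load its weights from a local file."""
    # Lazy import: only needed when a model is actually constructed.
    from super_gradients.training import models

    return models.get(model_name, **build_get_kwargs(checkpoint_path, num_classes))
```

A call would look like `load_local_model("yolo_nas_l", "/path/to/checkpoint.pth", num_classes=80)`, where the path and class count are placeholders. If the checkpoint was trained with a different number of classes than you need, check the linked docs for your installed version before relying on this sketch.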
Hi Eugene, Many thanks for this. Here is my code:
import os
import torch

from super_gradients.training import models
from super_gradients.training import Trainer
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

HOME = os.getcwd()
print(HOME)

DEVICE = 'cuda' if torch.cuda.is_available() else "cpu"
print("The device is: ", DEVICE)
print(DEVICE)
MODEL_ARCH = 'yolo_nas_l'

model = torch.load('yolo_nas_l.pt', map_location=torch.device(DEVICE))
.... (And then I do:)
trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_data,
    valid_loader=val_data
)
This is where the error finally occurs.
By the way, thank you for pointing out that loading checkpoints from file is supported and described here: https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.model_factory.get. I will be sure to check the checkpoint_path argument and to pass num_classes together with it.
However, my impression is that checkpoint_path is meant for loading a .pth file, whereas I want to load a .pt file. I wonder whether the two are any different.
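On the .pt versus .pth question: as far as PyTorch is concerned, the two extensions are naming conventions only, and `torch.load` treats them identically. What differs between files is what was saved: a whole pickled model, a plain state dict, or a training checkpoint dict that wraps the weights. The sketch below classifies an already-loaded object. The function name `describe_checkpoint` is hypothetical, and the wrapper key names it looks for ("net", "ema_net", "state_dict") are assumptions chosen for illustration, so inspect your own file's keys rather than relying on them.

```python
from collections.abc import Mapping


def describe_checkpoint(obj):
    """Classify an object returned by torch.load (or any stand-in for it)."""
    if not isinstance(obj, Mapping):
        # A pickled module: loading it depends on the library version that saved it.
        return "pickled object of type " + type(obj).__name__
    # Assumed layout: training checkpoints nest the weights under one of these keys.
    wrapper_keys = [key for key in ("net", "ema_net", "state_dict") if key in obj]
    if wrapper_keys:
        return "checkpoint dict wrapping weights under: " + ", ".join(wrapper_keys)
    return f"plain state dict with {len(obj)} entries"
```

With PyTorch installed, a call such as `describe_checkpoint(torch.load("yolo_nas_l.pt", map_location="cpu"))` would show which kind of file you have, regardless of its extension.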
Hi Eugene, Ok so I ran my script according to your advice and I got the following error trace.
[2023-08-28 12:29:48] INFO - crash_tips_setup.py - Crash tips is enabled. You can set your environment variable to CRASH_HANDLER=FALSE to disable it
[2023-08-28 12:29:52] WARNING - __init__.py - Failed to import pytorch_quantization
/ichec/work/dkcom001c/conda/yolonas/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-08-28 12:29:56] WARNING - calibrator.py - Failed to import pytorch_quantization
[2023-08-28 12:29:56] WARNING - export.py - Failed to import pytorch_quantization
[2023-08-28 12:29:56] WARNING - selective_quantization_utils.py - Failed to import pytorch_quantization
Traceback (most recent call last):
  File "/ichec/home/users/madil/yolonas/yolonastrainer.py", line 27, in
Note that you need to pass both num_classes and checkpoint_path simultaneously.
Hi! (Jumping in on this thread.) The solution is evidently to pass num_classes; however, I am wondering whether it is necessary. After training a model initialized on e.g. COCO, I load average_model.pth and pass the num_classes integer, yet when I make a prediction on a random image, the output of the model correctly contains the class_names property. This suggests that the pth file has stored the class names and, by proxy, their count. Or am I missing something that is being re-used in the script?
Hi Daniel, I am not talking about further training a pth model on the same kind of data. I am talking about retraining a pt model on a different type of data. But I also don't know what the differences between pth and pt models are.
It seems to me that you are trying to use models.get() to load a checkpoint that you created on your own.
It is quite hard to know what went wrong when that is the case. We do, however, verify in our unit tests that checkpoints created during training in SG can be loaded that way. I recommend going over the checkpoints section of our docs. I will close this issue for now; if further problems exist, please feel free to re-open it.
🚀 Feature Request
Hi, I am trying to train yolo_nas_l.pt and similar models on a custom project. I have to load the model from disk: my script runs on a remote server that does not allow fetching the model from the Internet through a Python script, so I cannot use super_gradients.training.models.get(...) to download it. Is there a method in super_gradients that would allow me to load a .pt model from disk? I tried torch.load(...), but that failed with an error. It would be nice to have this feature in super_gradients.
Best regards, Muhammad Adil Raja
Proposed Solution (Optional)
Perhaps it would be nice to have a get function, with a slightly different signature, that loads a model from disk.