About loading checkpoint weights

WryingY commented 3 months ago

Thank you for putting forward this great project! While training models with your framework, I got some satisfying results (checkpoint folders like those in Fig. 1) and would now like to fine-tune those models by loading weights from the checkpoint folders. However, trainer.py does not seem to include a step for loading a pretrained model, and the weight files are confusing too (I'm not sure whether the file with no suffix in Fig. 2 can be loaded). I would much appreciate any insight into how the checkpoint weights are organized and how to load them in your trainer.py code :) Best Regards. [Fig. 1: image] [Fig. 2: image]

IvanDrokin commented 3 months ago

Hey @WryingY! Thanks for your feedback =)

Regarding your question:

Each folder contains a set of files, and a folder is saved after each epoch. The set is structured as follows:

optimizer.bin - the state of torch's optimizer
randomstates* - the associated random states of the process
scheduler.bin - the LR scheduler's state
vgg19v4-64p-tiny-imagenet - the model's state dict

So if you need to load the trained model, it could be done simply with this code snippet:

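import torch

model = ...  # Build the model here, with the same architecture that was trained
checkpoint_file = "path/to/checkpoint_folder/vgg19v4-64p-tiny-imagenet"  # placeholder path to the suffix-less file
state_dict = torch.load(checkpoint_file, map_location=torch.device('cpu'))
model.load_state_dict(state_dict)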

If you need to resume training from the full state, you need to manually load the optimizer, scheduler, and random-state files in trainer.py, right after the optimizer, scheduler, and dataloaders are created.

Please let me know if you need more help or clarification on this topic.
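
For illustration, here is a minimal sketch of what such manual loading could look like. It assumes that optimizer.bin and scheduler.bin are plain state dicts readable with torch.load, which this thread does not confirm in detail. The helper names (save_training_state, load_training_state, build), the tiny nn.Linear model, and the "model_state" file name are hypothetical stand-ins, not part of trainer.py. The random-state files are left out because their format is not described here.

```python
import os
import tempfile

import torch
from torch import nn


def save_training_state(folder, model, optimizer, scheduler, model_file="model_state"):
    """Write a checkpoint folder mimicking the layout above (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(folder, model_file))
    torch.save(optimizer.state_dict(), os.path.join(folder, "optimizer.bin"))
    torch.save(scheduler.state_dict(), os.path.join(folder, "scheduler.bin"))


def load_training_state(folder, model, optimizer, scheduler, model_file="model_state"):
    """Restore model, optimizer and scheduler in place (hypothetical helper)."""
    cpu = torch.device("cpu")
    model.load_state_dict(torch.load(os.path.join(folder, model_file), map_location=cpu))
    optimizer.load_state_dict(torch.load(os.path.join(folder, "optimizer.bin"), map_location=cpu))
    scheduler.load_state_dict(torch.load(os.path.join(folder, "scheduler.bin"), map_location=cpu))


def build():
    """Stand-in for the code that creates the model, optimizer and scheduler."""
    torch.manual_seed(0)
    model = nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
    return model, optimizer, scheduler


# Simulate one epoch of training, then save the full state.
model, optimizer, scheduler = build()
model(torch.ones(3, 4)).sum().backward()
optimizer.step()
scheduler.step()

with tempfile.TemporaryDirectory() as folder:
    save_training_state(folder, model, optimizer, scheduler)

    # Resume: create fresh objects first, then load the saved states into them.
    resumed_model, resumed_optimizer, resumed_scheduler = build()
    load_training_state(folder, resumed_model, resumed_optimizer, resumed_scheduler)
```

The order matters: the optimizer and scheduler objects have to exist before their saved states can be loaded into them, which is why the loading goes right after they are created.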

WryingY commented 2 months ago

Thanks again for your speedy reply! Now I can resume my training easily with your solution. Looking forward to your fantastic work on Conv KAN in the future!

IvanDrokin / torch-conv-kan

About loading checkpoint weights #7