Open Rajesh-ParaxialTech opened 1 month ago
Hey @Rajesh-ParaxialTech,
Yes, it is possible to resume the training by changing the mode and specifying the new number of epochs. There are a few caveats though: (1) The learning rate schedule depends on the total number of epochs, so overwriting the number of epochs changes the schedule. The learning rate will likely have been quite low towards the end of the first training run, but the resumed run will start with a higher learning rate, which then decreases towards the end again. (2) The training ends with SWA, which periodically increases and decreases the learning rate before averaging the model weights. If you restart after SWA, the resulting model will also differ from one produced by a single long training run.
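To make caveat (1) concrete, here is a minimal sketch of how a schedule tied to the total epoch count behaves when that count is changed. It assumes a polynomial decay schedule purely for illustration; this is not necessarily the schedule nnDetection uses, and `poly_lr`, `base_lr`, and `power` are hypothetical names and values.

```python
def poly_lr(epoch: int, total_epochs: int, base_lr: float = 0.01, power: float = 0.9) -> float:
    """Illustrative polynomial decay: base_lr at epoch 0, falling to 0 at total_epochs.

    Assumption: stands in for whatever schedule the training actually uses.
    """
    return base_lr * (1.0 - epoch / total_epochs) ** power


# Last epoch of the original 100-epoch run: the learning rate is nearly zero.
lr_end_of_first_run = poly_lr(epoch=99, total_epochs=100)

# Same point in training, but the schedule is now computed for 150 total epochs.
lr_start_of_resumed_run = poly_lr(epoch=100, total_epochs=150)

print(f"end of first run:     {lr_end_of_first_run:.6f}")
print(f"start of resumed run: {lr_start_of_resumed_run:.6f}")
```

Under these assumptions the resumed run starts with a noticeably higher learning rate than the one the first run ended on, which is the jump described above.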
Best, Michael
Hello
Suppose I have started training an nnDetection model with the number of epochs fixed to 100. If I later want to continue training beyond 100 epochs, can I update the code and resume the training with the option mode=resume, without losing the weights the model learned during the first 100 epochs?
Thank you,
Rajesh