CorbinFerrie closed this 2 years ago
Yes! The training script saves the weights file (`*.pth`) after each epoch. You can resume training by adding `--net <path-to-latest-pth-file>` to your training script command.
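If it helps, here is a small sketch for finding the file to pass to `--net` after a crash. It assumes the `*.pth` files all land in one directory and that the newest one by modification time is the latest epoch; both are assumptions on my part, and `latest_checkpoint` is a hypothetical helper, not something from the training script.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str) -> Optional[Path]:
    """Return the most recently modified *.pth file in `directory`, or None.

    Hypothetical helper: assumes "newest file" means "latest epoch".
    """
    candidates = list(Path(directory).glob("*.pth"))
    if not candidates:
        return None
    # Pick the file with the largest modification timestamp.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "." is a placeholder; point this at wherever your weights are saved.
    ckpt = latest_checkpoint(".")
    if ckpt is None:
        print("No .pth file found; start training from scratch.")
    else:
        print(f"Resume by adding: --net {ckpt}")
```

If your weights files carry the epoch number in their names, sorting on that number would probably be more robust than relying on timestamps.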
AFAIK, this is not 100% the same as letting it run continuously, because the Adam optimizer state (its running gradient statistics) will be re-initialized. This means that for the first few epochs after resuming, the loss will go up a bit and then come back down, but in my experience that effect is relatively minor. I've resumed training often and never had a problem.
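To illustrate why a weights-only resume is not identical to an uninterrupted run, here is a toy sketch in plain Python. It is not the training script's code: it applies the standard Adam update to a single scalar weight on a made-up loss `(w - 3)^2`, and all names (`adam_step`, `fresh_state`, `train`) and hyperparameters are chosen for illustration only.

```python
import math


def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight; returns the new weight."""
    state["t"] += 1
    # Running estimates of the gradient mean and of the squared gradient.
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    # Bias correction for the zero initialisation of m and v.
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def fresh_state():
    """Optimizer state as it looks at the very start of training."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def train(w, state, steps):
    """Minimise the toy loss (w - 3)^2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        w = adam_step(w, 2 * (w - 3), state)
    return w


# Uninterrupted run: 20 steps in one go.
continuous = train(0.0, fresh_state(), 20)

# Interrupted run: 10 steps, then a simulated crash.
state = fresh_state()
w_saved = train(0.0, state, 10)
state_saved = dict(state)  # copy of the optimizer state at the crash

# Resume from the weight only: the optimizer state starts from scratch.
weights_only = train(w_saved, fresh_state(), 10)

# Resume from the weight plus the saved optimizer state.
full_resume = train(w_saved, state_saved, 10)

print("continuous run:       ", continuous)
print("weights-only resume:  ", weights_only)
print("weights + Adam state: ", full_resume)
```

In this toy setup the weights-only resume ends at a different value than the uninterrupted run, while restoring the optimizer state as well reproduces it exactly. How much that matters for a real network is a separate question; as noted above, the effect has been minor in practice.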
Thanks for the quick reply!
Hello, I was wondering whether it is possible to resume training from a previous epoch. I am running into stability issues on my PC, which cause the training script (and my entire PC) to crash at random. For example, if I have trained for 20 epochs and my PC crashes, can I continue from the 20th epoch instead of starting over? This would save me days of headache. Thanks