Closed dbuscombe-usgs closed 1 year ago
Keras' model.fit options, listed here, include a description of initial_epoch:
https://keras.io/api/models/model_training_apis/
This should be a straightforward fix.
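A minimal sketch of why initial_epoch matters for an epoch-indexed LR schedule. It is pure Python and only mimics the epoch indices Keras iterates over. The schedule function lrfn, its constants, and the helper epochs_to_run are hypothetical names for illustration, not the Gym implementation. Note that when initial_epoch is given, Keras treats epochs as the index of the final epoch rather than the number of additional epochs to run.

```python
# Hypothetical schedule constants, for illustration only.
START_LR = 1e-7
MAX_LR = 1e-4
MIN_LR = 1e-7
RAMPUP_EPOCHS = 5
DECAY = 0.9


def lrfn(epoch, lr=None):
    """Linear ramp-up to MAX_LR, then exponential decay toward MIN_LR.

    The second argument is accepted (and ignored) so the function can be
    called either as lrfn(epoch) or as lrfn(epoch, current_lr).
    """
    if epoch < RAMPUP_EPOCHS:
        return START_LR + (MAX_LR - START_LR) * epoch / RAMPUP_EPOCHS
    return MIN_LR + (MAX_LR - MIN_LR) * DECAY ** (epoch - RAMPUP_EPOCHS)


def epochs_to_run(initial_epoch, epochs):
    """Epoch indices that a fit loop would visit; `epochs` is the final index."""
    return list(range(initial_epoch, epochs))


# Restart from scratch: the schedule ramps up again from START_LR.
restarted = epochs_to_run(initial_epoch=0, epochs=100)
# Resume at epoch 50: the schedule continues from where it was interrupted.
resumed = epochs_to_run(initial_epoch=50, epochs=100)

print("LR at first epoch after restart:", lrfn(restarted[0]))
print("LR at first epoch after resume: ", lrfn(resumed[0]))

# With Keras, the resumed case would correspond to something like:
#   model.fit(train_ds, epochs=100, initial_epoch=50,
#             callbacks=[tf.keras.callbacks.LearningRateScheduler(lrfn)])
```

Because the scheduler callback receives the absolute epoch index, passing initial_epoch should be enough to keep the learning rate on its original trajectory, provided the schedule depends only on the epoch number.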
The one downside I see is that the full training history for the model, currently provided in the output file ..model_history.npz, would not be available, because that file is only created after model training completes successfully. I do not see a workaround, however.
Implemented in https://github.com/Doodleverse/segmentation_gym/commit/809466a3edf097674504fc8847f82ffc70cdc2fa
Leaving open to add to wiki docs
now added to wiki
closing
When model training is interrupted, for example by a power cut, memory leak, or other unexpected issue during a lengthy run, there is currently no way to resume training where it left off.
HOT_START could be used to restore model weights and resume training from the first epoch; however, the LR scheduler would also start again from the beginning, which negates the point of the LR scheduler. In fact, restarting the model with refined weights without adjusting the LR scheduler could create unwanted model convergence issues.
To avoid this situation, the code could be modified as follows:
- add INITIAL_EPOCH to the config file
- model.fit would use INITIAL_EPOCH as the argument to the initial_epoch parameter
- if HOT_START is specified but INITIAL_EPOCH is not, the program should exit with a message for the user
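The proposed config handling could be sketched roughly as follows. This is an assumption-laden illustration, not the committed implementation: the helper name resolve_initial_epoch is hypothetical, the config is assumed to be a plain dict loaded from the JSON config file, and the exit rule is read as "HOT_START is set but INITIAL_EPOCH is missing".

```python
import sys


def resolve_initial_epoch(config):
    """Return the epoch index to start from, or exit on an inconsistent config.

    `config` is assumed to be a dict parsed from the config file.
    HOT_START may be a flag or a weights path; only its truthiness is used here.
    """
    hot_start = config.get("HOT_START")
    initial_epoch = config.get("INITIAL_EPOCH")

    # Assumed rule: resuming from saved weights requires an explicit epoch.
    if hot_start and initial_epoch is None:
        sys.exit(
            "HOT_START is set but INITIAL_EPOCH is missing from the config file. "
            "Add INITIAL_EPOCH (the epoch index to resume from) and rerun."
        )

    # No hot start and no epoch given: train from the beginning.
    if initial_epoch is None:
        return 0

    if not isinstance(initial_epoch, int) or initial_epoch < 0:
        sys.exit("INITIAL_EPOCH must be a non-negative integer.")

    return initial_epoch


# Hypothetical usage: the value would then be passed straight to model.fit, e.g.
#   model.fit(train_ds, epochs=final_epoch, initial_epoch=initial_epoch, ...)
initial_epoch = resolve_initial_epoch({"HOT_START": "weights.h5", "INITIAL_EPOCH": 25})
print("Resuming at epoch", initial_epoch)
```

Exiting early keeps a misconfigured restart from silently re-running the LR ramp-up on already refined weights.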