Current behaviour
If the errors on the mp-head increase while those on the default-head decrease, the overall loss may increase, meaning that no checkpoint is saved. This means that, e.g., after 200 epochs of training in which the error on the fine-tuning dataset decreases significantly, the last saved checkpoint could be from, e.g., epoch 8.
Desired behaviour
Allow saving and selecting the final model based on the loss of a chosen head (or a weighted combination of multiple heads), so that the model is saved whenever the error on the fine-tuning validation set decreases.
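A minimal sketch of what this selection logic could look like. The head names, weight dictionary, and `CheckpointSelector` class below are all hypothetical illustrations, not the project's actual API:

```python
def weighted_head_loss(head_losses, weights):
    """Combine per-head validation losses with user-chosen weights.

    head_losses: dict mapping head name -> validation loss for this epoch
    weights: dict mapping head name -> weight (heads not listed get weight 0)
    """
    return sum(weights.get(name, 0.0) * loss for name, loss in head_losses.items())


class CheckpointSelector:
    """Track the best epoch according to the weighted head loss.

    With weights = {"ft_head": 1.0}, the mp-head loss is ignored entirely,
    so a checkpoint is saved whenever the fine-tuning head improves.
    """

    def __init__(self, weights):
        self.weights = weights
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, head_losses):
        """Return True if this epoch is the new best (i.e. save a checkpoint)."""
        loss = weighted_head_loss(head_losses, self.weights)
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_epoch = epoch
            return True
        return False
```

For example, `CheckpointSelector({"ft_head": 1.0})` would keep saving while the fine-tuning head improves even if the mp-head error grows, which is exactly the scenario described above.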
@LarsSchaaf Good point. I made a change yesterday to save only based on the loss of the last head (which is the fine-tuning head), but I agree more flexibility would be welcome.
Current behaviour
If the errors on the mp-head increase and the default-head decrease the overall loss may increases, meaning that no checkpoint is saved. This means that eg. after 200 epochs of training, where the error on the fine-tuning dataset decreases significantly the last saved checkpoint could be eg. epoch 8.
Desired behaviour
Choose to save and select the final model based on the loss of a desired head (or a weighting of multiple heads). Such that if the error decreases on the finetuning validation set, I save this model.
Current workaround
Save checkpoints after each epoch.
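With a checkpoint per epoch, the desired model can still be recovered post hoc by scanning the logged per-epoch validation losses of the head of interest. A hedged sketch, assuming a hypothetical log structure of epoch number mapped to that head's loss:

```python
def best_checkpoint(epoch_losses):
    """Return the epoch whose checkpoint has the lowest validation loss.

    epoch_losses: dict mapping epoch number -> validation loss of the
    head of interest (e.g. the fine-tuning head).
    """
    return min(epoch_losses, key=epoch_losses.get)
```

This is only a workaround: it costs disk space for every epoch's checkpoint, whereas head-aware checkpoint selection during training would avoid that.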