Current behaviour
If the errors on the mp-head increase while those on the default-head decrease, the overall loss may increase, meaning that no checkpoint is saved. This means that, e.g., after 200 epochs of training in which the error on the fine-tuning dataset decreases significantly, the last saved checkpoint could be from, e.g., epoch 8.
Desired behaviour
Allow saving and selecting the final model based on the loss of a chosen head (or a weighted combination of multiple heads), so that the model is saved whenever the error on the fine-tuning validation set decreases.
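A minimal sketch of what this selection logic could look like. The head names, weight dictionary, and `CheckpointSelector` class below are all hypothetical illustrations, not the project's actual API:

```python
def weighted_head_loss(head_losses, weights):
    """Combine per-head validation losses with user-chosen weights.

    head_losses: dict mapping head name -> validation loss for this epoch
    weights: dict mapping head name -> weight (heads not listed get weight 0)
    """
    return sum(weights.get(name, 0.0) * loss for name, loss in head_losses.items())


class CheckpointSelector:
    """Track the best epoch according to the weighted head loss.

    With weights = {"ft_head": 1.0}, the mp-head loss is ignored entirely,
    so a checkpoint is saved whenever the fine-tuning head improves.
    """

    def __init__(self, weights):
        self.weights = weights
        self.best_loss = float("inf")
        self.best_epoch = None

    def update(self, epoch, head_losses):
        """Return True if this epoch is the new best (i.e. save a checkpoint)."""
        loss = weighted_head_loss(head_losses, self.weights)
        if loss < self.best_loss:
            self.best_loss = loss
            self.best_epoch = epoch
            return True
        return False
```

For example, `CheckpointSelector({"ft_head": 1.0})` would keep saving while the fine-tuning head improves even if the mp-head error grows, which is exactly the scenario described above.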
@LarsSchaaf Good point. I made a change yesterday to save only based on the loss of the last head (which is the fine-tuning head), but I agree more flexibility would be welcome.
Current behaviour
If the errors on the mp-head increase and the default-head decrease the overall loss may increases, meaning that no checkpoint is saved. This means that eg. after 200 epochs of training, where the error on the fine-tuning dataset decreases significantly the last saved checkpoint could be eg. epoch 8.
Desired behaviour
Choose to save and select the final model based on the loss of a desired head (or a weighting of multiple heads). Such that if the error decreases on the finetuning validation set, I save this model.
Current workaround
Save checkpoints after each epoch.
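With a checkpoint per epoch, the desired model can still be recovered post hoc by scanning the logged per-epoch validation losses of the head of interest. A hedged sketch, assuming a hypothetical log structure of epoch number mapped to that head's loss:

```python
def best_checkpoint(epoch_losses):
    """Return the epoch whose checkpoint has the lowest validation loss.

    epoch_losses: dict mapping epoch number -> validation loss of the
    head of interest (e.g. the fine-tuning head).
    """
    return min(epoch_losses, key=epoch_losses.get)
```

This is only a workaround: it costs disk space for every epoch's checkpoint, whereas head-aware checkpoint selection during training would avoid that.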