ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Flexibility in choosing which checkpoint to save during multi-head training. #479

Open LarsSchaaf opened 1 week ago

LarsSchaaf commented 1 week ago

Current behaviour

If the errors on the mp-head increase while those on the default-head decrease, the overall loss may increase, meaning that no checkpoint is saved. As a result, after e.g. 200 epochs of training in which the error on the fine-tuning dataset decreases significantly, the last saved checkpoint could still be from e.g. epoch 8.

Desired behaviour

Allow the saved checkpoint and the final model to be selected based on the validation loss of a chosen head (or a weighting of multiple heads), so that whenever the error on the fine-tuning validation set decreases, that model is saved.
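A minimal sketch of what this selection logic could look like (not the actual MACE API; `BestCheckpointSelector`, the head names, and the weight dict are all hypothetical): track the best score as a weighted sum of per-head validation losses, and only signal a save when that score improves.

```python
# Hypothetical sketch, not MACE code: choose when to checkpoint based on a
# weighted combination of per-head validation losses rather than the total loss.

def weighted_head_loss(head_losses, weights):
    """Combine per-head validation losses using user-chosen weights.

    Heads absent from `weights` contribute nothing, so weights like
    {"default": 0.0, "mp": 1.0} select on a single head.
    """
    return sum(weights.get(head, 0.0) * loss for head, loss in head_losses.items())

class BestCheckpointSelector:
    def __init__(self, weights):
        self.weights = weights
        self.best_score = float("inf")
        self.best_epoch = None

    def update(self, epoch, head_losses):
        """Return True if this epoch improves the weighted score (i.e. save now)."""
        score = weighted_head_loss(head_losses, self.weights)
        if score < self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            return True
        return False
```

With weights that only consider the fine-tuning head, an epoch where the fine-tuning error rises is skipped even if the other head (and hence the total loss) improves, which is exactly the failure mode described above.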

Current workaround

Save checkpoints after each epoch.

ilyes319 commented 1 week ago

@LarsSchaaf Good point. I made a change yesterday to save based only on the loss of the last head (which is the fine-tuning head), but I agree that more flexibility would be welcome.