Open Lilferrit opened 3 weeks ago
My implementation is on the branch `val-early-stop`. I also changed the best validation checkpoint filename from `<root>.best.ckpt` to `<root>.<epoch>-<step>.best.ckpt`. If we want to add the early stopping feature but don't want to change the best checkpoint filename, I can remove that change before submitting a PR.
I don't think that this is an ideal change. The reasoning behind the `best.ckpt` file was that its filename would always be the same, so that the user can immediately find it. Adding the epoch and step numbers removes this advantage.
While adding early stopping patience is a small change that can make training a bit more convenient, one thing to make sure of in your implementation is that the patience is defined in terms of the number of training steps, not epochs. When we're training on the full MassIVE-KB data, the model converges even before a full epoch has been processed. This is also why `val_check_interval` and some other training options are defined in terms of the number of steps.
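This distinction matters because in PyTorch Lightning the `EarlyStopping` patience counts validation checks with no improvement, so when `val_check_interval` is given in steps, patience is effectively measured in steps too. A plain-Python sketch of that counting logic (the function and its arguments are illustrative, not Casanovo's actual config):

```python
def steps_until_early_stop(val_losses, val_check_interval, patience):
    """Return the training step at which early stopping would trigger.

    val_losses: validation losses observed at successive checks, where a
    check runs every `val_check_interval` training steps. Training stops
    once `patience` consecutive checks show no improvement over the best
    loss seen so far; returns None if that never happens.
    """
    best = float("inf")
    checks_without_improvement = 0
    for check_idx, loss in enumerate(val_losses, start=1):
        step = check_idx * val_check_interval
        if loss < best:
            best = loss
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return step
    return None
```

With `val_check_interval=1000` and `patience=3`, losses of `[1.0, 0.8, 0.81, 0.82, 0.9]` would stop training at step 5000: the best loss (0.8) is set at the second check, and the next three checks all fail to improve on it.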
This is another QOL feature I implemented for the sake of my own experiments, but that might be nice to add to the mainline Casanovo release. I added a new config option, `val_patience_interval`, that defaults to -1 (to mirror the behavior of `max_epochs`). If `val_patience_interval` is set to a positive value, an early stopping callback is added to the model runner using PyTorch Lightning's `EarlyStopping` callback. This callback monitors `valid_CELoss` and stops model training if `valid_CELoss` doesn't improve for `val_patience_interval` consecutive validation checks.
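A minimal sketch of how that gating might look; the function name and exact callback kwargs are my reading of the description above, not code copied from the `val-early-stop` branch:

```python
def early_stopping_callbacks(val_patience_interval: int) -> list[dict]:
    """Build kwargs for an optional early stopping callback.

    A value of -1 (the default) disables early stopping, mirroring how
    max_epochs = -1 means "no epoch limit".
    """
    if val_patience_interval <= 0:
        return []
    # Each dict would be unpacked into PyTorch Lightning's
    # lightning.pytorch.callbacks.EarlyStopping(**kwargs) and appended
    # to the Trainer's callback list.
    return [
        {
            "monitor": "valid_CELoss",          # validation metric to watch
            "mode": "min",                      # lower loss is better
            "patience": val_patience_interval,  # checks without improvement
        }
    ]
```

Keeping the option as a plain config value with -1 as the "off" sentinel means no new boolean flag is needed, which matches how the other training limits in the config behave.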