This PR implements early stopping with a user-defined patience. With early stopping implemented, training will continue until either:
early_stopping_patience training epochs pass without a decrease in validation loss, or
max_epochs is reached.
These variables are defined in 3_train/in/{data_source}/{run_id}.yaml. Model weights for the epoch with the lowest validation loss (not necessarily the final training epoch) are saved.
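For reviewers, here is a minimal sketch of the stopping rule described above. It is not the code in this PR: the function name `run_early_stopping` is hypothetical, and it replays a list of validation losses instead of training a model. It assumes that only a strict decrease in validation loss counts as an improvement and that epochs are counted from 0.

```python
from typing import Iterable, Tuple


def run_early_stopping(
    val_losses: Iterable[float],
    max_epochs: int,
    early_stopping_patience: int,
) -> Tuple[int, float, int]:
    """Replay the stopping rule over per-epoch validation losses.

    Hypothetical helper for illustration only.
    Returns (best_epoch, best_loss, epochs_run), with epochs counted from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_since_improvement = 0
    epochs_run = 0

    for epoch, val_loss in enumerate(val_losses):
        # Stop once max_epochs epochs have been run.
        if epoch >= max_epochs:
            break
        epochs_run += 1

        if val_loss < best_loss:
            # Assumption: only a strict decrease counts as an improvement.
            # A real training loop would snapshot the model weights here.
            best_loss = val_loss
            best_epoch = epoch
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            # Stop after `early_stopping_patience` epochs without improvement.
            if epochs_since_improvement >= early_stopping_patience:
                break

    return best_epoch, best_loss, epochs_run


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
    print(run_early_stopping(losses, max_epochs=100, early_stopping_patience=3))
```

In this example the loop stops after the fifth epoch, because three epochs in a row fail to beat the loss of 0.8 from epoch 1. The weights that would be kept are the ones from epoch 1, not from the last epoch run. The later loss of 0.7 is never reached, which is the usual trade-off of a small patience.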
How to run the code
These changes all concern the rule train_model in 3_train.smk. So, to run this part of the pipeline, first copy 3_train/in/example_config.yaml to 3_train/in/model_prep/1.yaml. Then:
snakemake --cores all 3_train/out/model_prep/1/a_weights.pt
That should make 3_train/out/model_prep/1/1_process.yaml, 3_train/out/model_prep/1/a_metadata.npz, 3_train/out/model_prep/1/a_weights.pt, and 3_train/out/model_prep/1/1_train.yaml.
How to review this PR
Level of review requested
The main things I'd like reviewed are:
Does the implementation of early stopping look correct?
Should any related code additions or tweaks be made?
Have any errors been introduced?
Are the comments clear?
Where in the code to focus
Anything that's been changed is fair game!
Issues that are slated for upcoming PRs (so don't worry about them yet)