AndyMcAliley commented 2 years ago

This PR implements early stopping with a user-defined patience. Training now continues until either

early_stopping_patience consecutive training epochs pass without a decrease in validation loss, or
max_epochs is reached.

These variables are defined in 3_train/in/{data_source}/{run_id}.yaml. Model weights for the epoch with the lowest validation loss (not necessarily the final training epoch) are saved.
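
For reviewers who want the stopping rule spelled out, here is a minimal sketch of the logic described above. It is not the code in this PR: `train_with_early_stopping` and `run_epoch` are hypothetical names, the weights are a plain dict rather than a model state, and it assumes "a decrease in validation loss" means a new minimum relative to the best loss seen so far.

```python
import copy
from typing import Any, Callable, Dict, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], Tuple[float, Any]],
    max_epochs: int,
    early_stopping_patience: int,
) -> Dict[str, Any]:
    """Run epochs until patience runs out or max_epochs is reached.

    run_epoch is a hypothetical callable that trains for one epoch and
    returns (validation_loss, current_weights).
    """
    best_loss = float("inf")
    best_weights = None
    best_epoch = -1
    epochs_since_improvement = 0
    epochs_run = 0

    for epoch in range(max_epochs):
        val_loss, weights = run_epoch(epoch)
        epochs_run += 1

        if val_loss < best_loss:
            # New lowest validation loss: keep a copy of these weights.
            best_loss = val_loss
            best_epoch = epoch
            best_weights = copy.deepcopy(weights)
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= early_stopping_patience:
                # Patience exhausted: stop before max_epochs.
                break

    return {
        "best_epoch": best_epoch,
        "best_loss": best_loss,
        "best_weights": best_weights,
        "epochs_run": epochs_run,
    }


# Made-up validation losses, used only to exercise the stopping rule.
fake_losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]


def fake_run_epoch(epoch: int) -> Tuple[float, Dict[str, int]]:
    # Stand-in for one real training epoch plus validation pass.
    return fake_losses[epoch], {"epoch": epoch}


result = train_with_early_stopping(
    fake_run_epoch, max_epochs=10, early_stopping_patience=3
)
```

With these made-up losses, training stops after the fifth epoch (three epochs in a row without a new minimum), and the weights kept are those from the second epoch (index 1), not the final one.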

How to run the code

These changes all concern the train_model rule in 3_train.smk. To run this part of the pipeline, first copy 3_train/in/example_config.yaml to 3_train/in/model_prep/1.yaml. Then:

snakemake --cores all 3_train/out/model_prep/1/a_weights.pt

That should make 3_train/out/model_prep/1/1_process.yaml, 3_train/out/model_prep/1/a_metadata.npz, 3_train/out/model_prep/1/a_weights.pt, and 3_train/out/model_prep/1/1_train.yaml.
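
A quick way to confirm the run produced everything is to check for the four files listed above. This is only a convenience sketch: `missing_outputs` is a hypothetical helper, not part of the repository, and it assumes it is run from the repository root.

```python
from pathlib import Path
from typing import List, Sequence

# Output paths copied from the description above.
EXPECTED_OUTPUTS = (
    "3_train/out/model_prep/1/1_process.yaml",
    "3_train/out/model_prep/1/a_metadata.npz",
    "3_train/out/model_prep/1/a_weights.pt",
    "3_train/out/model_prep/1/1_train.yaml",
)


def missing_outputs(
    root: str = ".", expected: Sequence[str] = EXPECTED_OUTPUTS
) -> List[str]:
    """Return the expected output paths that do not exist under root."""
    root_path = Path(root)
    return [p for p in expected if not (root_path / p).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected outputs are present.")
```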

How to review this PR

Level of review requested

The main things I'd like reviewed are:

Does the implementation of early stopping look correct?
Should any related code additions or tweaks be made?
Have any errors been introduced?
Are the comments clear?

Where in the code to focus

Anything that's been changed is fair game!

Issues that are slated for upcoming PRs (so don't worry about them yet)

Change directory structure to be more nested

jdiaz4302 commented 2 years ago

The changes look good to me

AndyMcAliley commented 2 years ago

Closes #35

DOI-USGS / lake-temperature-lstm-static

Early stopping #40
