Simplify training routine

jsadler2 commented 2 years ago

@SimonTopp made some improvements to the modularity of the training routine, and that got me thinking that it could be even simpler.

I'm thinking that we could just have basically the following parameters to the train function:

compiled model
x train
x validation (optionally)
y train
y validation (optionally)
other training parameters like we already have (epochs, early stop, out_dir, etc.)

Then all the other logic could go in the Snakefile including:

compiling the model with the loss function and the optimizer (including the learning rate) (see #118)
separating out pretraining from finetuning

Feel free to share thoughts here or it maybe clearer what I'm trying to do in the coming PR.

SimonTopp commented 2 years ago

I like the way you're heading with this, I think it would also potentially translate down the road to a model agnostic repository that's a little more generalizable. Since it's come up a few times, I might add fixing the discontinuous sequences in #127 to the same update.

jsadler2 commented 2 years ago

I definitely think that #127 needs to be addressed, but I don't think it should be in this PR. I think our PRs should be able to be described in one line and have only one idea. This makes it easy to understand what we are trying to accomplish in any given PR at a glance. It also makes it clear when looking back in the git history what exactly a given PR was about. So I'm in favor of a 1) "simplify training routine" PR and 2) a separate "address non-continuous training sequences" PR, especially since they are independent components of the workflow.

USGS-R / river-dl

Simplify training routine #146