USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems
Creative Commons Zero v1.0 Universal
21 stars 15 forks source link

146 simplify training #148

Closed jsadler2 closed 2 years ago

jsadler2 commented 2 years ago

Warning: This is a breaking PR

This moves the definition of the tensorflow model out of train.py. If agreed upon, this means you will no longer be able to pass a string as an model_type argument (e.g., 'rgcn') to the train_model function. You would instead pass a model object (river_dl.RGCN.RGCNModel).

Pros The main pro is that train.py is much lighter and more flexible. This is mainly because we don't have to handle all of possible model_type arguments passed as strings via if else statements. For example, there are now no gw specific pieces in train.py

We also leave the defining of "pretraining" and "finetuning" out of train.py. This means that you could have any number of training phases with different data/epochs/loss functions etc. This also means that you can define your model anywhere (e.g., my_awesome_model.py) import it into whatever file you are using to call train_model, instantiate it, and pass the object into train_model.

Tradeoff The tradeoff is that the model has to be defined and compiled with its loss and optimizer somewhere else - so the burden is more on the individual projects (e.g., in the Snakefile). I edited the Snakefile to show what that would look like there.


Summary This PR is intended to make train.py project agnostic; this means whatever calls train.py (e.g., Snakefile) has the burden of project-specific/model definitions.

closes #146, #118

jsadler2 commented 2 years ago

Thanks for the review and thoughts, @SimonTopp. I think you hit the nail on the head here:

it puts the onus on the user to configure the Snakefile rather than just setting some arguments in the config.yml and hitting go.

This change definitely puts more of the onus on the modeler/user. And I think it's still an open question whether that is the direction we want to go.

For fun I just sketched out where along the "flexibility - helpfulness" spectrum I see this PR.

PXL_20211207_191839010 MP

I think the upside of flexibility we gain is pretty nice. True we are doing less of the work for a given user/modeler, that said, I think that sometimes we need the ability to make custom models and not having to figure out how to plug them into a rigid (but helpful when plugged in) code base can be pretty freeing. And knowing how to instantiate a model object is pretty powerful and actually not too hard.

The one thing that gives me pause is needing to know better how to manipulate the Snakefile. And maybe that's where you and Janet's idea comes in:

we've chatted about creating a handful of example Snakefiles (e.g. baseline run, running replicates, with and without pre-training, etc) and explicitely stating that the config.yml and Snakefile in the repo aren't canonical and should only be used as reference.

This could be the happy medium where we provide a more flexible tool but also show people how to use it for their own application.

jsadler2 commented 2 years ago

@janetrbarclay mentioned that it would be ideal if (paraphrasing here) if we can add the new functionality without losing the existing functionality. I think that is wise. I think there is a pretty straightforward way to do that. I will revise this PR to do that.

jsadler2 commented 2 years ago

@janetrbarclay and @SimonTopp - Thank you guys for your review comments. I think the biggest change is just to shift our thinking from the Snakefile/config.yml files from being "this is the way to use river-dl" to "this is an example for how one could use river-dl" but you will likely have to modify for your own purposes. I will make some edits to this PR to address that and your guys' comments.

jsadler2 commented 2 years ago

@janetrbarclay and @SimonTopp - this is ready for another look. I summarize the major changes as:

  1. training and prediction functions takes a compiled tensorflow model
  2. the Snakefiles and config files are now in their own directory (workflow_examples/). Also in that directory is a readme that describes the different examples
  3. the asRunConfig function now takes the code directory as an argument (since we are no longer assuming the Snakefile one is using is located in the root river-dl directory.