USGS-R / drb-estuary-salinity-ml

Creative Commons Zero v1.0 Universal

03b modeling steps cleaner #117

Closed: galengorski closed this 1 year ago

galengorski commented 1 year ago

This is now ready for review. I have tested it in a newly built environment; there are a few known issues, but it should run OK. Because this PR grew rather large, it touches a lot of files. I would appreciate feedback on the run_model.py script in particular, since it is the newest and most important part of the PR.

Steps for running the model:

  1. clone this branch
  2. add the file 953860.zip to the 01_fetch/in folder; the file can be found on S3 in drb_estuary_salinity/01_fetch/in
  3. from within the cloned repository directory, create the environment using conda env create -f environment.yaml
  4. run snakemake -s Snakefile_fetch_munge -j (-j with no number runs jobs on all available CPU cores; pass a number, e.g. -j 2, to use fewer)
  5. if an error pops up, you might have to rerun the same command; this is because Snakemake doesn't run rules in a fixed order, and some directories need to be created first
  6. now open 03b_model/model_config.yaml, change n_epochs to a small number (say, 5), and change run_id to whatever you want to name the test run (say, Test_Run)
  7. run snakemake -s Snakefile_b_ml_model_baseline -j. You should see the training progress in the command window, and model results should be written to 03b_model/out/Test_Run/
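Steps 5 to 7 could also be scripted. The following is a minimal, hypothetical sketch and not part of this PR: set_config_values, update_config_file and run_with_retries are illustrative names, and it assumes n_epochs and run_id appear as top-level key: value lines in 03b_model/model_config.yaml. It edits those lines with a regular expression rather than a YAML library, so it needs only the standard library and leaves comments elsewhere in the file untouched.

```python
import re
import subprocess
from pathlib import Path

# Matches an unindented (top-level) "key:" at the start of a line.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_]\w*)\s*:")


def set_config_values(config_text, updates):
    """Return config_text with the given top-level keys set to new values.

    Hypothetical helper: assumes each key appears once as an unindented
    "key: value" line. Any inline comment on a replaced line is dropped.
    """
    lines = config_text.splitlines(keepends=True)
    remaining = dict(updates)
    for i, line in enumerate(lines):
        match = TOP_LEVEL_KEY.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            ending = "\n" if line.endswith("\n") else ""
            lines[i] = f"{key}: {remaining.pop(key)}{ending}"
    if remaining:
        raise KeyError(f"keys not found in config: {sorted(remaining)}")
    return "".join(lines)


def update_config_file(path, updates):
    """Rewrite the config file at path in place (hypothetical helper)."""
    config_path = Path(path)
    config_path.write_text(set_config_values(config_path.read_text(), updates))


def run_with_retries(cmd, max_attempts=3):
    """Run cmd, rerunning it after a non-zero exit; return the successful attempt number."""
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt
        print(f"attempt {attempt} of {max_attempts} failed: {' '.join(cmd)}")
    raise RuntimeError(f"{cmd!r} failed after {max_attempts} attempts")


# Example usage from the repository root (not run here):
# run_with_retries(["snakemake", "-s", "Snakefile_fetch_munge", "-j"])
# update_config_file("03b_model/model_config.yaml", {"n_epochs": 5, "run_id": "Test_Run"})
# run_with_retries(["snakemake", "-s", "Snakefile_b_ml_model_baseline", "-j"])
```

Retrying only works around the missing-directory errors from step 5. Declaring the files each rule writes as outputs lets Snakemake create their parent directories itself, which would likely be the more durable fix.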
galengorski commented 1 year ago

If they aren’t already, I think your run instructions could go in a README. Here are some possible edits to consider:

Thanks, I'll create a README with this info.

  • I already had the repo cloned with older packages in the environment. I used this to update the environment and remove old/outdated dependencies: conda env update --file environment.yaml --prune
  • It looks like the Snakefile_fetch_munge run worked, but I do see the following error. Should I be on VPN to run this?
Using IDP Account default to access ADFS https://fs.doi.gov/
Authenticating as <user>@usgs.gov ...
error authenticating to IdP: unable to classify response from auth server

I get this error whether or not I am on VPN; it would be good to investigate.

  • It seems river_dl is missing from the environment. Let me know how that should be added. ModuleNotFoundError in line 4 of Snakefile_b_ml_model_baseline: No module named 'river_dl'

Good catch. You can run git submodule add https://github.com/USGS-R/river-dl.git from the 03b_model/src/ directory. In the future this can be done while cloning the repo by passing the --recurse-submodules option to git clone, which I will put in the README file as well.
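If the submodule goes missing again, a small import guard near the top of a Snakefile could fail with a clearer message than the bare ModuleNotFoundError. This is a hypothetical sketch, not code from the PR: ensure_importable is an illustrative name, and it assumes the submodule ends up at 03b_model/src/river-dl (the default path git submodule add picks when run from 03b_model/src/) with the river_dl package at the top level of that checkout.

```python
import importlib
import sys
from pathlib import Path


def ensure_importable(module_name, search_dir):
    """Import module_name, adding search_dir to sys.path if the first attempt fails.

    Hypothetical helper: search_dir is where the package is expected to live,
    e.g. a git submodule checkout, given relative to the current directory.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        if err.name != module_name:
            raise  # the module exists, but one of its own dependencies is missing

    search_path = str(Path(search_dir).resolve())
    if search_path not in sys.path:
        sys.path.insert(0, search_path)
    importlib.invalidate_caches()  # let the import system see the new path entry

    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        if err.name != module_name:
            raise
        raise ModuleNotFoundError(
            f"{module_name!r} not found, even after adding {search_path} to sys.path; "
            "is the submodule checked out there?",
            name=module_name,
        ) from err
```

Called as ensure_importable("river_dl", "03b_model/src/river-dl") from the repository root, before the import that currently fails on line 4 of Snakefile_b_ml_model_baseline, it would either return the module or point at the submodule checkout. It re-raises unchanged when river_dl itself is found but one of its own dependencies is missing, so a missing conda package is not misreported as a missing submodule.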