DOI-USGS / lake-temperature-process-models

Creative Commons Zero v1.0 Universal

Set up pipeline to run glm models and extract output #6

Closed hcorson-dosch-usgs closed 2 years ago

hcorson-dosch-usgs commented 2 years ago

This pipeline sets up the GLM workflow laid out in #1. I did deviate slightly once I got into the actual coding.

As of now, the first part of the pipeline, 1_prep, manually pulls in relevant files from lake-temperature-model-prep (eventually we will use globus, per #4) and builds a crosswalk (p1_meteo_xwalk) of the meteo files, the gcms, the lake_ids, and the time periods. This sets the stage for the model runs, as each row = 1 model run. While Lindsay is refining the approach for building the netCDFs in lake-temperature-model-prep (and therefore the approach we'll take to pull information out of those netCDFs in this pipeline), I'm simply mapping over the feather files that I've manually dropped into an in folder. This part of the pipeline also builds the nml objects for each lake, using a glm3 nml template and the 7_config_merge/out/nml_list.rds from lake-temperature-model-prep.
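To make the "each row = 1 model run" idea concrete, here is a minimal base R sketch of such a crosswalk. The lake ids, gcm names, time period labels, and the meteo file-naming scheme are all hypothetical placeholders, not the values or logic used in p1_meteo_xwalk.

```r
# Hypothetical inputs; the real pipeline derives these from the meteo feather files
lake_ids <- c("lake_1", "lake_2")
gcms <- c("gcm_a", "gcm_b")
time_periods <- c("period_1", "period_2", "period_3")

# One row per lake x gcm x time period, i.e. one row per model run
meteo_xwalk <- expand.grid(
  lake_id = lake_ids,
  gcm = gcms,
  time_period = time_periods,
  stringsAsFactors = FALSE
)

# Assumed file-naming scheme, for illustration only
meteo_xwalk$meteo_fl <- file.path(
  "1_prep/in",
  sprintf("%s_%s.feather", meteo_xwalk$gcm, meteo_xwalk$time_period)
)

head(meteo_xwalk)
```

Mapping a model-run function over the rows of a table like this gives one simulation per row.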

The second part of the pipeline, 2_run, executes the model runs, mapping over the meteo xwalk. Building p2_glm_uncalibrated_runs runs each model, which entails reading in the meteo data and adding a burn-in and burn-out period, writing that meteo data to the simulation directory, finalizing the nml (including the meteo filename, per #3) and writing it to the simulation directory, and then running the model simulation. If the simulation is successful, the temperature and ice predictions are extracted to a feather file and the simulation directory is deleted.

Lindsay and I chatted re: how best to make the model execution targets (p2_glm_uncalibrated_runs) fault tolerant, and I ended up modifying Alison's tryCatch() approach to wrap around a retry() command, per the approach Lindsay used here, so that the call to execute a model is attempted up to 5 times.
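The pattern of a tryCatch() wrapped around a retried call can be sketched in base R as follows. Note that retry_call() is a hand-rolled stand-in written for this illustration, not the retry() function the pipeline uses, and flaky_model() is a hypothetical placeholder for the GLM call.

```r
# Hand-rolled stand-in for a retry helper: call `f` until it stops erroring
# or `max_tries` attempts have been made, then re-signal the last error
retry_call <- function(f, max_tries = 5) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    out <- tryCatch(f(), error = function(e) e)
    if (!inherits(out, "error")) return(out)
    last_error <- out
  }
  stop(last_error)
}

# Hypothetical model call that fails twice, then succeeds
n_calls <- 0
flaky_model <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("simulated GLM failure")
  "model finished"
}

# The outer tryCatch() turns a final failure into a value instead of an error,
# so the calling target can still return something sensible
run_status <- tryCatch(
  list(success = TRUE, value = retry_call(flaky_model, max_tries = 5)),
  error = function(e) list(success = FALSE, value = conditionMessage(e))
)
```

The point of the outer handler is that a run that still fails after all attempts produces a status record rather than halting the whole pipeline build.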

We also discussed whether to make the model execution targets (p2_glm_uncalibrated_runs) file targets or object targets, as we wanted to keep Jordan's approach of running the model and extracting the output in a single function, so that the glm simulation directories could be deleted as soon as the output was extracted. However, if model runs did fail after the 5 retry() attempts, it didn't seem logical for the function to return an empty file.

After talking it through, we landed on an approach that combines Jordan's single-function execution+extraction approach and Alison's approach of returning a tibble with information about the model run. The function run_glm3_model() writes a feather file of the model output, but rather than returning that filename, it returns a tibble, which includes the run_date, lake_id, gcm, time_period, the name of the export feather file, its hash (NA if the model run failed), the duration of the model run, whether or not the model run succeeded, and the code returned by the call to GLM3r::run_glm(). This provides some nice diagnostic info while still tracking the hash of the export feather file, and allows for a logical returned value if a model run fails: image
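As a rough illustration of that return value, the sketch below builds a one-row summary with a file hash that falls back to NA on failure. It is not the actual run_glm3_model(): summarize_run() and its column names are hypothetical, a base R data.frame stands in for the tibble, a CSV stands in for the feather export, and the status codes passed in are made-up values.

```r
# Hypothetical helper: summarise one model run as a single-row data frame
summarize_run <- function(lake_id, gcm, time_period, export_fl, success,
                          glm_code, duration_s) {
  data.frame(
    run_date = Sys.Date(),
    lake_id = lake_id,
    gcm = gcm,
    time_period = time_period,
    export_fl = export_fl,
    # Hash the export only if the run succeeded and the file exists
    export_fl_hash = if (success && file.exists(export_fl)) {
      unname(tools::md5sum(export_fl))
    } else {
      NA_character_
    },
    duration_s = duration_s,
    glm_success = success,
    glm_code = glm_code,
    stringsAsFactors = FALSE
  )
}

# A successful run with an export file on disk (CSV used as a stand-in)
export_fl <- tempfile(fileext = ".csv")
write.csv(data.frame(time = as.Date("2000-01-01"), temp_0 = 4.2),
          export_fl, row.names = FALSE)
ok_run <- summarize_run("lake_1", "gcm_a", "period_1", export_fl,
                        success = TRUE, glm_code = 0L, duration_s = 12.3)

# A failed run: no export file, so the hash is NA
failed_run <- summarize_run("lake_1", "gcm_a", "period_2", tempfile(),
                            success = FALSE, glm_code = 1L, duration_s = 3.1)

run_summary <- rbind(ok_run, failed_run)
```

Because the hash is part of the returned object, a change in the export file's contents changes the object itself, which is what lets downstream targets notice it.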

We then group this returned tibble by lake_id and gcm, and use that new grouped mapping (p2_glm_uncalibrated_run_groups) to build a final target p2_glm_uncalibrated_output_feathers that combines the output generated for a given lake using a given set of gcm driver data (per the grouping we set) into a single feather file (binding together the output from the 3 simulation time periods). If we want to change how the data is combined, we can simply modify the grouping when building p2_glm_uncalibrated_run_groups. Since the hash of the export file is tracked in the tibble output from p2_glm_uncalibrated_runs, any change to the contents of the export file will trigger a rebuild of the combined output feather files.
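The grouping-then-combining step can be sketched in base R like this. The run table and read_run_output() are toy stand-ins invented for the example (the real pipeline groups the tibble from p2_glm_uncalibrated_runs and reads feather files), so treat it as an outline of the idea only.

```r
# Hypothetical run summary: 2 lakes x 1 gcm x 3 time periods
runs <- data.frame(
  lake_id = rep(c("lake_1", "lake_2"), each = 3),
  gcm = "gcm_a",
  time_period = rep(c("period_1", "period_2", "period_3"), times = 2),
  stringsAsFactors = FALSE
)

# Toy per-run output standing in for reading one run's export file
read_run_output <- function(lake_id, gcm, time_period) {
  data.frame(lake_id = lake_id, gcm = gcm, time_period = time_period,
             temp_0 = c(4.0, 4.5), stringsAsFactors = FALSE)
}

# Group by lake_id and gcm; changing this grouping changes how output is combined
run_groups <- split(runs, list(runs$lake_id, runs$gcm), drop = TRUE)

# Bind the output of every run in a group into a single table per group
combined <- lapply(run_groups, function(grp) {
  per_run <- lapply(seq_len(nrow(grp)), function(i) {
    read_run_output(grp$lake_id[i], grp$gcm[i], grp$time_period[i])
  })
  do.call(rbind, per_run)
})
```

Each element of combined corresponds to one lake and gcm pair and holds the rows from all three time periods.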

hcorson-dosch-usgs commented 2 years ago

Okay, I think I've addressed all of your comments, except for adding in the glm version to the export tibble, which is being handled here #8.

@lindsayplatt note that I removed iteration='list' for the p1_nml_objects target b/c I realized we didn't need it as we never map over that target.

hcorson-dosch-usgs commented 2 years ago

and THANK YOU for such helpful reviews -- I really enjoyed working through this today and making this pipeline better and more robust!

hcorson-dosch-usgs commented 2 years ago

Oh and @jread-usgs I do think it would be good for you to run this pipeline yourself, to triple check the GLM pieces. All the files you should need to do so (for the 1_prep/in and 1_prep/tmp directories) are saved here

jordansread commented 2 years ago

Cool - those files worked :tada:

Two quick things I ran into when running locally:

meteo_data %>% tail
# A tibble: 6 × 8
  time       Shortwave Longwave AirTemp RelHum  Rain  Snow WindSpeed
  <date>         <dbl>    <dbl>   <dbl>  <dbl> <dbl> <dbl>     <dbl>
1 1980-07-17        NA       NA      NA     NA    NA    NA        NA
2 1980-07-18        NA       NA      NA     NA    NA    NA        NA
3 1980-07-19        NA       NA      NA     NA    NA    NA        NA
4 1980-07-20        NA       NA      NA     NA    NA    NA        NA
5 1980-07-21        NA       NA      NA     NA    NA    NA        NA
6 1980-07-22        NA       NA      NA     NA    NA    NA        NA

and evap:

&debugging
   disable_evap = .false.
/

should probably be set to .true. here in the template or set programmatically.
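For the "set programmatically" option, one very simple possibility is a text substitution on the template lines, sketched below in base R. set_disable_evap() is a hypothetical helper written for this example; the pipeline works with nml objects rather than raw text, so this only illustrates the intended change.

```r
# Hypothetical helper: rewrite the disable_evap entry in nml template lines
set_disable_evap <- function(nml_lines, value = TRUE) {
  fortran_value <- if (value) ".true." else ".false."
  # Keep the indentation and "disable_evap =" prefix, replace only the value
  sub("^(\\s*disable_evap\\s*=\\s*)\\S+", paste0("\\1", fortran_value),
      nml_lines, perl = TRUE)
}

template <- c("&debugging", "   disable_evap = .false.", "/")
updated <- set_disable_evap(template, value = TRUE)
```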

Sims right now look like they are rapidly losing water and are very cold (image from glmtools::plot_temp(file.path(sim_lake_dir,'out','output.nc'))), but there is plenty of time to sort that out.

hcorson-dosch-usgs commented 2 years ago

Thanks @jread-usgs -- I was wondering about that disable_evap parameter.

And yes, the driver data periods are super short right now. I'll add a catch to that burn-in, to check if the driver data length is < the burn-in or burn-out period.

hcorson-dosch-usgs commented 2 years ago

I think your hunch that NAs do funky things to the temperatures was correct. With disable_evap set to .true. and a catch to only add burn-in/out if the requested periods are <= the length of the meteo data, the results are looking much better: image
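A guard of that kind might look like the base R sketch below. The thread does not say how the burn-in and burn-out rows are generated, so mirroring the first and last days of the record is an assumption made purely for illustration, and add_burn_in_out() is a hypothetical name.

```r
# Hypothetical helper: pad meteo data with mirrored burn-in/burn-out rows,
# but only when the requested period is no longer than the record itself
add_burn_in_out <- function(meteo, burn_in, burn_out) {
  n <- nrow(meteo)
  out <- meteo
  if (burn_in <= n) {
    # Mirror the first `burn_in` days and date them just before the record
    pre <- meteo[rev(seq_len(burn_in)), , drop = FALSE]
    pre$time <- min(meteo$time) - rev(seq_len(burn_in))
    out <- rbind(pre, out)
  }
  if (burn_out <= n) {
    # Mirror the last `burn_out` days and date them just after the record
    post <- meteo[rev(tail(seq_len(n), burn_out)), , drop = FALSE]
    post$time <- max(meteo$time) + seq_len(burn_out)
    out <- rbind(out, post)
  }
  rownames(out) <- NULL
  out
}

# Five days of toy driver data: a 2-day burn-in fits, a 10-day burn-out does not
meteo <- data.frame(time = as.Date("2000-01-01") + 0:4,
                    AirTemp = c(1, 2, 3, 4, 5))
padded <- add_burn_in_out(meteo, burn_in = 2, burn_out = 10)
```

With the guard in place, a burn-out longer than the record is skipped instead of producing rows of NA driver values.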