hcorson-dosch-usgs closed this 2 years ago
Okay, I think I've addressed all of your comments, except for adding in the glm version to the export tibble, which is being handled here #8.
@lindsayplatt note that I removed `iteration='list'` for the `p1_nml_objects` target b/c I realized we didn't need it, as we never map over that target.
and THANK YOU for such helpful reviews -- I really enjoyed working through this today and making this pipeline better and more robust!
Oh, and @jread-usgs I do think it would be good for you to run this pipeline yourself, to triple-check the GLM pieces. All the files you should need to do so (for the `1_prep/in` and `1_prep/tmp` directories) are saved here
Cool - those files worked :tada:
Two quick things I ran into when running locally:
`NA`s at the end of the file. I know this is a shorter example for the time being, but perhaps that `munge_meteo_w_burn_in_out()` needs to fail when it doesn't have enough data to mirror and create the burn-in/out time series. I don't think `NA`s cause a hard fail with GLM...the simulation either stops early without an error code or just does funky things with the temperatures (I can't remember which).

```r
meteo_data %>% tail
# A tibble: 6 × 8
  time       Shortwave Longwave AirTemp RelHum  Rain  Snow WindSpeed
  <date>         <dbl>    <dbl>   <dbl>  <dbl> <dbl> <dbl>     <dbl>
1 1980-07-17        NA       NA      NA     NA    NA    NA        NA
2 1980-07-18        NA       NA      NA     NA    NA    NA        NA
3 1980-07-19        NA       NA      NA     NA    NA    NA        NA
4 1980-07-20        NA       NA      NA     NA    NA    NA        NA
5 1980-07-21        NA       NA      NA     NA    NA    NA        NA
6 1980-07-22        NA       NA      NA     NA    NA    NA        NA
```
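One way to guard against driver files that end in `NA` rows like this is a pre-flight check before the meteo data is written out. This is only a sketch; the function name and the expectation that every non-`time` column should be complete are my assumptions:

```r
library(dplyr)

# Hypothetical guard: fail fast if any meteo row has NA driver values,
# since GLM won't error cleanly on them.
stop_if_meteo_incomplete <- function(meteo_data) {
  n_bad <- meteo_data %>%
    filter(if_any(-time, is.na)) %>%
    nrow()
  if (n_bad > 0) {
    stop(n_bad, " meteo row(s) contain NA driver values; ",
         "GLM may stop early or misbehave without a clear error.")
  }
  invisible(meteo_data)
}
```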
and evap:

```
&debugging
   disable_evap = .false.
/
```

should probably be set to `.true.` here in the template or set programmatically.
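If it's set programmatically, a sketch using glmtools' nml helpers might look like this (the template filename and paths are assumptions):

```r
library(glmtools)

# Read the template, flip the evaporation switch, and write the
# finalized nml into the simulation directory.
nml <- read_nml('glm3_template.nml')
nml <- set_nml(nml, 'disable_evap', TRUE)  # written to the file as .true.
write_nml(nml, file.path(sim_lake_dir, 'glm3.nml'))
```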
Sims right now look like they are rapidly losing water and are very cold (from `glmtools::plot_temp(file.path(sim_lake_dir, 'out', 'output.nc'))`) but plenty of time to sort that out.
Thanks @jread-usgs -- I was wondering about that `disable_evap` parameter.

And yes, the driver data periods are super short right now. I'll add a catch to that burn-in, to check if the driver data length is < the burn-in or burn-out period.

I think your hunch that NAs do funky things to the temperatures was correct. With `disable_evap` set to `.true.` and a catch to only add burn-in/out if the requested periods are <= the length of the meteo data, the results are looking much better:
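That catch might be sketched like so (the function and argument names are hypothetical, and the mirroring itself is elided):

```r
# Hypothetical guard inside the meteo munging step: only create the
# mirrored burn-in/out periods when the driver record is long enough.
add_burn_in_out <- function(meteo_data, n_burn_in, n_burn_out) {
  n_days <- nrow(meteo_data)
  if (n_burn_in > n_days || n_burn_out > n_days) {
    warning('Driver data is shorter than the requested burn-in/out period; ',
            'skipping burn-in/out')
    return(meteo_data)
  }
  # ... mirror the first n_burn_in and last n_burn_out days of the record ...
  meteo_data
}
```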
This pipeline sets up the GLM workflow laid out in #1. I did deviate slightly once I got into the actual coding.
As of now, the first part of the pipeline (`1_prep`) manually pulls in relevant files from `lake-temperature-model-prep` (eventually we will use globus, per #4) and builds a crosswalk (`p1_meteo_xwalk`) of the meteo files, the GCMs, the lake_ids, and the time periods, setting the stage for each model run, as each row = 1 model run. While Lindsay is refining the approach for building the netCDFs in `lake-temperature-model-prep` (and therefore the approach we'll take to pull information out of those netCDFs in this pipeline), I'm simply mapping over the feather files that I've manually dropped into an `in` folder. This part of the pipeline also builds the nml objects for each lake, using a glm3 nml template and the `7_config_merge/out/nml_list.rds` from `lake-temperature-model-prep`.

The second part of the pipeline (`2_run`) executes the model runs, mapping over the meteo xwalk. Building `p2_glm_uncalibrated_runs` runs each model, which entails reading in the meteo data and adding a burn-in and burn-out period, writing that meteo data to the simulation directory, finalizing the nml (including the meteo filename, per #3) and writing it to the simulation directory, and then running the model simulation. If the simulation is successful, the temperature and ice predictions are extracted to a feather file and the simulation directory is deleted.

Lindsay and I chatted re: how best to make the model execution targets (`p2_glm_uncalibrated_runs`) fault tolerant, and I ended up modifying Alison's `tryCatch()` approach to wrap around a `retry()` command, per the approach Lindsay used here, so that the call to execute a model is attempted up to 5 times.

We also discussed whether to make the model execution targets (`p2_glm_uncalibrated_runs`) file targets or object targets, as we wanted to keep Jordan's approach of running the model and extracting the output in a single function, so that the GLM simulation directories could be deleted as soon as the output was extracted. However, if model runs did fail after the 5 `retry()` attempts, it didn't seem logical for the function to return an empty file.

After talking it through, we landed on an approach that combined Jordan's single-function execution+extraction approach and Alison's approach of returning a tibble with information about the model run. The function
`run_glm3_model()` writes a feather file of the model output, but rather than returning that filename, it instead returns a tibble, which includes the run_date, lake_id, gcm, time_period, the name of the export feather file, its hash (NA if the model run failed), the duration of the model run, whether or not the model run succeeded, and the code returned by the call to `GLM3r::run_glm()`. This provides some nice diagnostic info, while still tracking the hash of the export feather file, and allows for a logical returned value if a model run fails:

We then group this returned tibble by lake_id and gcm, and use that new grouped mapping (`p2_glm_uncalibrated_run_groups`) to build a final target `p2_glm_uncalibrated_output_feathers` that combines the output generated for a given lake using a given set of GCM driver data (per the grouping we set) into a single feather file (binding together the output from the 3 simulation time periods). If we want to change how the data is combined, we can simply modify the grouping when building `p2_glm_uncalibrated_run_groups`. Since the hash of the export file is tracked in the tibble output from `p2_glm_uncalibrated_runs`, any change to the contents of the export file will trigger a rebuild of the combined output feather files.
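Pulling those pieces together, here is a condensed sketch of that execution+extraction pattern. It is not the pipeline's actual implementation: `retry()` is assumed to be from the {retry} package, the GLM exit-code convention and the md5 hash are assumptions, and the output extraction is elided.

```r
library(dplyr)

run_glm3_model <- function(sim_dir, lake_id, gcm, time_period, export_fl) {
  start_time <- Sys.time()
  # Attempt the model run up to 5 times; return -1 if all attempts error out
  glm_code <- tryCatch(
    retry::retry(
      GLM3r::run_glm(sim_dir, verbose = FALSE),
      until = function(val, cnd) val == 0,  # assuming 0 = clean GLM exit
      max_tries = 5
    ),
    error = function(e) -1
  )
  success <- glm_code == 0
  if (success) {
    # ... extract temperature/ice predictions to export_fl,
    #     then delete the simulation directory ...
  }
  # Return a diagnostic tibble rather than a filename, so failed runs
  # still yield a logical target value while the file hash stays tracked
  tibble(
    run_date = Sys.Date(),
    lake_id = lake_id,
    gcm = gcm,
    time_period = time_period,
    export_fl = export_fl,
    export_fl_hash = if (success) unname(tools::md5sum(export_fl)) else NA_character_,
    run_duration_s = as.numeric(difftime(Sys.time(), start_time, units = 'secs')),
    run_successful = success,
    glm_code = glm_code
  )
}
```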