DOI-USGS / lake-temperature-process-models

Creative Commons Zero v1.0 Universal
1 stars 4 forks source link

Add dependency to evaluation pred-obs matching #78

Closed hcorson-dosch-usgs closed 2 years ago

hcorson-dosch-usgs commented 2 years ago

This PR fixes a bug in the 5_evaluate.R step that we had not previously identified.

The bug

When the evaluation code was first written, we read in all GLM predictions and then filtered to only predictions for sites with observations. This was too expensive an approach at scale, so we switched to reading in the predictions within the step to match predictions to observations. The targets call to the matching function was set up such that we used a site id to reconstruct the export filepath. Unfortunately, with this approach, targets was not tracking the model output files as a dependency of the matched pred-obs target. So long as the set of sites for which we had observations and predictions didn't change, this target didn't rebuild, which meant any changed model output was not used for evaluation.

I first noticed this behavior when rebuilding the p5_nldas steps following a complete re-run of the NLDAS models. Despite there being new model output for several sites, targets skipped the p5_nldas_pred_obs target. I confirmed the behavior locally by rebuilding GCM models for several sites with outdated GCM data then re-running the evaluation steps -- again, the evaluation pred-obs target (p5_gcm_pred_obs) didn't rebuild.

The fix

The fix is to filter the p3_{driver_type}_glm_uncalibrated_output_feather_tibble to only those sites for which we have observations and predictions, then use this filtered dataframe to provide (and thereby track) the export filepaths for each site, or in the case of the GCM models, for each site and GCM.

This PR also fixes a typo where p5_nldas_eval_depths was referencing gcm targets. This code had not impacted any builds, as it was part of the extrapolate preds PR and only as far as p5_nldas_pred_obs_csv had been rebuilt.