Code additions to evaluate the bias and accuracy of NLDAS surface predictions by these groupings: year, doy, season, and 2-degree temperature bins.
All key changes are in `5_evaluate.R` and `5_evaluate/src/eval_utility_fxns.R`.
For the current MN run (3,569 lakes) there are 2,174 sites that had >10 dates with observations. This evaluation is for surface predictions for those 2,174 lakes.
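For reviewers without the repo open, the grouped evaluation boils down to something like the sketch below. The column and function names here are hypothetical stand-ins, not the actual code in `5_evaluate/src/eval_utility_fxns.R`:

```r
library(dplyr)
library(lubridate)

# Hypothetical matched pred-obs table: one row per site-date with columns
# site_id, date, pred, obs (surface temperature, deg C)
add_groupings <- function(pred_obs) {
  pred_obs %>%
    mutate(
      year = year(date),
      doy = yday(date),
      season = case_when(
        month(date) %in% 3:5  ~ "spring",
        month(date) %in% 6:8  ~ "summer",
        month(date) %in% 9:11 ~ "fall",
        TRUE                  ~ "winter"
      ),
      temp_bin = 2 * floor(obs / 2)  # 2-degree bins, labeled by lower bound
    )
}

# Bias (mean error) and accuracy (RMSE) per group, plus how many dates and
# sites contribute to each bin
eval_by_group <- function(pred_obs, group_col) {
  pred_obs %>%
    group_by(.data[[group_col]]) %>%
    summarize(
      bias = mean(pred - obs),
      rmse = sqrt(mean((pred - obs)^2)),
      n_dates = n_distinct(date),
      n_sites = n_distinct(site_id),
      .groups = "drop"
    )
}
```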
@lindsayplatt - I'm tagging you for a quick review of my edits to the target workflow presented in #55, namely the edits to pulling together the preds for eval and my use of `tar_assert()` for that target and for the pred-obs matching target.
I initially planned to reuse the predictions that had already been read in when extracting the output, but testing how the code scaled showed that approach was too expensive in build time, since it required filtering the extracted output back down to the sites and dates for which we have observations. Instead, it was faster to read the predictions directly from the exported files for all sites with observations, filtering to observed dates as each file is read in.
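A minimal sketch of that pattern (target and object names are made up for illustration, and `tar_assert_true()` stands in for whichever `tar_assert_*()` check the workflow uses):

```r
library(targets)
library(dplyr)
library(purrr)
library(readr)

# Hypothetical helper: read one site's exported prediction file and keep
# only the dates with observations, so unobserved dates never enter memory
read_preds_for_obs <- function(pred_file, obs_dates) {
  read_csv(pred_file, show_col_types = FALSE) %>%
    filter(date %in% obs_dates)
}

# In 5_evaluate.R, roughly:
tar_target(
  p5_preds_for_eval,
  {
    preds <- map2_dfr(p5_pred_files, p5_obs_dates_by_site, read_preds_for_obs)
    # Fail the build if the prediction sites don't match the observation sites
    tar_assert_true(setequal(unique(preds$site_id), p5_obs_site_ids))
    preds
  }
)
```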
@jread-usgs - I'm tagging you to review the functions in `5_evaluate/src/eval_utility_fxns.R`, the accuracy and bias metrics I've included, and the plots that are created.
The plots produced by these targets are here. Here's one example:
If the existing code looks good, I'd like to propose adding a bit more to the plots: either a) symbolizing the bars by `n_dates` / `n_sites` (color ramp), or b) pairing each bar plot with a histogram of `n_dates` and `n_sites` that shares the same x-axis as the main plot. A spot check reveals that at least some of the high RMSE/bias values correspond to bins with few dates and/or sites.
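Option (a) could look something like this sketch, assuming a `metrics_by_doy` tibble with `doy`, `bias`, and `n_dates` columns (names hypothetical):

```r
library(ggplot2)

# Fill each DOY bar by the number of dates behind it, so sparse bins
# (which tend to produce extreme bias/RMSE values) stand out
ggplot(metrics_by_doy, aes(x = doy, y = bias, fill = n_dates)) +
  geom_col() +
  scale_fill_viridis_c(name = "n_dates") +
  labs(x = "Day of year", y = "Bias (pred - obs, deg C)")
```

Option (b) would instead stack a small `n_dates`/`n_sites` panel under each main plot with a shared x-axis, e.g. via the `patchwork` package.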
There is considerable variability in the # of dates and sites per grouped bin (here DOY):