hcorson-dosch-usgs closed this issue 2 years ago
After discussion, Hayley and I decided to take approach 1: interpolating predictions to match observation depths. Here's why:
As Hayley shared via chat on 7/15, she changed the matching method in a branch on her fork: https://github.com/hcorson-dosch/lake-temperature-process-models/blob/extrapolate_preds/5_evaluate/src/eval_utility_fxns.R#L61-L77. I ran `tar_make_clustermq(p5_nldas_pred_obs_csv, workers = 50)` on her `hayley_extrapolate_preds` branch on Tallgrass today (it took ~2 hrs to run), then compared the previous matching output, `5_evaluate/out/NLDAS_matched_to_observations_PredsNotExtrapolated.csv`, to the most recent build. Here are the differences:
```r
# Compare the two preds-to-obs matching methods.
library(tidyverse)

match_old <- readr::read_csv('5_evaluate/out/NLDAS_matched_to_observations_PredsNotExtrapolated.csv')
match_new <- readr::read_csv('5_evaluate/out/NLDAS_matched_to_observations.csv')

# How many more observations are kept?
nrow(match_new) - nrow(match_old)
#> [1] 19372

match_old_site <- match_old %>% group_by(site_id) %>% summarize(n_old = n())
match_new_site <- match_new %>% group_by(site_id) %>% summarize(n_new = n())
match_compare_site <- full_join(match_old_site, match_new_site, by = 'site_id') %>%
  mutate(diff = n_new - n_old) %>%
  filter(diff != 0)

# Across how many sites were the new observations spread?
# For context, there were 3688 sites total.
nrow(match_compare_site)
#> [1] 1609

# On average, there are ~12 new observations per site.
summary(match_compare_site)
#>    site_id              n_old             n_new            diff       
#>  Length:1609        Min.   :    7.0   Min.   :   12   Min.   :  1.00  
#>  Class :character   1st Qu.:  118.0   1st Qu.:  123   1st Qu.:  2.00  
#>  Mode  :character   Median :  247.0   Median :  255   Median :  4.00  
#>                     Mean   :  793.9   Mean   :  806   Mean   : 12.04  
#>                     3rd Qu.:  647.0   3rd Qu.:  671   3rd Qu.:  9.00  
#>                     Max.   :63268.0   Max.   :63271   Max.   :838.00  

# What's that site with a crazy number of additional new obs?
match_compare_site %>% filter(diff > 700)
#> # A tibble: 1 × 4
#>   site_id        n_old n_new  diff
#>   <chr>          <int> <int> <int>
#> 1 nhdhr_45385910  6675  7513   838

plot(match_compare_site$diff)
```
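For reference, the core of approach 1 can be sketched with `stats::approx()`. This is only an illustration with made-up depths and temperatures, not Hayley's actual implementation (see the linked `eval_utility_fxns.R` for the real code):

```r
# Sketch of approach 1: linearly interpolate predictions (made at fixed
# model depths) to the depths of the observations. All values here are
# hypothetical. rule = 2 carries the nearest prediction beyond the modeled
# depth range, which is one way to retain observations outside that range
# (roughly what the extrapolate_preds change enables).
pred_depths <- c(0, 0.5, 1, 2, 4)               # depths with model predictions
pred_temps  <- c(24.1, 23.8, 23.0, 20.5, 15.2)  # predicted temps at those depths
obs_depths  <- c(0.25, 1.5, 5)                  # 5 m is deeper than any prediction

matched <- stats::approx(pred_depths, pred_temps, xout = obs_depths, rule = 2)$y
matched
#> [1] 23.95 21.75 15.20
```

With `rule = 1` (the default), the 5 m observation would come back as `NA` and be dropped, which is why the old output retained fewer observations.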
Thanks so much for wrapping up that work and writing this summary, Lindsay! Should I go ahead and open a PR for that branch, so that this method is used when we complete the updated GCM runs?
Oh, yes! Please do! Watch out, though: the current `process-models` checkout is on my branch for my 1.5 m/s test, so you may need to `git stash` before switching branches.
See Jordan's note on the NLDAS surface evaluation PR here.
Our team is currently using two approaches to match predictions to observations:

1) Interpolate predictions to the depths of observations. The `5_evaluate` portion of this pipeline uses this method, based on Alison's depth-matching code from the mntoha data release.
2) Assign observations to the nearest depth bin and match predictions to observations based on those depth bins. This is the approach used by Andy in the `lake-temperature-lstm-static` repo.

Once we decide on an approach, the same approach should be used across both workflows.
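For contrast, the depth-bin matching of approach 2 can be sketched like this (hypothetical bins and depths; the real implementation lives in `lake-temperature-lstm-static`):

```r
# Sketch of approach 2: snap each observation to the nearest model depth
# bin, then match predictions to observations on the binned depth.
depth_bins <- c(0, 0.5, 1, 2, 4)   # fixed depths at which predictions exist
obs_depths <- c(0.3, 1.4, 5)       # observed depths (all values hypothetical)

# index of the nearest bin for each observed depth
nearest <- vapply(obs_depths, function(d) which.min(abs(depth_bins - d)), integer(1))
depth_bins[nearest]
#> [1] 0.5 1.0 4.0
```

Unlike interpolation, this keeps the prediction value unchanged but can shift an observation up to half a bin width in depth, so the two approaches can disagree most where the thermocline is steep.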