Closed jordansread closed 2 years ago
First noticed with nhdhr_59752541
for 2006, 2013, and 2016. Will comment with details about other sites after I rebuild the ice_on and ice_off dates with a recent fix.
See more here: https://github.com/USGS-R/lake-temperature-out/pull/55#issuecomment-809790324
read_csv('~/Downloads/7_pb0_annual_metrics.csv') %>% filter(is.na(ice_on_date)) %>% group_by(site_id) %>% tally %>% arrange(desc(n))
# A tibble: 252 x 2
site_id n
<chr> <int>
1 nhdhr_34133487 15
2 nhdhr_91595509 15
3 nhdhr_34127903 12
4 nhdhr_120018071 10
5 nhdhr_120018100 10
6 nhdhr_34132539 10
7 nhdhr_45337397 10
8 nhdhr_91690679 10
9 nhdhr_91691877 10
10 nhdhr_120018084 9
Will start by taking a closer look at the worst of these.
Weird, I am seeing that nhdhr_34133487
has some years w/o ice, but only six, not 15. But this is a simplified cut of ice/no-ice because I am using a boolean for a given cal year.
arrow::read_feather('4_toha/out/pb0_nhdhr_34133487_temperatures_irradiance.feather') %>% mutate(year = lubridate::year(time)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)
# A tibble: 6 x 2
year has_ice
<dbl> <lgl>
1 1987 FALSE
2 2004 FALSE
3 2005 FALSE
4 2010 FALSE
5 2011 FALSE
6 2012 FALSE
Confirmed that the exported ice flag file yields the same result
read_csv('/Volumes/ThunderBlade/HOLDER_TEMP_R/mntoha-data-release/tmp/ice_flags_12_N43.50-45.00_W93.50-96.75/pb0_nhdhr_34133487_ice_flags.csv') %>% mutate(year = lubridate::year(date)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)
# A tibble: 6 x 2
year has_ice
<dbl> <lgl>
1 1987 FALSE
2 2004 FALSE
3 2005 FALSE
4 2010 FALSE
5 2011 FALSE
6 2012 FALSE
Yikes, this is indeed a function of using the calendar year instead of centering on the winter itself 😨 (note date + 180
)
read_csv('/Volumes/ThunderBlade/HOLDER_TEMP_R/mntoha-data-release/tmp/ice_flags_12_N43.50-45.00_W93.50-96.75/pb0_nhdhr_34133487_ice_flags.csv') %>% mutate(year = lubridate::year(date+180)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)
# A tibble: 15 x 2
year has_ice
<dbl> <lgl>
1 1985 FALSE
2 1987 FALSE
3 1989 FALSE
4 1992 FALSE
5 1998 FALSE
6 2000 FALSE
7 2004 FALSE
8 2005 FALSE
9 2006 FALSE
10 2008 FALSE
11 2010 FALSE
12 2011 FALSE
13 2012 FALSE
14 2013 FALSE
15 2016 FALSE
Frustrating. Here is 0-depth temperature of the model from nhdhr_34133487
for 2015-06-04 - 2017-05-01
with ice (blue) overlaid.
the first winter should definitely have ice and it doesn't. This is a GLM3 modeling artifact 👎
ice_fl_cnt = 0
for (fl in names(yaml::yaml.load_file('3_run/out/toha_tasks.rds.ind'))){
ice_num <- arrow::read_feather(fl) %>% mutate(year = lubridate::year(time)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice) %>% nrow()
if (ice_num > 0){
message(fl, ' has unfrozen years')
ice_fl_cnt = ice_fl_cnt + 1
}
}
message(ice_fl_cnt, ' total files w/ issues')
initial count for current data release is 180 total files
I noodled around with the ice params and found that dt_iceon_avg = 0.02
seemed to address the issue and still create realistic ice-covers. Then I checked w/ Robert to see if he ran into the same thing, and he had a fancier way at arriving at dt_iceon_avg = 0.01
, so I am implementing that in the pipeline and re-running. Will share the comparison. But in the initial lake (nhdhr_34133487
) this took it from 15 years with no ice to zero.
Whoops, forgot to shift the time+180
in the year mutate, should be
ice_fl_cnt = 0
for (fl in names(yaml::yaml.load_file('../lake-temperature-process-models/3_run/out/toha_tasks.rds.ind'))){
ice_num <- arrow::read_feather(file.path('../lake-temperature-process-models', fl)) %>% mutate(year = lubridate::year(time+180)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice) %>% nrow()
if (ice_num > 0){
message(fl, ' has unfrozen years')
ice_fl_cnt = ice_fl_cnt + 1
}
}
message(ice_fl_cnt, ' total files w/ issues')
which will yield different results.
Re-ran with param adjustments and now we have no trouble lakes/files. Unclear on impact to RMSE, will see next.
Current RMSE (prior to param adjustment) is 2.418358°C for 784 lakes w/ enough data to calculate.
Slight improvement 🎉 2.409192
@lindsayplatt please add site_id and the year you ran into this. And any other details that may be relevant.