DOI-USGS / lake-temperature-process-models-old

Pipeline #2
Other
0 stars 6 forks source link

Dig into GLM3 failing to form ice on MN lake(s) #30

Closed jordansread closed 2 years ago

jordansread commented 3 years ago

@lindsayplatt please add site_id and the year you ran into this. And any other details that may be relevant.

lindsayplatt commented 3 years ago

First noticed with nhdhr_59752541 for 2006, 2013, and 2016. Will comment with details about other sites after I rebuild the ice_on and ice_off dates with a recent fix.

image

jordansread commented 3 years ago

See more here: https://github.com/USGS-R/lake-temperature-out/pull/55#issuecomment-809790324

read_csv('~/Downloads/7_pb0_annual_metrics.csv') %>% filter(is.na(ice_on_date)) %>% group_by(site_id) %>% tally %>% arrange(desc(n))

# A tibble: 252 x 2
   site_id             n
   <chr>           <int>
 1 nhdhr_34133487     15
 2 nhdhr_91595509     15
 3 nhdhr_34127903     12
 4 nhdhr_120018071    10
 5 nhdhr_120018100    10
 6 nhdhr_34132539     10
 7 nhdhr_45337397     10
 8 nhdhr_91690679     10
 9 nhdhr_91691877     10
10 nhdhr_120018084     9

Will start by taking a closer look at the worst of these.

jordansread commented 3 years ago

Weird, I am seeing that nhdhr_34133487 has some years w/o ice, but only six, not 15. But this is a simplified cut of ice/no-ice because I am using a boolean for a given cal year.

arrow::read_feather('4_toha/out/pb0_nhdhr_34133487_temperatures_irradiance.feather') %>% mutate(year = lubridate::year(time)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)
# A tibble: 6 x 2
   year has_ice
  <dbl> <lgl>  
1  1987 FALSE  
2  2004 FALSE  
3  2005 FALSE  
4  2010 FALSE  
5  2011 FALSE  
6  2012 FALSE  
jordansread commented 3 years ago

Confirmed that the exported ice flag file yields the same result

read_csv('/Volumes/ThunderBlade/HOLDER_TEMP_R/mntoha-data-release/tmp/ice_flags_12_N43.50-45.00_W93.50-96.75/pb0_nhdhr_34133487_ice_flags.csv') %>% mutate(year = lubridate::year(date)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)
# A tibble: 6 x 2
   year has_ice
  <dbl> <lgl>  
1  1987 FALSE  
2  2004 FALSE  
3  2005 FALSE  
4  2010 FALSE  
5  2011 FALSE  
6  2012 FALSE  
jordansread commented 3 years ago

Yikes, this is indeed a function of using the calendar year instead of centering on the winter itself 😨 (note date + 180)

read_csv('/Volumes/ThunderBlade/HOLDER_TEMP_R/mntoha-data-release/tmp/ice_flags_12_N43.50-45.00_W93.50-96.75/pb0_nhdhr_34133487_ice_flags.csv') %>% mutate(year = lubridate::year(date+180)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice)

# A tibble: 15 x 2
    year has_ice
   <dbl> <lgl>  
 1  1985 FALSE  
 2  1987 FALSE  
 3  1989 FALSE  
 4  1992 FALSE  
 5  1998 FALSE  
 6  2000 FALSE  
 7  2004 FALSE  
 8  2005 FALSE  
 9  2006 FALSE  
10  2008 FALSE  
11  2010 FALSE  
12  2011 FALSE  
13  2012 FALSE  
14  2013 FALSE  
15  2016 FALSE  
jordansread commented 3 years ago

image

Frustrating. Here is 0-depth temperature of the model from nhdhr_34133487 for 2015-06-04 - 2017-05-01 with ice (blue) overlaid.

the first winter should definitely have ice and it doesn't. This is a GLM3 modeling artifact 👎

jordansread commented 3 years ago
ice_fl_cnt = 0
for (fl in names(yaml::yaml.load_file('3_run/out/toha_tasks.rds.ind'))){
    ice_num <- arrow::read_feather(fl) %>% mutate(year = lubridate::year(time)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice) %>% nrow()
    if (ice_num > 0){
        message(fl, ' has unfrozen years')
        ice_fl_cnt = ice_fl_cnt + 1
    }
}
message(ice_fl_cnt, ' total files w/ issues')

initial count for current data release is 180 total files

jordansread commented 3 years ago

I noodled around with the ice params and found that dt_iceon_avg = 0.02 seemed to address the issue and still create realistic ice-covers. Then I checked w/ Robert to see if he ran into the same thing, and he had a fancier way at arriving at dt_iceon_avg = 0.01, so I am implementing that in the pipeline and re-running. Will share the comparison. But in the initial lake (nhdhr_34133487) this took it from 15 years with no ice to zero.

jordansread commented 3 years ago

Whoops, forgot to shift the time+180 in the year mutate, should be

ice_fl_cnt = 0
for (fl in names(yaml::yaml.load_file('../lake-temperature-process-models/3_run/out/toha_tasks.rds.ind'))){
    ice_num <- arrow::read_feather(file.path('../lake-temperature-process-models', fl)) %>% mutate(year = lubridate::year(time+180)) %>% group_by(year) %>% summarise(has_ice = any(ice)) %>% filter(!has_ice) %>% nrow()
    if (ice_num > 0){
        message(fl, ' has unfrozen years')
        ice_fl_cnt = ice_fl_cnt + 1
    }
}
message(ice_fl_cnt, ' total files w/ issues')

which will yield different results.

jordansread commented 3 years ago

Re-ran with param adjustments and now we have no trouble lakes/files. Unclear on impact to RMSE, will see next.

Current RMSE (prior to param adjustment) is 2.418358°C for 784 lakes w/ enough data to calculate.

jordansread commented 3 years ago

Slight improvement 🎉 2.409192