DOI-USGS / lake-temperature-process-models-old


Examine outlier transfer modeling results #23

Open jordansread opened 4 years ago

jordansread commented 4 years ago

We are looking at the 4 worst transfer models, based on comparisons to pb0 (models that don't see any observation data).

nhdhr_45730856 is Lake Union in MN, which is small but seems to have an inflow and an outflow; both look wetland-like.

I am looking at PG-MTL vs observations, with the observation markers styled by source: there are three sources for these obs and I wanted to see if one of them was funky.

I've picked out five years here that have plenty of observations. In '95 and '98 the observations are quite cold compared to the model, with the surface temperatures in '98 being really cold for mid-summer in a central MN lake. Observations in the 2011, 2016, and 2017 periods seem more "normal", and the model looks better there too.

library(tidyverse)
this_site_id <- 'nhdhr_45730856'

# PG-MTL predictions: reshape from one column per depth to long format
mod_data <- feather::read_feather('~/Downloads/bad_outputs/target_nhdhr_45730856_top_source_nhdhr_32672122_outputs.feather') %>% 
  pivot_longer(-index, names_to = 'depth', values_to = 'temp', names_prefix = 'Depth_') %>% filter(!is.na(temp)) %>% 
  mutate(date = as.Date(index), type = 'PG_MTL', source = NA) %>% select(date, depth, temp, type, source)

# observations for this lake, stacked with the model output
plot_data <- read_csv('2_prep/in/03_temperature_observations.csv') %>% filter(site_id == this_site_id) %>% 
  mutate(type = 'obs') %>% select(date, depth, temp, type, source) %>% 
  rbind(mod_data, .) %>% mutate(source = factor(source), depth = factor(depth), year = factor(lubridate::year(date))) %>% 
  filter(year %in% c('1995', '1998', '2011', '2016', '2017'))

# lines = model (colored by depth), points = observations (shape indicates source)
ggplot(data = filter(plot_data, type == 'PG_MTL'), aes(x = date, y = temp, color = depth)) +
  facet_grid(. ~ year, scales = "free") + 
  geom_line() +
  geom_point(data = filter(plot_data, type == 'obs'), aes(shape = source)) + 
  theme_bw() + theme(legend.position = "none")

[image: PG-MTL predictions (lines, colored by depth) and observations (points, shaped by source) for nhdhr_45730856, faceted by year]

The next question I'd have on this one is whether this lake actually changed in some way that would impact the hydro/thermodynamics this much. Observed July surface temperatures that differ by ~10°C?? It's pretty crazy to think about what would actually have to happen to create such a difference.

jordansread commented 4 years ago

nhdhr_60087894 is Bear Lake in MI

I didn't see any oddities in the observations (or really the model...) other than it missing the hypolimnetic temperatures by quite a bit, along with some of the other temperatures.

this_site_id <- 'nhdhr_60087894'

mod_data <- feather::read_feather('~/Downloads/bad_outputs/target_nhdhr_60087894_top_source_nhdhr_152335372_outputs.feather') %>% 
  pivot_longer(-index, names_to = 'depth', values_to = 'temp', names_prefix = 'Depth_') %>% filter(!is.na(temp)) %>% 
  mutate(date = as.Date(index), type = 'PG_MTL', source = NA) %>% select(date, depth, temp, type, source)

plot_data <- read_csv('2_prep/in/03_temperature_observations.csv') %>% filter(site_id == this_site_id ) %>% 
  mutate(type = 'obs') %>% select(date, depth, temp, type, source) %>% 
  rbind(mod_data, .) %>% mutate(source = factor(source), depth = factor(depth), year = factor(lubridate::year(date))) %>% 
  filter(year %in% c('2006','2011','2012','2013', '2014'))

ggplot(data = filter(plot_data, type == 'PG_MTL'), aes(x = date, y = temp, color = depth)) +
  facet_grid(. ~ year, scales="free") + 
  geom_line() +
  geom_point(data = filter(plot_data, type == 'obs'), aes(shape=source)) + 
  theme_bw() + theme(legend.position="none")

[image: PG-MTL predictions and observations for nhdhr_60087894, faceted by year]

I'd guess the GLM did a touch better than this because it had slightly warmer hypolimnion temperatures, but I haven't checked that.

jordansread commented 4 years ago

Here is GLM (pb0) from that lake ☝️

# GLM (pb0) output for the same lake; the type label is kept as 'PG_MTL'
# only so the plotting code above can be reused unchanged
mod_data <- feather::read_feather('3_run/sync/pb0_nhdhr_60087894_temperatures.feather') %>% 
  select(-ice) %>% 
  pivot_longer(-DateTime, names_to = 'depth', values_to = 'temp', names_prefix = 'temp_') %>% filter(!is.na(temp)) %>% 
  mutate(date = as.Date(DateTime), type = 'PG_MTL', source = NA) %>% select(date, depth, temp, type, source)

plot_data <- read_csv('2_prep/in/03_temperature_observations.csv') %>% filter(site_id == this_site_id ) %>% 
  mutate(type = 'obs') %>% select(date, depth, temp, type, source) %>% 
  rbind(mod_data, .) %>% mutate(source = factor(source), depth = factor(depth), year = factor(lubridate::year(date))) %>% 
  filter(year %in% c('2006','2011','2012','2013', '2014'))

ggplot(data = filter(plot_data, type == 'PG_MTL'), aes(x = date, y = temp, color = depth)) +
  facet_grid(. ~ year, scales="free") + 
  geom_line() +
  geom_point(data = filter(plot_data, type == 'obs'), aes(shape=source)) + 
  theme_bw() + theme(legend.position="none")

[image: GLM (pb0) predictions and observations for nhdhr_60087894, faceted by year]

I wouldn't say it is doing any better... it seems to run into the same bias issues, but it clearly has warmer surface temperatures (which is interesting...)

jordansread commented 4 years ago

nhdhr_82815984 is Hamlin Lake, which we already covered in an email. It also has a max depth issue - it seems our models go to 9.5 m, but the lake has observations down to 24 m.

How deep should this lake be?

Our earlier issues with this lake include the merging of the two basins and the substantial flow that moves through the lake, which is unaccounted for.
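
A quick way to quantify the depth mismatch would be something like the sketch below (the observations file matches the one used above; the pb0 output path for this lake is an assumption, following the same pattern as the other lakes):

library(tidyverse)

# max observed depth for Hamlin Lake (expect ~24 m)
read_csv('2_prep/in/03_temperature_observations.csv') %>%
  filter(site_id == 'nhdhr_82815984') %>%
  summarize(max_obs_depth = max(depth, na.rm = TRUE))

# max modeled depth, taken from the temp_{depth} column names of the pb0 output
# (assumed file path; expect ~9.5 m)
feather::read_feather('3_run/sync/pb0_nhdhr_82815984_temperatures.feather') %>%
  select(starts_with('temp_')) %>% names() %>%
  str_remove('temp_') %>% as.numeric() %>% max()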

jordansread commented 4 years ago

nhdhr_120018495 is Lake Sallie in MN

this_site_id <- 'nhdhr_120018495'

mod_data <- feather::read_feather('~/Downloads/bad_outputs/target_nhdhr_120018495_top_source_nhdhr_120018211_outputs.feather') %>% 
  pivot_longer(-index, names_to = 'depth', values_to = 'temp', names_prefix = 'Depth_') %>% filter(!is.na(temp)) %>% 
  mutate(date = as.Date(index), type = 'PG_MTL', source = NA) %>% 
  select(date, depth, temp, type, source)

plot_data <- read_csv('2_prep/in/03_temperature_observations.csv') %>% filter(site_id == this_site_id ) %>% 
  mutate(type = 'obs') %>% select(date, depth, temp, type, source) %>% 
  rbind(mod_data, .) %>% mutate(source = factor(source), depth = factor(depth), year = factor(lubridate::year(date))) %>%
  filter(year %in% c('1999','2000','2013','2018'))

ggplot(data = filter(plot_data, type == 'PG_MTL'), aes(x = date, y = temp, color = depth)) +
  facet_grid(. ~ year, scales="free") + 
  geom_line() +
  geom_point(data = filter(plot_data, type == 'obs'), aes(shape=source)) + 
  theme_bw() + theme(legend.position="none")

So, looks like the modeled hypolimnion is much colder than observed.

[image: PG-MTL predictions and observations for Lake Sallie (nhdhr_120018495), faceted by year]

and here is GLM: [image: GLM (pb0) predictions and observations for Lake Sallie, faceted by year]

In this set of observations it was interesting to me that most of the depths were on integer meters; I'm used to monitoring agencies measuring in feet and us converting, which results in a bunch of odd decimal depths... This lake has 3 sources of data: WQP (n = 2439), 7a_temp_coop_munge/tmp/MN_fisheries_all_temp_data_Jan2018.rds (n = 54), and 7a_temp_coop_munge/tmp/MPCA_temp_data_all.rds (n = 8).

I wonder which monitoring agency (or agencies) in the WQP is using meters for this lake? Or did we get the units wrong?
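
One way to check which source(s) report integer-meter depths (a minimal sketch against the same observations file as above; column names assumed to match the plotting code):

library(tidyverse)

# count integer-meter vs. non-integer depths by source for Lake Sallie
read_csv('2_prep/in/03_temperature_observations.csv') %>%
  filter(site_id == 'nhdhr_120018495') %>%
  mutate(integer_meter = near(depth, round(depth))) %>%
  count(source, integer_meter)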

aappling-usgs commented 4 years ago

Hamlin Lake depth: the bathy info that @jzwart found (https://www.hamlinlakepreservation.org/History/2010/19/Analytical/Bathymetry/Images/JPG/Depth/HamlinLakeBathymetryUpperHalfLowerLakeSection_30x36_300dpi.jpg) says that the upper basin is 39 feet (11.9 m) and the lower basin is 88 feet (26.8 m). Maybe the model is going to the depth of the upper basin (minus a couple of feet for dry years?)
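
For the record, the ft-to-m conversions behind those numbers:

# quick unit conversion for the two basin depths on the bathymetry map
ft_to_m <- function(ft) ft * 0.3048
ft_to_m(c(upper_basin = 39, lower_basin = 88))
# upper_basin lower_basin 
#     11.8872     26.8224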

aappling-usgs commented 4 years ago

For Lake Sallie, I like your lead on checking into the units - if they were measuring in ft but labeled them as m, that could explain why the observations labeled as very deep have more shallow-looking temperatures.

One possible check would be to look for observations at deeper depths than those present in GLM (or than expected based on lake depth) - if there are a lot of those, it seems likely the units were just mislabeled. But if their depths in "m" go down to right around where we think the bottom of the lake is in m, then we might need a different explanation.
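
A minimal sketch of that check for Lake Sallie (the expected depth here is a placeholder to swap for the GLM/hypsography max depth):

library(tidyverse)

# how many Lake Sallie observations are labeled deeper than the depth we model?
expected_max_depth_m <- 15  # placeholder value

read_csv('2_prep/in/03_temperature_observations.csv') %>%
  filter(site_id == 'nhdhr_120018495') %>%
  summarize(n_obs = n(),
            n_deeper_than_expected = sum(depth > expected_max_depth_m, na.rm = TRUE),
            max_obs_depth = max(depth, na.rm = TRUE))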

aappling-usgs commented 4 years ago

For Lake Union, I wondered whether the hydrology might have changed substantially, but I don't see much obvious change in the timelapse images available at https://earthengine.google.com/timelapse/ (search for "Lake Union, MN, USA")

[images: Google Earth Engine timelapse frames of Lake Union, MN]

(The coolest thing in those photos is the development of the quarry off to the right. I don't think that's related to the lake temp, though)

jordansread commented 4 years ago

@limnoliver How easy would it be to change source on the WQP data from wqp to wqp-{monitoring_id} or wqp-{organization_id}? That might help us track down data issues in cases like this (Lake Sallie).

limnoliver commented 4 years ago

Pretty easy I think. source_id tracks that info, and it doesn't get dropped until late in the munge process.
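
Something like this sketch could work, assuming the organization prefix can be taken off the front of source_id (e.g. 'MNPCA' from 'MNPCA-03-0359-00-201') at a point in the munge where both columns are still present:

library(tidyverse)

# relabel the generic 'wqp' source as 'wqp-{organization_id}', using the
# source_id prefix as a stand-in for the WQP OrganizationIdentifier
daily_vals <- daily_vals %>%
  mutate(source = if_else(source == 'wqp',
                          paste0('wqp-', str_extract(source_id, '^[^-]+')),
                          source))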

jordansread commented 4 years ago

@limnoliver can you tell me the difference between source_id and source_site_id? I see they are different for coop files, but they seem to all be the same for WQP data.

daily_vals %>% filter(source == 'wqp') %>% head
# A tibble: 6 x 7
  site_id                                      date       depth  temp source source_site_id       source_id           
  <chr>                                        <date>     <dbl> <dbl> <chr>  <chr>                <chr>               
1 nhdhr_{06F93A13-5AFB-4691-8F39-5140F8986BFE} 1993-02-04  0.3    0   wqp    USGS-404744093462700 USGS-404744093462700
2 nhdhr_{06F93A13-5AFB-4691-8F39-5140F8986BFE} 1993-02-04  0.61   0.1 wqp    USGS-404744093462700 USGS-404744093462700
3 nhdhr_{06F93A13-5AFB-4691-8F39-5140F8986BFE} 1993-02-04  1.22   2.9 wqp    USGS-404744093462700 USGS-404744093462700
4 nhdhr_{06F93A13-5AFB-4691-8F39-5140F8986BFE} 1995-01-12  0.3    2.5 wqp    USGS-404744093462700 USGS-404744093462700
5 nhdhr_{0BB28A37-D665-4735-A2DA-5C08014F9EEF} 1983-04-19  0.91   3.2 wqp    EPA_GLNPO-MI06       EPA_GLNPO-MI06      
6 nhdhr_{0BB28A37-D665-4735-A2DA-5C08014F9EEF} 1983-04-20  0.91   3.1 wqp    EPA_GLNPO-MI22       EPA_GLNPO-MI22  
limnoliver commented 4 years ago

For cooperator files, the source_id was the source's unique ID for the lake, and source_site_id was the site within the lake, if it was populated. For WQP these are the same, since WQP itself does not nest sites within water bodies but rather uses unique site IDs.
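
A quick sanity check of that distinction against the daily values (a sketch; it just counts whether the two IDs match for WQP vs. non-WQP rows):

library(tidyverse)

# do source_id and source_site_id agree? (expect TRUE for all wqp rows)
daily_vals %>%
  mutate(is_wqp = source == 'wqp',
         ids_match = source_id == source_site_id) %>%
  count(is_wqp, ids_match)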

jordansread commented 4 years ago

Thanks Sam.

For Lake Sallie, all of the observations in the WQP come from MN PCA, and all come from the same monitoring location (MNPCA-03-0359-00-201). I set a debug line in reduce_temp_data() and did this:

daily_vals %>% filter(source == 'wqp') %>% filter(site_id == 'nhdhr_120018495')
# A tibble: 2,213 x 7
   site_id         date       depth  temp source source_site_id       source_id           
   <chr>           <date>     <dbl> <dbl> <chr>  <chr>                <chr>               
 1 nhdhr_120018495 1980-08-19     0  20.6 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 2 nhdhr_120018495 1980-08-19     1  20.6 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 3 nhdhr_120018495 1980-08-19     2  20.5 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 4 nhdhr_120018495 1980-08-19     3  20.2 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 5 nhdhr_120018495 1980-08-19     4  20.1 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 6 nhdhr_120018495 1980-08-19     5  20.1 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 7 nhdhr_120018495 1980-08-19     6  19.9 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 8 nhdhr_120018495 1980-08-19     7  19.9 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
 9 nhdhr_120018495 1980-08-19     8  19.9 wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
10 nhdhr_120018495 1980-09-06     0  20   wqp    MNPCA-03-0359-00-201 MNPCA-03-0359-00-201
# … with 2,203 more rows

daily_vals %>% filter(source == 'wqp') %>% filter(site_id == 'nhdhr_120018495') %>% pull(source_id) %>% table
.
MNPCA-03-0359-00-201 
                2213 
jordansread commented 4 years ago

Our initial WQP sites suggest we've got two PCA sites within the lake (but perhaps one gets dropped due to data issues)

readRDS('1_crosswalk_fetch/out/canonical_lakes_sf.rds') %>% filter(site_id == 'nhdhr_120018495') %>% plot(reset = FALSE)
readRDS('1_crosswalk_fetch/out/wqp_lake_temperature_sites_sf.rds') %>% plot(add = TRUE, pch = 16, cex = 2, col = 'black')

[image: map of the nhdhr_120018495 lake polygon with WQP temperature sites overlaid]

and (ignore resultCount)

readRDS("2_crosswalk_munge/out/wqptemp_nhdhr_xwalk.rds") %>% filter(site_id == 'nhdhr_120018495')
# A tibble: 2 x 3
  site_id         MonitoringLocationIdentifier resultCount
  <chr>           <chr>                              <dbl>
1 nhdhr_120018495 MNPCA-03-0359-00-100                 200
2 nhdhr_120018495 MNPCA-03-0359-00-201                 200

# where are the data?
feather::read_feather('6_wqp_fetch/inout/wqptemp_partitions.feather') %>% filter(site_id == 'nhdhr_120018495')
# A tibble: 2 x 4
  site_id         resultCount MonitoringLocationIdentifier PullTask   
  <chr>                 <dbl> <chr>                        <chr>      
1 nhdhr_120018495         200 MNPCA-03-0359-00-100         wqptemp_123
2 nhdhr_120018495         200 MNPCA-03-0359-00-201         wqptemp_124

This requires the local task table results to run, but:

readRDS('6_wqp_fetch/tmp/wqptemp_123.rds') %>% filter(MonitoringLocationIdentifier == 'MNPCA-03-0359-00-100') %>% 
    dplyr::select(OrganizationIdentifier, `ActivityDepthHeightMeasure/MeasureValue`, `ResultDepthHeightMeasure/MeasureValue`, `ActivityDepthHeightMeasure/MeasureUnitCode`, `ResultDepthHeightMeasure/MeasureUnitCode`, ResultMeasureValue, `ResultMeasure/MeasureUnitCode`, `ActivityStartTime/Time`) %>% pull(`ActivityDepthHeightMeasure/MeasureValue`) %>% table
.
    0  1.52  3.05  4.57   6.1  7.62  9.14 12.19 13.72 14.63 15.24 
   25     1     1     2     2     2     2     2     1     1     1 

These ☝️ are in meters, and the units look right, but they look like they were converted from ft (0', 5', 10', 15', 20', 25', 30', 40', 45', 48', and 50'). This is station -100 though, which is the one that doesn't appear in the final dataset.
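
Converting those back to feet makes the pattern explicit:

# depths (m) reported at station -100, converted back to feet
depths_m <- c(0, 1.52, 3.05, 4.57, 6.10, 7.62, 9.14, 12.19, 13.72, 14.63, 15.24)
round(depths_m / 0.3048)
#  [1]  0  5 10 15 20 25 30 40 45 48 50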

For the one with more data:

readRDS('6_wqp_fetch/tmp/wqptemp_124.rds') %>% filter(MonitoringLocationIdentifier == 'MNPCA-03-0359-00-201') %>% 
    dplyr::select(OrganizationIdentifier, m_depth = `ActivityDepthHeightMeasure/MeasureValue`, `ResultDepthHeightMeasure/MeasureValue`, `ActivityDepthHeightMeasure/MeasureUnitCode`, `ResultDepthHeightMeasure/MeasureUnitCode`, ResultMeasureValue, `ResultMeasure/MeasureUnitCode`, `ActivityStartTime/Time`) 
# A tibble: 2,455 x 8
   OrganizationIdentif… m_depth `ResultDepthHeightMeasure/Me… `ActivityDepthHeightMeasure/Me… `ResultDepthHeightMeasure/Me… ResultMeasureVal… `ResultMeasure/MeasureU… `ActivityStartTime/…
   <chr>                  <dbl>                         <dbl> <chr>                           <chr>                                     <dbl> <chr>                    <chr>               
 1 MNPCA                      2                            NA m                               NA                                         26.1 deg C                    12:45:00            
 2 MNPCA                      1                            NA m                               NA                                         22   deg C                    13:20:00            
 3 MNPCA                      8                            NA m                               NA                                         20.3 deg C                    13:20:00            
 4 MNPCA                      9                            NA m                               NA                                         20.3 deg C                    13:45:00            
 5 MNPCA                      5                            NA m                               NA                                         24.6 deg C                    13:45:00            
 6 MNPCA                     13                            NA m                               NA                                         19.3 deg C                    13:10:00            
 7 MNPCA                      3                            NA m                               NA                                         18.6 deg C                    15:15:00            
 8 MNPCA                     10                            NA m                               NA                                         20.2 deg C                    10:15:00            
 9 MNPCA                      5                            NA m                               NA                                         22.9 deg C                    09:15:00            
10 MNPCA                      6                            NA m                               NA                                         23.2 deg C                    11:25:00            
# … with 2,445 more rows

These values do not look like they were converted from feet, but they share about the same depth range as the other site, which does seem to have been converted from feet - suggesting that these are right too (if those are right).

But how deep is this lake? Is it ~60 feet deep, as the -201 site's measurements suggest?

readRDS("7_config_merge/out/nml_H_A_values.rds")[['nhdhr_120018495']]$H
 [1] 305.0648 305.3696 305.6744 305.9792 306.2840 306.5888 306.8936 307.1984 307.5032 307.8080 308.1128 308.4176 308.7224 309.0272 309.3320 309.6368 309.9416 310.2464 310.5512 310.8560
[21] 311.1608 311.4656 311.7704 312.0752 312.3800 312.6848 312.9896 313.2944 313.5992 313.9040 314.2088 314.5136 314.8184 315.1232 315.4280 315.7328 316.0376 316.3424 316.6472 316.9520
[41] 317.2568 317.5616 317.8664 318.1712 318.4760 318.7808 319.0856 319.3904 319.6952 320.0000

We've got this as a 15 m deep lake in our hypsographic data, which is ~49 ft. This aligns with the max depth measurement at the -100 site, but is 10 ft shallower than the -201 site suggests. That site only has 13 observations below 15 m, though...
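
For the record, both numbers can be pulled directly (same objects/files as above):

library(tidyverse)

# lake depth implied by the hypsography: top minus bottom elevation
H <- readRDS('7_config_merge/out/nml_H_A_values.rds')[['nhdhr_120018495']]$H
diff(range(H))            # ~14.9 m
diff(range(H)) / 0.3048   # ~49 ft

# observations at the -201 site labeled deeper than 15 m
daily_vals %>%
  filter(site_id == 'nhdhr_120018495', source_site_id == 'MNPCA-03-0359-00-201') %>%
  summarize(n_below_15m = sum(depth > 15))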

TLDR, I don't see anything obviously wrong with the observation data for this lake.

jordansread commented 4 years ago

For completeness, the only other thought I had was that the clarity value might be way off for this lake, since a clearer lake would have warmer hypo temps. But the Kd value in this set of sims is 0.726 m⁻¹:

readRDS("7_config_merge/out/nml_Kw_values.rds") %>% filter(site_id == 'nhdhr_120018495')

And this happens to be a lake in our TOHA collection too, where the average of the GAM time series for Kd isn't that far off:

feather::read_feather('../lake-temperature-process-models/4_toha/out/pb0_nhdhr_120018495_temperatures_irradiance.feather') %>% pull(extc_coef_0) %>% mean
[1] 0.6571301

I find this lake confusing.

jordansread commented 4 years ago

Geez, I am having trouble letting this go!

Ok, since this lake (Sallie) is in our time-varying-GLM3-toha collection, I used the same plotting code to look at the GLM3 outputs, which look a lot closer to observed for this lake:

[image: GLM3 (time-varying Kd) predictions and observations for Lake Sallie, faceted by year]

These models have lower (clearer) values of Kd in the spring, which could delay stratification set-up and result in warmer hypo temperatures; those values are around 0.4 m⁻¹ for this lake. Or perhaps the physics is different enough in this version of GLM to cause a very different and (in this case) better result. 🤷‍♂️

For completeness, I did go in and check the hyperscales data release vs my local files. The key values (kd and hypso) look similar or the same. Lake depth is definitely the same.

limnoliver commented 4 years ago

Oooh - lots of interesting history on Lake Sallie! TLDR: it received sewage from Detroit Lakes in the past, and a series of infrastructure upgrades has improved water quality, but it still has eutrophication problems. There have also been lots of hydrologic changes, including the removal of a lock and dam system in 2001, after which lake levels could no longer be controlled.

limnoliver commented 4 years ago

Also, lots of littoral habitat, and this paper says: "Well-developed thermal stratification can persist in the deep hole for much of the summer, but in most of the lake the thermocline commonly breaks down".

jordansread commented 4 years ago

Is there a thermal profile shown in that paper?

limnoliver commented 4 years ago

No, and it cites a thesis from the 70s that I can't find.