Closed lindsayplatt closed 2 years ago
I'd like to draw @jread-usgs and @padilla410 's attention to this issue.
Thanks, @lindsayplatt for documenting this. Just wanted to note that this only includes Minnesota lakes.
Is this 17 lakes out of ~3k+? Or 17 out of the 881 TOHA lakes in MN?
@lindsayplatt do you think we lost these when going to LAGOS US (from LAGOS NE)? Alternatively there is a chance we lost some in the poly-to-poly matching with the changes to PROJ/sf that we've noted elsewhere. But I see all of these 17 have matches to MNDOWs, so unlikely a crosswalking issue.
Previously, when I subset the 14,678 MN lakes in our pipeline ('2_crosswalk_munge/out/centroid_lakes_sf.rds'
filtered to MN lakes using '2_crosswalk_munge/out/lake_to_state_xwalk.rds'
) down to those that are in the nml list ('7_config_merge/out/nml_list.rds'
), I ended up with 3,583 lakes. I now end up with 3,569 lakes. Per Lindsay's notes above, there are 3 lakes in the new subset that were not in the old subset (created with an earlier version of the nml list), and 17 lakes that were in the old subset that are not in the new subset.
arrow::read_feather(scipiper::sc_retrieve('7b_temp_merge/out/temp_data_with_sources.feather.ind')) %>%
+ filter(site_id %in% dropped_sites) %>% group_by(site_id) %>% tally %>% arrange(desc(n))
# A tibble: 17 × 2
site_id n
<chr> <int>
1 nhdhr_107071284 551
2 nhdhr_{83CB1E97-A4A4-4F19-8527-0DCCB33EA81F} 183
3 nhdhr_91688597 175
4 nhdhr_{E0B4C1E3-BF0A-485B-8BAA-16414B0FDA18} 159
5 nhdhr_120020396 131
6 nhdhr_107071188 110
7 nhdhr_91692063 88
8 nhdhr_{507508B4-D0CB-4DB4-A9AD-D0E45CDC6646} 70
9 nhdhr_{74141881-3AF2-48DD-80A2-68A589A99B21} 45
10 nhdhr_126210325 33
11 nhdhr_{09E09633-2D9C-40F7-9F89-36ED6CF2A394} 23
12 nhdhr_{503D7EB2-93D3-47F8-8613-5482F66B5C8A} 14
13 nhdhr_{3CB73C80-7D61-484A-9B02-D881B3B366E6} 7
14 nhdhr_{63D127FC-DD2D-453B-B57F-45B323A72C51} 6
15 nhdhr_117649999 6
16 nhdhr_120019863 3
17 nhdhr_86443975 1
☝️ the number of temperature observations each of these sites has
Some more thoughts: I think perhaps the 11 sites (probably more beyond MN) were lost with the shift from LAGOS NE to LAGOS US. We went from a poly-to-poly LAGOS match to a direct nhdHR PERM_ID match when moving to US. So there is a potential that our poly match was greedier and some of the false matches were dropped. There also may be depths that were in LAGOS NE but were removed in the updated US version due to QAQC issues.
I am less certain why we'd lose 6 MN lakes that had clarity estimates for but no longer do. Most of the Kw/Kd/Secchi data is coming from the WQP and each time we pull there are some lost measurements but mostly new gained ones. Sometimes the organizations remove errant or redundant data or remove monitoring locations completely from STORET/WQP. This is a major challenge for reproducibility, but isn't something we have control over (and it is nearly always a net benefit to update the data as opposed to lock in a WQP pull from a few years ago to avoid this issue). WQP was last pulled about 6 months ago, so we do have the option of doing an updated pull sometime soon before the final model runs. The downside is that some of the Kw values would changes for some of the already-modeled MN lakes in the projections. Something to consider carefully (although if the GCM pipeline is careful at the parameter stage, it would only re-run lakes with updated nml params vs all of them).
Additionally, because the nml_lake_depth
target is calculated from the nml_H_A
target, and the nml_H_A
target is calculated by a combination of hypsography and z_max values, it is likely that the 11 lakes lost that Lindsay shared above are lost at that target because of no longer having z_max (just wanted to clarify that because the order of the targets might suggest otherwise).
Thanks for these explanations, @jread-usgs. So what I am hearing is that these losses are likely not due to our error in any of the recent PRs, but instead likely an artifact of using LAGOS-US now (which does seem reasonable). I am personally OK with not investigating this issue further and being comfortable with our new set being the best available and moving forward. I also would like to skip a new clarity pull since we have a lot of other things going on (even though it may be fairly simple).
Hayley pulled down the recent
nml_list.rds
by runningscmake('7_config_merge/out/nml_list.rds', 'getters.yml')
on 5/10/2022 and found that there are 17 sites that used to be in the list and are no longer there (note that there are 3 that were added). I spent a bit of time this morning seeing what could be the issue and my sleuthing identified a couple of spots where there were semi-recent changes that could explain this difference. The main issues might be coming from changes in7_config_merge/out/nml_H_A_values.rds.ind
(here is a link to the lines with recent changes).Main question: should we be concerned that 17 sites are no longer included in the NML list? I would expect that our recent efforts would have only added, not subtracted.