DOI-USGS / lake-temperature-model-prep

Missing metadata for highly observed lakes #99

Closed aappling-usgs closed 5 years ago

aappling-usgs commented 5 years ago

We may or may not decide to look into this before the 2019 data release, but I see 8 lakes that have lots of temperature observations and yet lack entries in pb0_config.json and lake_metadata.csv in the current draft of the data release.

My criteria for "lots of temperature observations" are >= 200 dates with >= 5 observations. These 8 lakes meet those criteria but aren't in the metadata files:

[1] "nhdhr_{0BB28A37-D665-4735-A2DA-5C08014F9EEF}"
[2] "nhdhr_{D93A3048-2CD8-4E81-9E4D-44A949E98BBB}"
[3] "nhdhr_{E5735E0B-08B3-4AB3-A7F7-0552DC39D477}"
[4] "nhdhr_114542575"                             
[5] "nhdhr_120019058"                             
[6] "nhdhr_143777120"                             
[7] "nhdhr_156590089"                             
[8] "nhdhr_EBD8B5AF-7EBC-43A3-82BA-AF41830C144C" 
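
For anyone wanting to reproduce this kind of check outside the pipeline, here is a minimal sketch in Python (not the R code actually used here). It assumes the criteria mean "at least 200 distinct dates, each with at least 5 observations", and that observations arrive as `(site_id, date)` pairs. The function names and the input layout are hypothetical.

```python
from collections import Counter


def highly_observed_lakes(observations, min_dates=200, min_obs_per_date=5):
    """Return site IDs with >= min_dates dates that each have >= min_obs_per_date readings.

    observations: iterable of (site_id, date) tuples, one per temperature
    reading (an assumed layout, not the pipeline's actual table).
    """
    # Number of readings for each (site_id, date) combination.
    obs_per_lake_date = Counter(observations)
    # Number of dates per lake that meet the per-date threshold.
    qualifying_dates = Counter(
        site_id
        for (site_id, _date), n_obs in obs_per_lake_date.items()
        if n_obs >= min_obs_per_date
    )
    return {site for site, n_dates in qualifying_dates.items() if n_dates >= min_dates}


def missing_metadata(observations, metadata_ids, **thresholds):
    """Highly observed lakes that have no entry among metadata_ids, sorted."""
    observed = highly_observed_lakes(observations, **thresholds)
    return sorted(observed - set(metadata_ids))
```

Here `metadata_ids` would be the site IDs found in pb0_config.json or lake_metadata.csv, read by whatever means is convenient.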

aappling-usgs commented 5 years ago

This issue seems related to #98, but we don't know how closely. What we do know is that one of the lakes that was in the WRR 68 but is not in this new query is Rainy Lake, which is nhdhr_EBD8B5AF-7EBC-43A3-82BA-AF41830C144C (in the new temp obs query, but not the new metadata) and also nhdhr_120019354 (in the Winslow crosswalk). So at least one of the lakes that appears to have been dropped relative to the WRR 68 has actually just acquired a new ID (or been split into 2 IDs?) instead.

jordansread commented 5 years ago

I also made the totally reversible decision not to model any lakes that didn’t have at least one observation of Secchi depth, although I found it was quite rare for lakes to have temperature obs but no Secchi.

I can look into this one.

PS I added lake name to the metadata but haven’t pushed it up yet. That might be helpful when we are trying to figure these issues out.

jordansread commented 5 years ago

One of the issues here is that four of these are Great Lakes, which are being removed and won't be part of our modeling. We might want to consider excluding these earlier, so that they aren't included in the Secchi/temp data at all:

image

nhdhr_114542575, nhdhr_120019058, nhdhr_143777120, and nhdhr_EBD8B5AF-7EBC-43A3-82BA-AF41830C144C are not Great Lakes, but are part of this list of lakes that were not modeled. All four have plenty of Secchi data.

nhdhr_114542575, nhdhr_120019058, and nhdhr_EBD8B5AF-7EBC-43A3-82BA-AF41830C144C all have depth and meteo.

nhdhr_143777120 is a sizable reservoir in the Dakotas without depth (or meteo) that we'd probably be including if we pick back up our effort to collate depth/hypso information.

image

I looked at the rest of this, and I don't see a reason why those three lakes weren't modeled (I understand why the 4 Great Lakes polys + the Dakota reservoir weren't).
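
The per-lake triage above (does each lake have Secchi, depth, and meteo?) could be written as a small helper. This is only a sketch in Python, not pipeline code. The `missing_inputs` name and the `available` mapping from input name to the set of site IDs that have that input are hypothetical.

```python
def missing_inputs(site_ids, available):
    """Map each site ID to the list of required inputs it lacks.

    available: dict of input name -> set of site IDs that have that input
    (an assumed structure; build it from whatever tables hold each input).
    """
    return {
        site: [name for name, have in available.items() if site not in have]
        for site in site_ids
    }
```

A lake with an empty list has every input and should be modelable, so any such lake that was still dropped points to a problem further downstream.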

Now moving into https://github.com/USGS-R/lake-temperature-process-models to see why, since it seems those three have everything needed...

jordansread commented 5 years ago

Ok - I think we can close this one.

I join all of the variables needed for the nml files and get 6922 lakes that can be modeled (including having Secchi data). But then I filter out all lakes that don't have an actual file for meteo data (meaning a file name is specified for the lake, but that file simply doesn't exist). After that filtering, 6420 lakes remain that can be modeled, and these three are among the 502 that were lost. The missing files are:

"NLDAS_time[0.351500]_x[278]_y[182].csv"
"NLDAS_time[0.351500]_x[260]_y[187].csv"
"NLDAS_time[0.351500]_x[254]_y[189].csv"
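
The file-existence filter described here can be sketched as follows. This is Python rather than the R used in lake-temperature-process-models, and `split_by_meteo_file`, its arguments, and the directory layout are all assumptions made for illustration.

```python
from pathlib import Path


def split_by_meteo_file(lake_to_meteo_file, driver_dir):
    """Split lakes by whether their named meteo driver file exists on disk.

    lake_to_meteo_file: dict of site ID -> meteo file name (assumed layout).
    driver_dir: directory expected to hold the driver files (assumed).
    Returns (modelable, missing), two dicts with the same shape as the input.
    """
    driver_dir = Path(driver_dir)
    modelable, missing = {}, {}
    for site_id, file_name in lake_to_meteo_file.items():
        # A lake can name a file that was never generated, so check the disk.
        target = modelable if (driver_dir / file_name).is_file() else missing
        target[site_id] = file_name
    return modelable, missing
```

Returning the `missing` dict alongside the kept lakes makes it easy to list exactly which lakes and file names were lost at this step, instead of only seeing the count drop.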

I am not sure why these files don't exist. They probably just weren't part of the collection of lakes used to create the original cells from the .nc bounding boxes, so they never got generated. If that is the case, they (and probably the majority of the other lost ones) would be fairly easy to add back once I am at my desk (the .nc files are on an external drive). ~Or they are failed cells for some reason, such as a masked part of NLDAS (seems unlikely).~

Update: it seems these were just lakes that weren't in the original "2_crosswalk_munge/out/lakes_sf.rds" used to create the cells. See scipiper::scmake('nldas_cells', remake_file = '6_drivers_fetch.yml') to see that these are missing.