DOI-USGS / mntoha-data-release

Thermal Optical Habitat Area data release

Support state-id downloads for the data release #14

Closed jordansread closed 3 years ago

jordansread commented 4 years ago

from GH:

I am hoping you can help me be a better collaborator and better user of the temperature data. If I need to pull surface temperatures for a given DOW/date (such as the attached list), what is the best way for me to do that? Do you mind sharing your code for doing this? My current protocol would be to download/save the whole data release and then extract what I needed, but that doesn't seem like the most efficient way.

my response in email so far:

Unfortunately, I don’t think there is a simple way for you to do this right now, and that is an oversight in the way the data release is set up.

Everything is keyed off of the lake identifier (site_id in the release) and the group_id, which tells you which file(s) to download when files in the ScienceBase release are large and therefore split into several zip files (e.g., the raw daily temperature predictions). That information, as well as the lake name (if it appears in NHD’s GNIS_name), is in lake_metadata.csv.

But the problem our setup doesn’t solve is the connection to the state identifiers. You’d need to know which NHDID (site_id) corresponds to each MNDOW, then use that to figure out which zip groups to download and unzip, and which .csv within each to read. It is probably frustrating to hear that we do have those crosswalks in our pipeline for processing the data release, but they aren’t available in the release:

readRDS('2_crosswalk_munge/out/mndow_nhdhr_xwalk.rds') %>% head
# A tibble: 6 x 2
  MNDOW_ID       site_id       
  <chr>          <chr>         
1 mndow_01000100 nhdhr_86445259
2 mndow_01000200 nhdhr_86444581
3 mndow_01000300 nhdhr_59745953
4 mndow_01000400 nhdhr_59745969
5 mndow_01000500 nhdhr_80006995
6 mndow_01000600 nhdhr_80006835
readRDS('2_crosswalk_munge/out/winslow_nhdhr_xwalk.rds') %>% head
# A tibble: 6 x 2
  WINSLOW_ID   site_id        
  <fct>        <chr>          
1 nhd_10595596 nhdhr_120019424
2 nhd_10595598 nhdhr_32671150 
3 nhd_10595604 nhdhr_32671198 
4 nhd_10595608 nhdhr_32671162 
5 nhd_10595614 nhdhr_32671214 
6 nhd_10595644 nhdhr_32671274

And here are all of the crosswalks we currently have:

      - 2_crosswalk_munge/out/gnisname_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/lagosne_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/mglp_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/wbic_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/micorps_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/mndow_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/winslow_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/ndgf_nhdhr_xwalk.rds.ind
      - 2_crosswalk_munge/out/iadnr_nhdhr_xwalk.rds.ind

I think we need to do two things to support your workflow: 1) add a crosswalk table to the data release, and 2) provide some example code showing how to use it to download and access the data you need for MNDOWs.
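
For the example code, a rough sketch of the lookup step might look like the following. It assumes the crosswalk would ship as a table with MNDOW_ID and site_id columns, like the mndow_nhdhr_xwalk output above. The small in-memory stand-in for lake_metadata.csv and its group_id values are made up for illustration, and the download and unzip steps are left out because they depend on how the release files end up being named.

```r
# Crosswalk rows copied from the mndow_nhdhr_xwalk output above. In practice
# this table would be read from the release once a crosswalk file is added.
xwalk <- data.frame(
  MNDOW_ID = c("mndow_01000100", "mndow_01000200", "mndow_01000300"),
  site_id  = c("nhdhr_86445259", "nhdhr_86444581", "nhdhr_59745953"),
  stringsAsFactors = FALSE
)

# Stand-in for lake_metadata.csv. The group_id values here are hypothetical.
lake_metadata <- data.frame(
  site_id  = c("nhdhr_86445259", "nhdhr_86444581", "nhdhr_59745953"),
  group_id = c("group_a", "group_a", "group_b"),
  stringsAsFactors = FALSE
)

# The state identifiers the user actually has in hand
wanted <- c("mndow_01000100", "mndow_01000300")

# Step 1: MNDOW -> site_id, Step 2: site_id -> group_id
lookup <- merge(xwalk[xwalk$MNDOW_ID %in% wanted, ], lake_metadata,
                by = "site_id")

# Only these zip groups would need to be downloaded and unzipped
groups_to_download <- unique(lookup$group_id)
print(lookup)
print(groups_to_download)
```

With `lookup` in hand, only the zip files for `groups_to_download` need to be fetched, and `lookup$site_id` identifies which lakes to read from them.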

jordansread commented 3 years ago

At a minimum, we should include the DOWs in the lake metadata file...
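
One way this could work, sketched with base R under some assumptions: the crosswalk has MNDOW_ID and site_id columns as shown earlier, and a lake can map to more than one DOW, so the IDs are collapsed into a single delimited column. The "|" separator, the mndow_99999999 row, and the group_id values are made up for illustration.

```r
# Two crosswalk rows from the output above plus one made-up row
# (mndow_99999999) to show a lake that matches more than one DOW.
xwalk <- data.frame(
  MNDOW_ID = c("mndow_01000100", "mndow_01000200", "mndow_99999999"),
  site_id  = c("nhdhr_86445259", "nhdhr_86444581", "nhdhr_86444581"),
  stringsAsFactors = FALSE
)

# Stand-in for the lake metadata table. group_id values are hypothetical.
lake_metadata <- data.frame(
  site_id  = c("nhdhr_86445259", "nhdhr_86444581", "nhdhr_59745953"),
  group_id = c("group_a", "group_a", "group_b"),
  stringsAsFactors = FALSE
)

# Collapse all DOWs for a site_id into one "|"-delimited string
dow_by_site <- tapply(xwalk$MNDOW_ID, xwalk$site_id, paste, collapse = "|")

# Attach to the metadata without reordering rows; lakes with no DOW get NA
idx <- match(lake_metadata$site_id, names(dow_by_site))
lake_metadata$MNDOW_ID <- as.vector(dow_by_site)[idx]
print(lake_metadata)
```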