USGS-R / delaware-model-prep

Data and scripts for collecting and formatting data in the Delaware River Basin in prep for ML and DA modeling

Change in # obs included in `9_collaborator_data/res/res_io_obs.feather`? #106

Closed hcorson-dosch-usgs closed 3 years ago

hcorson-dosch-usgs commented 3 years ago

While updating the file names and re-testing the data prep code in the res-temperature-process-models repo to reflect changes made to file types in the drb-temp-data-release SB, I noticed that the reservoir inflow/outflow observation csv (reservoir_io_obs.csv, sourced from delaware-model-prep/9_collaborator_data/res/res_io_obs.feather) currently has 52,971 observations and no rows where 'location' == 'inflow'. Previously the file had 203,981 observations, with both 'outflow' and 'inflow' values for 'location'.

limnoliver commented 3 years ago

I've narrowed it down to the target res_inflow_ids, which uses NLDI to find inflow sites and is coming back empty for me. I made sure I had the latest version of dataRetrieval and built it again, but it's still empty. Will follow up more tomorrow.

limnoliver commented 3 years ago

@aappling-usgs -- could you rebuild res_inflow_ids and see what you get? If you get a non-empty list, maybe commit or cache this object so we don't have to rebuild? Currently, everyone has to build it, and I bet most will miss that you need the latest version of dataRetrieval.

You could PR my open PR here as I've not yet merged changes after the data release review.

hcorson-dosch-usgs commented 3 years ago

Fixed by Sam - it was a change to how NLDI is returning site names:

NLDI is returning sites with "USGS-siteno" and we were either filtering that out of the temperature site names from the data, or did not have it to begin with for the flow sites. Now we're trying to match sites returned from NLDI that begin with "USGS-" to pure site numbers. I suspect this is a change to how NLDI is returning site names, though I'm not sure. I don't think I changed that coding in the flow or temp data.
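To illustrate the mismatch described above, here is a minimal Python sketch (not the repo's actual R code from the fix). The function names and the site IDs are hypothetical placeholders; the only thing taken from the thread is that NLDI returns IDs prefixed with "USGS-" while the data hold bare site numbers.

```python
def strip_agency_prefix(site_id: str, prefix: str = "USGS-") -> str:
    """Return the bare site number, dropping a leading agency prefix if present."""
    return site_id[len(prefix):] if site_id.startswith(prefix) else site_id


def match_sites(nldi_ids, data_site_nos):
    """Return NLDI-returned site numbers (prefix removed) that also appear in the data."""
    known = set(data_site_nos)
    return [bare for bare in map(strip_agency_prefix, nldi_ids) if bare in known]


# Placeholder IDs for illustration only, not real NLDI query results.
nldi_ids = ["USGS-01111111", "USGS-02222222", "USGS-03333333"]
data_site_nos = ["01111111", "03333333", "04444444"]

# Comparing the raw strings finds nothing, which mirrors the empty target.
naive = [s for s in nldi_ids if s in set(data_site_nos)]

# Normalizing the prefix first recovers the overlapping sites.
matched = match_sites(nldi_ids, data_site_nos)

print(naive)    # []
print(matched)  # ['01111111', '03333333']
```

Normalizing on one side only (here, the NLDI side) keeps the stored site numbers unchanged, so downstream joins on bare site numbers are unaffected.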

Sam had to fix the same "USGS-" match issue twice, then the res_inflow_ids target and res_io_obs.feather built correctly.

The new file has more obs than the original (n = 205,191 as compared to n = 203,981), but after a bit of digging Sam and I concluded that it's due only to a later pull date. The max date in my version of the feather file was 12/17/2020, while the max date in the new file is 04/21/2021. We also checked # flow and temp obs for each site, and those only increased or stayed the same.
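A check like the per-site comparison above could be sketched as follows in Python. This is an assumption-laden illustration, not what was actually run: the 'site_id' column name and the toy rows are hypothetical ('location' is the only column named in this thread), and in practice the rows would come from the two versions of the feather file.

```python
from collections import Counter


def counts_by_site(rows):
    """Count observations per (site_id, location) pair."""
    return Counter((r["site_id"], r["location"]) for r in rows)


def decreased_groups(old_rows, new_rows):
    """Return {group: (old_n, new_n)} for groups with fewer obs in the new pull."""
    old_n, new_n = counts_by_site(old_rows), counts_by_site(new_rows)
    # Counter yields 0 for groups missing from the new pull, so dropped groups are flagged too.
    return {k: (n, new_n[k]) for k, n in old_n.items() if new_n[k] < n}


# Toy rows for illustration only ('site_id' is an assumed column name).
old_rows = [{"site_id": "A", "location": "inflow"}] * 2 + \
           [{"site_id": "B", "location": "outflow"}] * 3
new_rows = [{"site_id": "A", "location": "inflow"}] * 3 + \
           [{"site_id": "B", "location": "outflow"}] * 3
broken_rows = [{"site_id": "B", "location": "outflow"}] * 3  # inflow rows missing

ok_check = decreased_groups(old_rows, new_rows)
bad_check = decreased_groups(old_rows, broken_rows)

print(ok_check)   # {}
print(bad_check)  # {('A', 'inflow'): (2, 0)}
```

An empty result means every site and location kept or gained observations, which is the pattern expected when the only difference is a later pull date.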

The updated res_io_obs.csv is up on SB

hcorson-dosch-usgs commented 3 years ago

Sam's fix is in #107.