DOI-USGS / mda.streams

backend tools for powstreams
Creative Commons Zero v1.0 Universal
3 stars 8 forks source link

does stage_nwis_sitelist return too many sites? #239

Open aappling-usgs opened 8 years ago

aappling-usgs commented 8 years ago

going through tests in test-stage.R.

this used to return 22 sites (back in summer 2015) but now returns 1118.

stage_nwis_sitelist(c('doobs'), 'WI')

that would be ok if it were simply that WI has added a bunch of doobs sites. but the first three sites (and probably others) come back with nothing...

> staged <- stage_nwis_ts(sites[1:3], var="doobs", times=c('2006-01-01', '2016-04-05'), folder=folder)
> staged
NULL

also, a query by browser returns just 27 sites: http://waterdata.usgs.gov/nwis/uv?state_cd=wi&index_pmcode_00300=1&format=station_list&group_key=NONE&range_selection=days&period=7&begin_date=2007-01-01&end_date=2016-04-10&date_format=YYYY-MM-DD&rdb_compression=file&list_of_search_criteria=state_cd%2Crealtime_parameter_selection

this makes me suspect that stage_nwis_sitelist is not being sufficiently selective.

jordansread commented 8 years ago

The code I am working on for this is much simpler than the original code, and passes the smell test for new sites vs old sites. We should discuss it, but currently was created to sit in stream_metab_usa instead of this package.

aappling-usgs commented 8 years ago

ok. why the repo switch? looking forward to seeing/discussing the improved version

aappling-usgs commented 8 years ago

i do think this ultimately belongs in mda.streams. i won't change stream_metab_usa for now, but i did just copy/modify get_nwis_sites and init_site_list from @jread-usgs's fork of stream_metab_usa into mda.streams as retrieve_nwis_sitelist and stage_nwis_sitelist, respectively. slight differences in interface, ability to query for intersection of several p_codes, should be same underlying logic...

...except it's not. the old mda.streams:::stage_nwis_sitelist runs a single query for all p_codes (though just one state) at a time, then selects sites that have all p_codes. this should still be faster than the multi-query version I just introduced. it also filters to only those results with begin_date < end_date. i wonder how that problem case was arising and whether it still does.

see https://github.com/USGS-R/mda.streams/compare/develop...aappling-usgs:stage_nwis_sitelist?expand=1 for the comparison of new (green) to old (red) logic.

i'm going to keep the changes in a separate branch for now because i'm not sure if we'll want to make them.

aappling-usgs commented 8 years ago

on second look, i think the dates test is to make sure that if multiple pcodes are requested, their date ranges actually overlap. seems reasonable.

that said, neither the multiple p_codes nor the overlapping dates is important to our current need for stream_metab_usa, which is just to find sites that have >100 doobs days. so this issue is not a high priority this week (or possibly ever).