DOI-USGS / mda.streams

backend tools for powstreams
Creative Commons Zero v1.0 Universal
3 stars 8 forks source link

do a site-specific NA filter in stage_nwis_ts? #248

Closed jordansread closed 8 years ago

jordansread commented 8 years ago

working through the make for nwis data, I am getting failures for varify_ts. These seem to be because of a small number of NA values in the returned data. here is one example:

 stage_nwis_ts("nwis_01036390", var='doobs', times=c('2006-10-01', '2016-03-01'), version='rds', verbose=TRUE)

tail(site_data)
                 DateTime     doobs
U                         mgO2 L^-1
49581 2013-09-30 23:00:00       9.9
49582 2013-10-01 00:00:00       9.9
49583 2013-10-01 01:00:00       9.9
49584 2013-10-01 02:00:00       9.9
49585 2013-10-01 03:00:00       9.9
49586 2013-10-01 04:00:00        NA

we don't filter out NAs, so are we expecting no NAs to be returned from dataRetieval?

jordansread commented 8 years ago

Tried this out on a much earlier dataRet version, and the same NA value is returned, so it is not a change there that introduced this

aappling-usgs commented 8 years ago

Huh - I tried running that one line of code and hit entirely different problems before even getting to verify_ts:

> library(mda.streams)
Setting endpoint to www.sciencebase.gov
> st=stage_nwis_ts("nwis_01036390", var='doobs', times=c('2006-10-01', '2016-03-01'), version='rds', verbose=TRUE)
requesting data from NWIS for p_code 00300
writing the downloaded data to file
 Hide Traceback

 Rerun with Debug
 Error in UseMethod("as.lazy") : 
  no applicable method for 'as.lazy' applied to an object of class "factor" 
12 FUN(X[[i]], ...) 
11 lapply(x, as.lazy, env = env) 
10 structure(lapply(x, as.lazy, env = env), class = "lazy_dots") 
9 as.lazy_dots.list(list(...)) 
8 as.lazy_dots(list(...)) 
7 lazyeval::all_dots(.dots, ...) 
6 select_.data.frame(site_data, "DateTime", which.var) 
5 select_(site_data, "DateTime", which.var) 
4 eval(expr, envir, enclos) 
3 eval(lhs, parent, parent) 
2 select_(site_data, "DateTime", which.var) %>% filter(DateTime >= 
    truetimes[1] & DateTime < truetimes[2]) %>% u(c(NA, var_units)) %>% 
    setNames(c("DateTime", var)) at stage_nwis_ts.R#135
1 stage_nwis_ts("nwis_01036390", var = "doobs", times = c("2006-10-01", 
    "2016-03-01"), version = "rds", verbose = TRUE) 
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mda.streams_0.9.5

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3             rLakeAnalyzer_1.7.6     magrittr_1.5            devtools_1.9.1          sbtools_0.18.0         
 [6] unitted_0.2.5           lattice_0.20-33         R6_2.1.1                stringr_1.0.0           httr_1.1.0             
[11] plyr_1.8.3              dplyr_0.4.3             tools_3.2.4             grid_3.2.4              parallel_3.2.4         
[16] DBI_0.3.1               streamMetabolizer_0.9.1 digest_0.6.9            dataRetrieval_2.4.9     lazyeval_0.1.10        
[21] assertthat_0.1          readr_0.2.2             reshape2_1.4.1          LakeMetabolizer_1.4.1   tidyr_0.2.0            
[26] bitops_1.0-6            curl_0.9.4              RCurl_1.95-4.7          memoise_0.2.1           geoknife_1.2.1         
[31] sp_1.2-0                stringi_1.0-1           XML_3.98-1.3            jsonlite_0.9.19         lubridate_1.5.0        
[36] foreign_0.8-66  

maybe i've forgotten to update something on my home computer (just did sbtools and mda.streams)?

aappling-usgs commented 8 years ago

I did indeed add a more stringent check for NAs in verify_ts two weeks ago: https://github.com/USGS-R/mda.streams/blame/develop/R/verify_ts.R#L52. i can't remember exactly why right now, but it seemed like a good idea at the time.

maybe that was when i was working with jud's data and hitting errors down the line because other code (maybe e.g., in combine_ts) wasn't expecting NAs? i wish i could remember. it was part of this large PR: https://github.com/USGS-R/mda.streams/pull/232.

i can't think of any reason why removing NAs from NWIS data would be a bad idea. so i think that the change you've made is a good one (eliminating NAs from data we store on SB).

aappling-usgs commented 8 years ago

@jread-usgs - all this now behaves as I'd expect. think we can close this issue, or is there any more I can do to resolve it?

> st=stage_nwis_ts("nwis_01036390", var='doobs', times=c('2006-10-01', '2016-03-01'), version='rds', verbose=TRUE)
requesting data from NWIS for p_code 00300
writing the downloaded data to file
> site_data <- read_ts(st)
> tail(site_data)
                 DateTime     doobs
U                         mgO2 L^-1
49580 2013-09-30 22:00:00       9.9
49581 2013-09-30 23:00:00       9.9
49582 2013-10-01 00:00:00       9.9
49583 2013-10-01 01:00:00       9.9
49584 2013-10-01 02:00:00       9.9
49585 2013-10-01 03:00:00       9.9