DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Inventorying NWIS data for a specific date range #665

Closed lekoenig closed 1 year ago

lekoenig commented 1 year ago

What is your question?

I'm currently using the function dataRetrieval::whatNWISdata to inventory what NWIS data is available for a given parameter. Is there also a way to inventory what NWIS data exists for a site over a given date range?

To Reproduce

Here is an example for Muddy Creek above Olson Draw, near Dad, WY, which has over 4,000 observation-days of specific conductance data but none during the period I happen to be interested in (2022-08-30 to 2022-09-30). It would be ideal to know what data exists within this date range without downloading the data first, though I understand that may be tricky to implement in practice.

library(dataRetrieval)
dataRetrieval::whatNWISdata(siteNumber = "09258050", service = "uv", parameterCd = "00095")
#>    agency_cd  site_no                                 station_nm site_tp_cd dec_lat_va dec_long_va coord_acy_cd                                                      
#> 91      USGS 09258050 Muddy Creek above Olson Draw, near Dad, WY         ST   41.47833   -107.6025            R
#>    dec_coord_datum_cd alt_va alt_acy_va alt_datum_cd   huc_cd data_type_cd parm_cd stat_cd  ts_id loc_web_ds medium_grp_cd
#> 91              NAD83   6835         10       NGVD29 14050004           uv   00095    <NA> 162824         NA           wat
#>    parm_grp_cd  srs_id access_cd begin_date   end_date count_nu
#> 91        <NA> 1646694         0 2010-07-23 2023-03-01     4604

dataRetrieval::readNWISuv(siteNumbers = "09258050", parameterCd = "00095", startDate = "2022-08-30", endDate = "2022-09-30")
#> [1] agency_cd site_no   dateTime  tz_cd    
#> <0 rows> (or 0-length row.names)

Session Info

Please include your session info:

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1   stringr_1.5.0   dplyr_1.1.0     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.1.8   
 [8] ggplot2_3.4.0   tidyverse_1.3.1 targets_0.12.1 

loaded via a namespace (and not attached):
 [1] fs_1.5.2             usethis_2.1.5        sf_1.0-9             lubridate_1.9.2      devtools_2.4.3       bit64_4.0.5         
 [7] httr_1.4.4           rprojroot_2.0.2      tools_4.1.3          backports_1.4.1      utf8_1.2.3           R6_2.5.1            
[13] KernSmooth_2.23-20   DBI_1.1.3            colorspace_2.0-3     withr_2.5.0          tidyselect_1.2.0     prettyunits_1.1.1   
[19] processx_3.5.3       bit_4.0.5            curl_5.0.0           compiler_4.1.3       cli_3.6.0            rvest_1.0.2         
[25] xml2_1.3.3           desc_1.4.1           dataRetrieval_2.7.12 stringfish_0.15.7    scales_1.2.1         classInt_0.4-8      
[31] callr_3.7.0          proxy_0.4-27         digest_0.6.29        pkgconfig_2.0.3      sessioninfo_1.2.2    dbplyr_2.1.1        
[37] fastmap_1.1.0        rlang_1.0.6          readxl_1.4.0         rstudioapi_0.13      generics_0.1.3       RApiSerialize_0.1.2 
[43] jsonlite_1.8.4       vroom_1.6.1          magrittr_2.0.3       Rcpp_1.0.10          munsell_0.5.0        fansi_1.0.4         
[49] lifecycle_1.0.3      stringi_1.7.12       yaml_2.3.5           brio_1.1.3           pkgbuild_1.3.1       grid_4.1.3          
[55] parallel_4.1.3       crayon_1.5.2         haven_2.4.3          hms_1.1.2            knitr_1.38           ps_1.6.0            
[61] pillar_1.8.1         igraph_1.2.11        base64url_1.4        codetools_0.2-18     pkgload_1.2.4        reprex_2.0.1        
[67] glue_1.6.2           remotes_2.4.2        data.table_1.14.6    RcppParallel_5.1.5   modelr_0.1.8         vctrs_0.5.2         
[73] tzdb_0.3.0           testthat_3.1.3       cellranger_1.1.0     gtable_0.3.0         qs_0.25.4            assertthat_0.2.1    
[79] cachem_1.0.6         xfun_0.30            broom_0.7.12         e1071_1.7-13         class_7.3-20         memoise_2.0.1       
[85] units_0.8-1          timechange_0.2.0     ellipsis_0.3.2

ldecicco-USGS commented 1 year ago

Unfortunately, there is currently no way to do that via web services (that I'm aware of, at least!). You can either use the output from whatNWISdata, which only covers the overall period of record, or pull the data for the date range you are interested in and find out then (so exactly what you've done above).

I (and others) have asked for an option like the one you are requesting, but we have always been told that the technical hurdles are too big to make that kind of flexible query feasible. However, I think that @jkreft-usgs is working on a more flexible query tool that might make this kind of query possible in the future.
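For anyone applying the two-step workaround to many sites, here is a rough sketch of how the period of record could at least be used to skip sites that cannot have data in the window. The helper window_overlaps_record() is hypothetical (it is not part of dataRetrieval), and it assumes an inventory data frame with the begin_date and end_date columns shown in the whatNWISdata output earlier in this thread. An overlap only means data might exist; as the example in this issue shows, the pull can still come back empty.

```r
# Hypothetical helper, not part of dataRetrieval.
# Returns TRUE for each inventory row whose period of record
# (begin_date to end_date) overlaps the requested window.
window_overlaps_record <- function(inventory, start_date, end_date) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  as.Date(inventory$begin_date) <= end_date &
    as.Date(inventory$end_date) >= start_date
}

# Values copied from the whatNWISdata row for site 09258050 above.
inventory <- data.frame(
  site_no = "09258050",
  parm_cd = "00095",
  begin_date = as.Date("2010-07-23"),
  end_date = as.Date("2023-03-01")
)

# Keep only the sites that cannot be ruled out from the inventory alone.
candidates <- inventory[
  window_overlaps_record(inventory, "2022-08-30", "2022-09-30"),
]

# The remaining sites still have to be pulled to find out whether any
# rows exist in the window (needs network access, so commented out here):
# uv <- dataRetrieval::readNWISuv(
#   siteNumbers = candidates$site_no,
#   parameterCd = "00095",
#   startDate = "2022-08-30",
#   endDate = "2022-09-30"
# )
# has_data <- nrow(uv) > 0
```

For the site in this issue the period of record spans the whole window, so the prefilter keeps it and the download is still needed to learn that the window is empty.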

lekoenig commented 1 year ago

Great, thanks for your quick response @ldecicco-USGS! I figured this was the case but wanted to make sure I wasn't missing anything (and potentially boost the request for this feature if that's helpful).

jkreft-usgs commented 1 year ago

There is an API currently in development that will hopefully be able to support this at a year-level roll-up (basically the WQP summary service, but for all our data).