DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Guidance on requesting multiple states/territories in dataRetrieval #661

Open ehinman opened 1 year ago

ehinman commented 1 year ago

What is your question?

I am unsure whether this is a bug report, a feature request, or my own misunderstanding, so I am posting it as a question. When I query data for multiple states using readWQPdata and readWQPsummary with statecode = c("AK", "AL") or statecode = c("01","02"), the query fails. A single state abbreviation or code works (e.g. statecode = "AK"), but multiple states are accepted only in the format statecode = c("US:01", "US:02"). Am I missing something, or is this the intended behavior? It is confusing that one abbreviation works but two do not.

To Reproduce

library(dataRetrieval)
test = dataRetrieval::readWQPsummary(characteristicName = "Escherichia coli", statecode = "01") # completes
test1 = dataRetrieval::readWQPsummary(characteristicName = "Escherichia coli", statecode = "AL") # completes
test2 = dataRetrieval::readWQPsummary(characteristicName = "Escherichia coli", statecode = c("AL", "AK")) # fails
test3 = dataRetrieval::readWQPsummary(characteristicName = "Escherichia coli", statecode = c("US:01", "US:02")) # completes

Expected behavior

I would expect the function to handle multiple state abbreviations or state codes with the same input flexibility it has for a single state abbreviation or state code.
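
As a stopgap, the inputs that fail in the reproduction can be converted to the `US:NN` form that the query accepted. The following is a minimal base R sketch, not part of dataRetrieval: the helper name `to_wqp_statecode` and the `fips_lookup` table are hypothetical, and the table is filled in only for the two states used in this issue.

```r
# Hypothetical lookup from postal abbreviation to two-digit state code.
# Illustrative only: covers just the two states used in this issue.
fips_lookup <- c(AL = "01", AK = "02")

# Hypothetical helper: normalise state inputs to the "US:NN" form.
to_wqp_statecode <- function(states) {
  states <- toupper(trimws(as.character(states)))

  # Inputs already in "US:NN" form are passed through unchanged.
  done <- grepl("^US:[0-9]{2}$", states)
  # Bare two-digit codes are used as they are.
  is_num <- grepl("^[0-9]{2}$", states)
  # Everything else is treated as an abbreviation and looked up.
  codes <- ifelse(is_num, states, unname(fips_lookup[states]))

  unknown <- !done & is.na(codes)
  if (any(unknown)) {
    stop("Unrecognised state input: ",
         paste(states[unknown], collapse = ", "))
  }

  ifelse(done, states, paste0("US:", codes))
}

to_wqp_statecode(c("AL", "AK"))  # "US:01" "US:02"
```

The result can then be passed as the `statecode` argument, in the same way as the `test3` call above. Extending the lookup table to all states and territories is left out here deliberately, since the codes would need to be checked against an authoritative source.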

Session Info

devtools::session_info()
─ Session info ─────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-02-01
 rstudio  2022.07.1+554 Spotted Wakerobin (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.2)
 bit             4.0.5   2022-11-15 [1] CRAN (R 4.2.2)
 bit64           4.0.5   2020-08-30 [1] CRAN (R 4.2.2)
 cachem          1.0.6   2021-08-19 [1] CRAN (R 4.2.2)
 callr           3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
 class           7.3-20  2022-01-16 [1] CRAN (R 4.2.1)
 classInt        0.4-8   2022-09-29 [1] CRAN (R 4.2.2)
 cli             3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
 crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
 curl            5.0.0   2023-01-12 [1] CRAN (R 4.2.2)
 data.table      1.14.6  2022-11-16 [1] CRAN (R 4.2.2)
 dataRetrieval   2.7.12  2023-01-17 [1] Github (USGS-R/dataRetrieval@e90c02c)
 DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.2)
 devtools        2.4.5   2022-10-11 [1] CRAN (R 4.2.2)
 digest          0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
 dplyr           1.0.10  2022-09-01 [1] CRAN (R 4.2.2)
 e1071           1.7-12  2022-10-24 [1] CRAN (R 4.2.2)
 ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
 fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
 fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
 fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
 generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
 glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
 hms             1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
 htmltools       0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
 htmlwidgets     1.6.1   2023-01-07 [1] CRAN (R 4.2.2)
 httpuv          1.6.8   2023-01-12 [1] CRAN (R 4.2.2)
 httr            1.4.4   2022-08-17 [1] CRAN (R 4.2.2)
 KernSmooth      2.23-20 2021-05-03 [1] CRAN (R 4.2.1)
 later           1.3.0   2021-08-18 [1] CRAN (R 4.2.2)
 lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.2.2)
 mime            0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
 pillar          1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
 pkgbuild        1.4.0   2022-11-27 [1] CRAN (R 4.2.2)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
 pkgload         1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
 prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.2.2)
 processx        3.8.0   2022-10-26 [1] CRAN (R 4.2.2)
 profvis         0.3.7   2020-11-02 [1] CRAN (R 4.2.2)
 promises        1.2.0.1 2021-02-11 [1] CRAN (R 4.2.2)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.2.2)
 ps              1.7.2   2022-10-26 [1] CRAN (R 4.2.2)
 purrr           1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
 Rcpp            1.0.9   2022-07-08 [1] CRAN (R 4.2.2)
 readr           2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
 remotes         2.4.2   2021-11-30 [1] CRAN (R 4.2.2)
 rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
 rstudioapi      0.14    2022-08-22 [1] CRAN (R 4.2.2)
 sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
 sf              1.0-9   2022-11-08 [1] CRAN (R 4.2.2)
 shiny           1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
 stringi         1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
 stringr         1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 TADA            0.0.1   2023-01-30 [1] Github (USEPA/TADA@5d66a09)
 tibble          3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
 tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
 tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
 units           0.8-1   2022-12-10 [1] CRAN (R 4.2.2)
 urlchecker      1.0.1   2021-11-30 [1] CRAN (R 4.2.2)
 usethis         2.1.6   2022-05-25 [1] CRAN (R 4.2.2)
 utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
 vctrs           0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
 vroom           1.6.0   2022-09-30 [1] CRAN (R 4.2.2)
 withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
 xtable          1.8-4   2019-04-21 [1] CRAN (R 4.2.2)

 [1] C:/Program Files/R/R-4.2.1/library

────────────────────────────────────────────────────────────────

ldecicco-USGS commented 1 year ago

For now, I would stick to a state-by-state query, as described here: https://rconnect.usgs.gov/dataRetrieval/articles/wqp_large_pull_script.html or the "data pipeline" approach described here: https://rconnect.usgs.gov/dataRetrieval/articles/wqp_large_pull_targets.html

This is similar to #658 in that the WQP behaves differently from NWIS: NWIS allows only one state code per query, while WQP allows multiple. That said, I have had multi-state queries time out, so splitting by state is a straightforward way to reduce the risk of timeouts.

Once I get the update for the WQP state/county codes, which should arrive in the next couple of days, multi-state queries in WQP will be allowed. I'm not sure I'd recommend them, though: for the summaries it might be fine, but when actually pulling the data, it could be very slow.
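
The state-by-state approach can be sketched roughly as follows. This is an illustration rather than the code from the linked articles: `pull_by_state` is a hypothetical helper, and it assumes each per-state result is a data frame with the same columns so the pieces can be row-bound. The fetch function is passed in as an argument so that a failure or timeout in one state does not lose the results from the others.

```r
# Hypothetical helper: run one query per state and combine the results.
# `fetch_one` is any function that takes a single state code and
# returns a data frame.
pull_by_state <- function(states, fetch_one) {
  results <- lapply(states, function(s) {
    tryCatch(
      fetch_one(s),
      error = function(e) {
        # Record the failure and carry on with the remaining states.
        message("Query failed for ", s, ": ", conditionMessage(e))
        NULL
      }
    )
  })

  is_failed <- vapply(results, is.null, logical(1))

  list(
    # Assumes all successful results share the same columns.
    data = do.call(rbind, results[!is_failed]),
    # States to retry later.
    failed = states[is_failed]
  )
}

# Example usage (not run here, requires network access):
# out <- pull_by_state(
#   c("US:01", "US:02"),
#   function(s) {
#     dataRetrieval::readWQPsummary(
#       characteristicName = "Escherichia coli",
#       statecode = s
#     )
#   }
# )
```

Returning the failed states separately makes it easy to rerun only those queries instead of repeating the whole pull.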