DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

readWQPsummary does not accept statecode "74" or "UM" #658

Closed ehinman closed 9 months ago

ehinman commented 1 year ago

Describe the bug EPA's draft TADABigdataRetrieval() chunks large data calls using dataRetrieval by statecode. When readWQPsummary tries to download data using statecode "74" or "UM" (U.S. Minor Outlying Islands), it returns an error.
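As a rough illustration of the chunking pattern described above, the sketch below calls a download function once per state code and skips any chunk that errors, so one bad state code does not abort the whole run. This is not the actual TADA implementation; `fetch_by_state` and its `fetch_fun` argument are hypothetical names used only for this example.

```r
# Hypothetical helper: call `fetch_fun` once per state code and keep going
# when a single request fails. Not the actual TADA or dataRetrieval code.
fetch_by_state <- function(state_codes, fetch_fun) {
  state_codes <- as.character(state_codes)
  results <- lapply(state_codes, function(code) {
    tryCatch(
      fetch_fun(code),
      error = function(e) {
        # Report the failing chunk and return NULL so it can be dropped below
        message("Skipping statecode ", code, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- state_codes
  # Keep only the chunks that returned something
  Filter(Negate(is.null), results)
}
```

With dataRetrieval, `fetch_fun` could be something like `function(code) dataRetrieval::readWQPsummary(statecode = code, siteType = "Stream")`, which is the call from this report.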

To Reproduce Steps to reproduce the behavior:

test <- dataRetrieval::readWQPsummary(statecode = "UM", siteType = "Stream")
test <- dataRetrieval::readWQPsummary(statecode = "74", siteType = "Stream")

library(dataRetrieval)
test = dataRetrieval::readWQPsummary(statecode = "74", siteType = "Stream")
Request failed [400]. Retrying in 1.3 seconds...
Request failed [400]. Retrying in 1 seconds...
Bad Request (HTTP 400).299 WQP "The value of statecode=US:NA must match the format (?:([A-Z]{2}):)?([0-9]{1,2})"
Error in attr(retval, "queryTime") <- Sys.time() : 
  attempt to set an attribute on NULL
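The 400 response quotes the format the service expects for statecode, and the rejected value is US:NA, which suggests the code lookup produced NA before the request was built. A minimal sketch of checking a value against that format locally is below. The pattern is copied from the error message; the `^` and `$` anchors and the function name `is_valid_wqp_statecode` are assumptions added for this example.

```r
# Pattern copied from the WQP error message. The ^ and $ anchors are an
# assumption added here so that the whole string has to match.
wqp_statecode_pattern <- "^(?:([A-Z]{2}):)?([0-9]{1,2})$"

# Hypothetical helper: TRUE where a value has the shape WQP asks for
is_valid_wqp_statecode <- function(x) {
  grepl(wqp_statecode_pattern, x, perl = TRUE)
}

is_valid_wqp_statecode(c("US:74", "74", "US:NA", "UM"))
# [1]  TRUE  TRUE FALSE FALSE
```

A format check like this only says whether the string is well formed. It does not say whether the service knows the code.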

Expected behavior I expect to receive either an empty data frame named test if no data are present, or a data frame named test with 19 columns summarizing the water quality parameter data found at each site in each applicable year within that state/territory.
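The final error in the output above comes from setting an attribute on NULL. One way to get the empty-data-frame behavior is to substitute an empty data frame before attaching attributes. The sketch below assumes the import step can return NULL after a failed request; `with_query_time` is a hypothetical name, not a dataRetrieval function.

```r
# Hypothetical guard: if the import step produced NULL, substitute an empty
# data frame so that attributes can still be attached.
with_query_time <- function(retval) {
  if (is.null(retval)) {
    retval <- data.frame()
  }
  attr(retval, "queryTime") <- Sys.time()
  retval
}

test <- with_query_time(NULL)
nrow(test)
# [1] 0
```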


Session Info Please include your session info:

sessionInfo()
#OR preferred:
devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-01-23
 rstudio  2022.07.1+554 Spotted Wakerobin (desktop)
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────────────────────────────────
 ! package       * version date (UTC) lib source
   assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.2)
   backports       1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
   bit             4.0.5   2022-11-15 [1] CRAN (R 4.2.2)
   bit64           4.0.5   2020-08-30 [1] CRAN (R 4.2.2)
   brio            1.1.3   2021-11-30 [1] CRAN (R 4.2.2)
   broom           1.0.2   2022-12-15 [1] CRAN (R 4.2.2)
   cachem          1.0.6   2021-08-19 [1] CRAN (R 4.2.2)
   callr           3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
   cellranger      1.1.0   2016-07-27 [1] CRAN (R 4.2.2)
   class           7.3-20  2022-01-16 [1] CRAN (R 4.2.1)
   classInt        0.4-8   2022-09-29 [1] CRAN (R 4.2.2)
   cli             3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
   colorspace      2.0-3   2022-02-21 [1] CRAN (R 4.2.2)
   crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
   curl            5.0.0   2023-01-12 [1] CRAN (R 4.2.2)
   data.table    * 1.14.6  2022-11-16 [1] CRAN (R 4.2.2)
   dataRetrieval * 2.7.12  2023-01-17 [1] Github (USGS-R/dataRetrieval@e90c02c)
   DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.2)
   dbplyr          2.3.0   2023-01-16 [1] CRAN (R 4.2.2)
   desc            1.4.2   2022-09-08 [1] CRAN (R 4.2.2)
   devtools      * 2.4.5   2022-10-11 [1] CRAN (R 4.2.2)
   digest          0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
   dplyr         * 1.0.10  2022-09-01 [1] CRAN (R 4.2.2)
   e1071           1.7-12  2022-10-24 [1] CRAN (R 4.2.2)
   ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
   evaluate        0.19    2022-12-13 [1] CRAN (R 4.2.2)
   fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
   farver          2.1.1   2022-07-06 [1] CRAN (R 4.2.2)
   fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
   forcats       * 0.5.2   2022-08-19 [1] CRAN (R 4.2.2)
   fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
   gargle          1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
   generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
   gganimate       1.0.8   2022-09-08 [1] CRAN (R 4.2.2)
   ggplot2       * 3.4.0   2022-11-04 [1] CRAN (R 4.2.2)
   gifski          1.6.6-1 2022-04-05 [1] CRAN (R 4.2.2)
   glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
   googledrive     2.0.0   2021-07-08 [1] CRAN (R 4.2.2)
   googlesheets4   1.0.1   2022-08-13 [1] CRAN (R 4.2.2)
   gtable          0.3.1   2022-09-01 [1] CRAN (R 4.2.2)
   haven           2.5.1   2022-08-22 [1] CRAN (R 4.2.2)
   hms             1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
   htmltools       0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
   htmlwidgets     1.6.1   2023-01-07 [1] CRAN (R 4.2.2)
   httpuv          1.6.8   2023-01-12 [1] CRAN (R 4.2.2)
   httr            1.4.4   2022-08-17 [1] CRAN (R 4.2.2)
   jsonlite        1.8.4   2022-12-06 [1] CRAN (R 4.2.2)
   KernSmooth      2.23-20 2021-05-03 [1] CRAN (R 4.2.1)
   knitr         * 1.41    2022-11-18 [1] CRAN (R 4.2.2)
   later           1.3.0   2021-08-18 [1] CRAN (R 4.2.2)
   lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
   lubridate     * 1.9.0   2022-11-06 [1] CRAN (R 4.2.2)
   magrittr      * 2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
   maps          * 3.4.1   2022-10-30 [1] CRAN (R 4.2.2)
   memoise         2.0.1   2021-11-26 [1] CRAN (R 4.2.2)
   mime            0.12    2021-09-28 [1] CRAN (R 4.2.0)
   miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
   modelr          0.1.10  2022-11-11 [1] CRAN (R 4.2.2)
   munsell         0.5.0   2018-06-12 [1] CRAN (R 4.2.2)
   pillar          1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
   pkgbuild        1.4.0   2022-11-27 [1] CRAN (R 4.2.2)
   pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
   pkgload         1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
   plyr          * 1.8.8   2022-11-11 [1] CRAN (R 4.2.2)
   prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.2.2)
   processx        3.8.0   2022-10-26 [1] CRAN (R 4.2.2)
   profvis         0.3.7   2020-11-02 [1] CRAN (R 4.2.2)
   progress        1.2.2   2019-05-16 [1] CRAN (R 4.2.2)
   promises        1.2.0.1 2021-02-11 [1] CRAN (R 4.2.2)
   proxy           0.4-27  2022-06-09 [1] CRAN (R 4.2.2)
   ps              1.7.2   2022-10-26 [1] CRAN (R 4.2.2)
   purrr         * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
   R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
   RColorBrewer  * 1.1-3   2022-04-03 [1] CRAN (R 4.2.0)
   Rcpp          * 1.0.9   2022-07-08 [1] CRAN (R 4.2.2)
   readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
   readxl          1.4.1   2022-08-17 [1] CRAN (R 4.2.2)
   remotes       * 2.4.2   2021-11-30 [1] CRAN (R 4.2.2)
   reprex          2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
   rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
   rmarkdown     * 2.19    2022-12-15 [1] CRAN (R 4.2.2)
   rprojroot       2.0.3   2022-04-02 [1] CRAN (R 4.2.2)
   rstudioapi      0.14    2022-08-22 [1] CRAN (R 4.2.2)
   rvest           1.0.3   2022-08-19 [1] CRAN (R 4.2.2)
   scales          1.2.1   2022-08-20 [1] CRAN (R 4.2.2)
   sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
   sf              1.0-9   2022-11-08 [1] CRAN (R 4.2.2)
   shiny           1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
   stringi         1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
   stringr       * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 P TADA          * 0.0.1   2023-01-17 [?] Github (USEPA/TADA@a43ebd2)
   testthat      * 3.1.6   2022-12-09 [1] CRAN (R 4.2.2)
   tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
   tidyr         * 1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
   tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
   tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.2)
   timechange    * 0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
   tweenr          2.0.2   2022-09-06 [1] CRAN (R 4.2.2)
   tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
   units           0.8-1   2022-12-10 [1] CRAN (R 4.2.2)
   urlchecker      1.0.1   2021-11-30 [1] CRAN (R 4.2.2)
   usethis       * 2.1.6   2022-05-25 [1] CRAN (R 4.2.2)
   utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
   vctrs           0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
   vroom           1.6.0   2022-09-30 [1] CRAN (R 4.2.2)
   withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
   xfun            0.36    2022-12-21 [1] CRAN (R 4.2.2)
   xml2            1.3.3   2021-11-30 [1] CRAN (R 4.2.2)
   xtable          1.8-4   2019-04-21 [1] CRAN (R 4.2.2)

 [1] C:/Program Files/R/R-4.2.1/library

 P ── Loaded and on-disk path mismatch.


ldecicco-USGS commented 1 year ago

Thanks for the report! I think in other functions I've run into problems with certain web services and inconsistent no-data returns for US territories. I'll take a look and see what we can do (sometimes they fix it on the web service end, which is preferable; sometimes we fix it directly in dataRetrieval).

ldecicco-USGS commented 1 year ago

I'm trying to get confirmation, but for the NWIS services, the only allowable state/counties are: https://help.waterdata.usgs.gov/code/county_query?fmt=html If you scroll down, there is no entry for 74/UM. I'm trying to get confirmation that the WQP uses the same canonical table for state/counties.

cristinamullin commented 1 year ago

At first glance it looks like NWIS's list is currently more exhaustive than the WQP's. See the allowable state/counties here: https://www.waterqualitydata.us/Codes/statecode?countrycode=US. But both are missing "74" or "UM" (U.S. Minor Outlying Islands). I am not sure what the process is for adding additional areas.

ldecicco-USGS commented 1 year ago

I'll add a message that lets users know if the state they requested isn't on the list.
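A minimal sketch of what such a message could look like is below. It is not the code that went into dataRetrieval. The lookup table `example_states`, its column names, and the function `check_state_known` are hypothetical, and a real check would use whichever state list the service actually publishes.

```r
# Hypothetical lookup table with a few FIPS codes and postal abbreviations.
# A real check would use the service's own list of allowable states.
example_states <- data.frame(
  code = c("01", "02", "55"),
  abbr = c("AL", "AK", "WI"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tell the user when a single requested state is not
# in the lookup table, and return TRUE/FALSE invisibly.
check_state_known <- function(state, lookup = example_states) {
  known <- state %in% lookup$code || toupper(state) %in% lookup$abbr
  if (!known) {
    message("State '", state, "' is not in the list of known state codes; ",
            "the request may fail or return no data.")
  }
  invisible(known)
}

check_state_known("UM")
```

Using `message()` rather than `stop()` lets the request go ahead, which matters if the service adds a code before the local list is updated.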

cristinamullin commented 1 year ago

Confirmed this is the WQP/WQX state domain table: https://cdx.epa.gov/wqx/download/DomainValues/State.csv

All WQX domain tables are available here: https://www.epa.gov/waterdata/storage-and-retrieval-and-water-quality-exchange-domain-services-and-downloads

I am working with the WQX team to see if we can add "74" or "UM" (U.S. Minor Outlying Islands).

cristinamullin commented 1 year ago

FYI: This should now be resolved. We just added "74" or "UM" (U.S. Minor Outlying Islands). The domain tables and WQP UI should update within a week with the addition.

cristinamullin commented 1 year ago

Last comment for today. I wanted to provide some clarification on the domain tables.

Feel free to mark this issue as resolved.

ldecicco-USGS commented 1 year ago

To follow up, we're trying to confirm the best way to check for state/counties in both the WQP and NWIS functions. Currently they both use the stateCd and countyCd tables that are provided within the dataRetrieval package itself. If NWIS and WQP are diverging, however, this isn't a good solution anymore.

If I create the URL by hand, I can run it like this:

url <- "https://www.waterqualitydata.us/data/summary/monitoringLocation/search?statecode=US%3A74&siteType=Stream&zip=yes&dataProfile=periodOfRecord&mimeType=csv"
test <- importWQP(url)

test is returned as an empty data frame (so no data). Here's a clickable link if you want:

https://www.waterqualitydata.us/data/summary/monitoringLocation/search?statecode=US%3A74&siteType=Stream&zip=no&dataProfile=periodOfRecord&mimeType=csv

But just letting you know that we're still looking at the best way to do it in dataRetrieval.

ldecicco-USGS commented 9 months ago

I'm going to close this for 2 reasons. This table: https://www.waterqualitydata.us/Codes/statecode?countrycode=US still doesn't have UM

and when I try to run it by a more brute force method:

url <- "https://www.waterqualitydata.us/data/summary/monitoringLocation/search?statecode=US%3A74&siteType=Stream&zip=yes&dataProfile=periodOfRecord&mimeType=csv"
test <- importWQP(url)

there's still no data. We'll consider the multiple states/countries discussion in another issue, but I don't want to add special R code for UM/74 until I see it in WQP specifically (not just in the WQX specifications).