DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

readWQPsummary YearSummarized column format not consistently YYYY #659

Closed ehinman closed 9 months ago

ehinman commented 1 year ago

Describe the bug readWQPsummary() returns a YearSummarized column, which should contain (according to the help page) "the year of the summary". However, when I ran a large query for dissolved oxygen (DO) data, the YearSummarized column contained implausible years (e.g. "5", "2206").

To Reproduce Steps to reproduce the behavior:

test = dataRetrieval::readWQPsummary(characteristicName = "Dissolved oxygen (DO)",siteType = "Stream")
minyr = min(test$YearSummarized)
maxyr = max(test$YearSummarized)

Expected behavior I am using YearSummarized from readWQPsummary() to define a date range to query with readWQPdata(). I expect the minimum and maximum years to be in a normal YYYY format so that they can be turned into YYYY-mm-dd values for startDate and endDate in readWQPdata(). With single-digit years and years that do not exist, readWQPdata() will throw an error because the date is not in the correct format or does not exist. A max year of 2206 shouldn't matter, but a min year of 5 is a problem.

Session Info

devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-01-30
 rstudio  2022.07.1+554 Spotted Wakerobin (desktop)
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────────────────────────────────────────
 ! package       * version date (UTC) lib source
   assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.2.2)
   backports       1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
   bit             4.0.5   2022-11-15 [1] CRAN (R 4.2.2)
   bit64           4.0.5   2020-08-30 [1] CRAN (R 4.2.2)
   brio            1.1.3   2021-11-30 [1] CRAN (R 4.2.2)
   broom           1.0.2   2022-12-15 [1] CRAN (R 4.2.2)
   cachem          1.0.6   2021-08-19 [1] CRAN (R 4.2.2)
   callr           3.7.3   2022-11-02 [1] CRAN (R 4.2.2)
   cellranger      1.1.0   2016-07-27 [1] CRAN (R 4.2.2)
   class           7.3-20  2022-01-16 [1] CRAN (R 4.2.1)
   classInt        0.4-8   2022-09-29 [1] CRAN (R 4.2.2)
   cli             3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
   colorspace      2.0-3   2022-02-21 [1] CRAN (R 4.2.2)
   crayon          1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
   curl            5.0.0   2023-01-12 [1] CRAN (R 4.2.2)
   data.table    * 1.14.6  2022-11-16 [1] CRAN (R 4.2.2)
   dataRetrieval   2.7.12  2023-01-17 [1] Github (USGS-R/dataRetrieval@e90c02c)
   DBI             1.1.3   2022-06-18 [1] CRAN (R 4.2.2)
   dbplyr          2.3.0   2023-01-16 [1] CRAN (R 4.2.2)
   desc            1.4.2   2022-09-08 [1] CRAN (R 4.2.2)
   devtools      * 2.4.5   2022-10-11 [1] CRAN (R 4.2.2)
   digest          0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
   dplyr         * 1.0.10  2022-09-01 [1] CRAN (R 4.2.2)
   e1071           1.7-12  2022-10-24 [1] CRAN (R 4.2.2)
   ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
   evaluate        0.19    2022-12-13 [1] CRAN (R 4.2.2)
   fansi           1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
   farver          2.1.1   2022-07-06 [1] CRAN (R 4.2.2)
   fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
   forcats       * 0.5.2   2022-08-19 [1] CRAN (R 4.2.2)
   fs              1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
   gargle          1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
   generics        0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
   gganimate       1.0.8   2022-09-08 [1] CRAN (R 4.2.2)
   ggplot2       * 3.4.0   2022-11-04 [1] CRAN (R 4.2.2)
   gifski          1.6.6-1 2022-04-05 [1] CRAN (R 4.2.2)
   glue            1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
   googledrive     2.0.0   2021-07-08 [1] CRAN (R 4.2.2)
   googlesheets4   1.0.1   2022-08-13 [1] CRAN (R 4.2.2)
   gtable          0.3.1   2022-09-01 [1] CRAN (R 4.2.2)
   haven           2.5.1   2022-08-22 [1] CRAN (R 4.2.2)
   hms             1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
   htmltools       0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
   htmlwidgets     1.6.1   2023-01-07 [1] CRAN (R 4.2.2)
   httpuv          1.6.8   2023-01-12 [1] CRAN (R 4.2.2)
   httr            1.4.4   2022-08-17 [1] CRAN (R 4.2.2)
   jsonlite        1.8.4   2022-12-06 [1] CRAN (R 4.2.2)
   KernSmooth      2.23-20 2021-05-03 [1] CRAN (R 4.2.1)
   knitr         * 1.41    2022-11-18 [1] CRAN (R 4.2.2)
   later           1.3.0   2021-08-18 [1] CRAN (R 4.2.2)
   lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
   lubridate     * 1.9.0   2022-11-06 [1] CRAN (R 4.2.2)
   magrittr      * 2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
   maps            3.4.1   2022-10-30 [1] CRAN (R 4.2.2)
   memoise         2.0.1   2021-11-26 [1] CRAN (R 4.2.2)
   mime            0.12    2021-09-28 [1] CRAN (R 4.2.0)
   miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.2.2)
   modelr          0.1.10  2022-11-11 [1] CRAN (R 4.2.2)
   munsell         0.5.0   2018-06-12 [1] CRAN (R 4.2.2)
   pillar          1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
   pkgbuild        1.4.0   2022-11-27 [1] CRAN (R 4.2.2)
   pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
   pkgload         1.3.2   2022-11-16 [1] CRAN (R 4.2.2)
   plyr          * 1.8.8   2022-11-11 [1] CRAN (R 4.2.2)
   prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.2.2)
   processx        3.8.0   2022-10-26 [1] CRAN (R 4.2.2)
   profvis         0.3.7   2020-11-02 [1] CRAN (R 4.2.2)
   progress        1.2.2   2019-05-16 [1] CRAN (R 4.2.2)
   promises        1.2.0.1 2021-02-11 [1] CRAN (R 4.2.2)
   proxy           0.4-27  2022-06-09 [1] CRAN (R 4.2.2)
   ps              1.7.2   2022-10-26 [1] CRAN (R 4.2.2)
   purrr         * 1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
   R6              2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
   RColorBrewer  * 1.1-3   2022-04-03 [1] CRAN (R 4.2.0)
   Rcpp          * 1.0.9   2022-07-08 [1] CRAN (R 4.2.2)
   readr         * 2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
   readxl          1.4.1   2022-08-17 [1] CRAN (R 4.2.2)
   remotes       * 2.4.2   2021-11-30 [1] CRAN (R 4.2.2)
   reprex          2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
   rlang           1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
   rmarkdown     * 2.19    2022-12-15 [1] CRAN (R 4.2.2)
   rprojroot       2.0.3   2022-04-02 [1] CRAN (R 4.2.2)
   rstudioapi      0.14    2022-08-22 [1] CRAN (R 4.2.2)
   rvest           1.0.3   2022-08-19 [1] CRAN (R 4.2.2)
   scales          1.2.1   2022-08-20 [1] CRAN (R 4.2.2)
   sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
   sf              1.0-9   2022-11-08 [1] CRAN (R 4.2.2)
   shiny           1.7.4   2022-12-15 [1] CRAN (R 4.2.2)
   stringi         1.7.12  2023-01-11 [1] CRAN (R 4.2.2)
   stringr       * 1.5.0   2022-12-02 [1] CRAN (R 4.2.2)
 P TADA          * 0.0.1   2023-01-30 [?] Github (USEPA/TADA@5d66a09)
   testthat      * 3.1.6   2022-12-09 [1] CRAN (R 4.2.2)
   tibble        * 3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
   tidyr         * 1.2.1   2022-09-08 [1] CRAN (R 4.2.2)
   tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
   tidyverse     * 1.3.2   2022-07-18 [1] CRAN (R 4.2.2)
   timechange    * 0.2.0   2023-01-11 [1] CRAN (R 4.2.2)
   tweenr          2.0.2   2022-09-06 [1] CRAN (R 4.2.2)
   tzdb            0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
   units           0.8-1   2022-12-10 [1] CRAN (R 4.2.2)
   urlchecker      1.0.1   2021-11-30 [1] CRAN (R 4.2.2)
   usethis       * 2.1.6   2022-05-25 [1] CRAN (R 4.2.2)
   utf8            1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
   vctrs           0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
   vroom           1.6.0   2022-09-30 [1] CRAN (R 4.2.2)
   withr           2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
   xfun            0.36    2022-12-21 [1] CRAN (R 4.2.2)
   xml2            1.3.3   2021-11-30 [1] CRAN (R 4.2.2)
   xtable          1.8-4   2019-04-21 [1] CRAN (R 4.2.2)

 [1] C:/Program Files/R/R-4.2.1/library

 P ── Loaded and on-disk path mismatch.

ldecicco-USGS commented 1 year ago

Thanks for the report! I had reported this issue to the WQP team when the services first came out, and (at the time) they said there's not much they can do since the column is created from improperly entered raw data. For example, pulling a couple of the sites that cause the funky years:

sites <- c("CHEROKEE-ILL1", "21NMEX-32RGRAND464.2C")
data <- readWQPqw(sites, "Dissolved oxygen (DO)")

range(data$ActivityStartDate)
"0005-05-19" "2206-08-24"

Should the first date be 2005-05-19? Probably, but the WQP folks don't want to be in the business of cleaning groups' data. Should that second date be 2006? Maybe?

That being said, I'll check with them again to see if anything's changed (like maybe they'd consider not summarizing years that don't make sense, i.e. before 1800 or after the current year).
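
For illustration only, the kind of plausibility rule mentioned above (years before 1800 or after the current year) could be sketched on the user side as follows. The helper name `flag_implausible_dates` and its arguments are assumptions made for this example; nothing like it is known to exist in dataRetrieval or WQP. The two odd dates are the ones returned by the query above.

```r
# Hypothetical helper (not part of dataRetrieval): flag dates whose year
# falls outside a plausible window. Accepts Date objects or "YYYY-MM-DD" strings.
flag_implausible_dates <- function(dates,
                                   min_year = 1800,
                                   max_year = as.integer(format(Sys.Date(), "%Y"))) {
  yr <- if (inherits(dates, "Date")) {
    # POSIXlt stores the year as an offset from 1900
    as.POSIXlt(dates)$year + 1900L
  } else {
    # For strings, read the leading four characters as the year
    as.integer(substr(as.character(dates), 1, 4))
  }
  # Missing or unparseable years are flagged as well
  is.na(yr) | yr < min_year | yr > max_year
}

dates <- c("0005-05-19", "2005-05-19", "2206-08-24")
flag_implausible_dates(dates)
# TRUE FALSE TRUE
```

Flagging rather than correcting keeps the decision about what "0005" was meant to be with the data owner, which matches the point made above about not cleaning other groups' data.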

ehinman commented 1 year ago

Thanks Laura! I need to get a better sense for whether these issues lie with dataRetrieval or some other part of the system--don't want to bog you down with issues that need to be solved elsewhere. For now, I'll accommodate in my code.
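
One possible way to accommodate this in user code, sketched under assumptions: drop implausible YearSummarized values before building the startDate and endDate strings for readWQPdata. The data frame below is a made-up stand-in for readWQPsummary() output (only the column name YearSummarized comes from this thread), and the helper name `plausible_date_range` and the 1800 cutoff are illustrative choices, not anything dataRetrieval provides.

```r
# Made-up stand-in for readWQPsummary() output; only the column name
# YearSummarized is taken from the real function.
summary_df <- data.frame(YearSummarized = c(5, 1998, 2010, 2021, 2206))

# Hypothetical helper: build a YYYY-mm-dd range from plausible years only.
plausible_date_range <- function(years,
                                 min_year = 1800,
                                 max_year = as.integer(format(Sys.Date(), "%Y"))) {
  years <- suppressWarnings(as.integer(years))
  # Keep only years inside the plausible window
  ok <- years[!is.na(years) & years >= min_year & years <= max_year]
  if (length(ok) == 0) {
    stop("No plausible years found in the supplied values.")
  }
  c(startDate = sprintf("%04d-01-01", min(ok)),
    endDate   = sprintf("%04d-12-31", max(ok)))
}

rng <- plausible_date_range(summary_df$YearSummarized)
rng
# startDate "1998-01-01", endDate "2021-12-31"
```

The two strings could then be supplied as the startDate and endDate arguments of readWQPdata. Note that this silently excludes records entered under the bad years, so it may be worth logging which years were dropped.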

ldecicco-USGS commented 1 year ago

It's probably best to continue to report issues here (on GitHub). We can coordinate who to contact more easily than making the user try to figure it out.

lstanish-usgs commented 1 year ago

I agree that we should re-visit the handling of improperly formatted data by the service. @ldecicco-USGS

ldecicco-USGS commented 9 months ago

Closing this issue because I don't want dataRetrieval to make the decisions on how to deal with improper dates; I want WQP to do that.