DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

CountyCode missing leading '0' #660

Closed: cristinamullin closed this issue 1 year ago

cristinamullin commented 1 year ago

Describe the bug

CountyCode is missing its leading '0' or '00'.

If you are using the csv file format to bring data into R via the WQP web services, that may be causing the issue.

For example:

CSV output (mimeType=csv): https://www.waterqualitydata.us/data/Station/search?statecode=US%3A02&countycode=US%3A02%3A270&mimeType=csv&zip=no&providers=NWIS&providers=STEWARDS&providers=STORET

VS.

XLSX output (mimeType=xlsx): https://www.waterqualitydata.us/data/Station/search?statecode=US%3A02&countycode=US%3A02%3A270&mimeType=xlsx&zip=no&providers=NWIS&providers=STEWARDS&providers=STORET
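To see why a CSV round trip could plausibly strip the zeros, here is a minimal base R sketch. It uses made-up values and base R's `read.csv`, which is not necessarily the parser dataRetrieval uses internally. With default type guessing, code-like strings such as "020" are read as integers; forcing character columns preserves them.

```r
# Illustrative CSV text; the values are made up, not real WQP output
csv_text <- "StateCode,CountyCode
02,020
02,270"

# Default parsing guesses column types, so "02" and "020" become integers
default_read <- read.csv(text = csv_text)
str(default_read$CountyCode)   # integer column: leading zeros are gone

# Forcing character columns keeps the codes exactly as written
char_read <- read.csv(text = csv_text, colClasses = "character")
str(char_read$CountyCode)      # character column: "020" and "270"
```

Treating FIPS-style codes as identifiers (character) rather than numbers is the safer default, since arithmetic on them is never meaningful.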

To Reproduce

library(dataRetrieval)

test <- dataRetrieval::whatWQPsites(project = "Anchorage Bacteria 20-21")
unique(test$CountyCode)
unique(test$StateCode)

test2 <- dataRetrieval::whatWQPsites(countycode = "US:02:020")
unique(test2$CountyCode)
unique(test2$StateCode)

test3 <- dataRetrieval::whatWQPsites(statecode = "UT", 
                                     characteristicName = c("Ammonia", "Nitrate", "Nitrogen"), 
                                     startDate = "10-01-2020")
unique(test3$CountyCode)
unique(test3$StateCode)

# Users can reference the WQX domain table to find countycode and statecode
# https://cdx.epa.gov/wqx/download/DomainValues/County_CSV.zip

test4 <- dataRetrieval::whatWQPsites(statecode = "WI", countycode  = "Dane")
unique(test4$StateCode)
unique(test4$CountyCode)

test5 <- dataRetrieval::whatWQPsites(countycode  = "US:55:025")
unique(test5$StateCode)
unique(test5$CountyCode)

# FYI: the call below fails because a county name ("Dane") requires statecode, whereas the full code in the example above ("US:55:025") already includes the state
test6 <- dataRetrieval::whatWQPsites(countycode  = "Dane")

Expected behavior

Expect dataRetrieval to return a data frame with the leading '0' or '00' included in CountyCode.

Session Info

> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] usmap_0.6.1          maps_3.4.1           gifski_1.6.6-1       gganimate_1.0.8      dataRetrieval_2.7.12
 [6] rlang_1.0.6          RColorBrewer_1.1-3   stringr_1.5.0        magrittr_2.0.3       ggplot2_3.4.0.9000  
[11] dplyr_1.0.10         data.table_1.14.6    plyr_1.8.8           remotes_2.4.2        TADA_0.0.1          
[16] testthat_3.1.5       devtools_2.4.5       usethis_2.1.6       

loaded via a namespace (and not attached):
 [1] fs_1.5.2           sf_1.0-8           bit64_4.0.5        lubridate_1.9.0    progress_1.2.2     httr_1.4.4        
 [7] rprojroot_2.0.3    tools_4.2.1        profvis_0.3.7      rgdal_1.6-2        utf8_1.2.2         R6_2.5.1          
[13] KernSmooth_2.23-20 DBI_1.1.3          colorspace_2.0-3   sp_1.5-1           urlchecker_1.0.1   withr_2.5.0       
[19] tidyselect_1.2.0   prettyunits_1.1.1  processx_3.8.0     bit_4.0.5          curl_4.3.3         compiler_4.2.1    
[25] cli_3.4.1          xml2_1.3.3         desc_1.4.2         labeling_0.4.2     scales_1.2.1       classInt_0.4-8    
[31] readr_2.1.3        callr_3.7.3        proxy_0.4-27       commonmark_1.8.1   digest_0.6.30      foreign_0.8-83    
[37] rmarkdown_2.20     pkgconfig_2.0.3    htmltools_0.5.4    sessioninfo_1.2.2  fastmap_1.1.0      htmlwidgets_1.6.1 
[43] rstudioapi_0.14    shiny_1.7.4        generics_0.1.3     farver_2.1.1       vroom_1.6.1        Rcpp_1.0.9        
[49] munsell_0.5.0      fansi_1.0.3        lifecycle_1.0.3    stringi_1.7.8      yaml_2.3.6         brio_1.1.3        
[55] pkgbuild_1.4.0     maptools_1.1-5     grid_4.2.1         parallel_4.2.1     promises_1.2.0.1   crayon_1.5.2      
[61] lattice_0.20-45    miniUI_0.1.1.1     hms_1.1.2          knitr_1.42         ps_1.7.2           pillar_1.8.1      
[67] pkgload_1.3.2      glue_1.6.2         evaluate_0.20      usmapdata_0.1.0    vctrs_0.5.0        tzdb_0.3.0        
[73] tweenr_2.0.2       httpuv_1.6.6       gtable_0.3.1       purrr_0.3.5        assertthat_0.2.1   cachem_1.0.6      
[79] xfun_0.34          mime_0.12          xtable_1.8-4       e1071_1.7-12       roxygen2_7.2.3     later_1.3.0       
[85] class_7.3-20       tibble_3.1.8       memoise_2.0.1      units_0.8-0        timechange_0.1.1   ellipsis_0.3.2  

ldecicco-USGS commented 1 year ago

Ha, that's an interesting bug! The reason is that we're converting "ResultCount" and the other "Count" columns to numbers, but the match is catching "CountyCode" too. The fix should be up shortly.
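The failure mode described above can be reproduced in a few lines of base R. This is a minimal sketch, not dataRetrieval's actual code: the data frame and its column names are hypothetical, and anchoring the pattern with `$` is one possible fix, not necessarily the one that shipped. An unanchored "Count" pattern also matches "CountyCode", so the county codes get coerced to numbers and lose their leading zeros.

```r
# Hypothetical site table: column names are illustrative, values are strings
sites <- data.frame(
  StateCode     = c("02", "02"),
  CountyCode    = c("020", "270"),
  ResultCount   = c("15", "3"),
  ActivityCount = c("4", "1"),
  stringsAsFactors = FALSE
)

# Buggy selection: an unanchored "Count" also matches "CountyCode"
buggy_cols <- grep("Count", names(sites), value = TRUE)

# Safer selection: only names that end in "Count"
fixed_cols <- grep("Count$", names(sites), value = TRUE)

# Convert the selected columns to numbers
convert_counts <- function(df, cols) {
  df[cols] <- lapply(df[cols], as.numeric)
  df
}

buggy <- convert_counts(sites, buggy_cols)
fixed <- convert_counts(sites, fixed_cols)

buggy$CountyCode  # numeric 20 and 270: the leading zero is lost
fixed$CountyCode  # character "020" and "270": unchanged
```

An explicit list of count column names would be even stricter than a regex, at the cost of needing updates when the service adds columns.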

As for INPUTTING county codes, I'm still mulling the best way to do it now that we know there's a difference between NWIS and WQP.

cristinamullin commented 1 year ago

Interesting lol, that'll do it! Thank you for fixing!

Yeah, this is a little challenging since there are a few potential domain lists. My understanding is that the WQP list (https://www.waterqualitydata.us/Codes/countycode) includes only the counties with data available in the WQP, which is a subset of the WQX or NWIS counties. I could be wrong, though. The WQX- and NWIS-specific domain lists differ slightly because they also include some areas without data; those extra areas are included because they are valid areas for new submissions (e.g., WQX: https://cdx.epa.gov/wqx/download/DomainValues/County_CSV.zip). I'm not sure where to find NWIS's most recent domain tables. Let me know what you decide. Thanks again!

ldecicco-USGS commented 1 year ago

Yeah, the NWIS list is pretty static, so including the data directly in dataRetrieval is no problem. The WQP list seems more dynamic, so it looks like we're going to need to rethink the stateCdLookup and countyCdLookup functions entirely so they call those tables dynamically. That in itself is not a huge task, but I'm looking into the differences between NWIS and WQP so that users don't need to care whether they're passing a county code to NWIS or to WQP.
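One way to hide the NWIS/WQP difference is to normalize codes at the boundary. The sketch below is hypothetical: `to_wqp_county` and `parse_wqp_county` are illustrative helpers, not part of dataRetrieval, and it assumes the caller already has numeric or string state and county FIPS codes. It builds the "US:SS:CCC" form seen in the examples above (e.g. "US:55:025") and splits it back apart. Zero-padding with `sprintf` also repairs codes that were already coerced to numbers.

```r
# Hypothetical helper: build a WQP-style county code ("US:SS:CCC")
# from state and county FIPS codes given as numbers or strings
to_wqp_county <- function(state_fips, county_fips) {
  sprintf("US:%02d:%03d", as.integer(state_fips), as.integer(county_fips))
}

# Hypothetical helper: split WQP-style codes into their three parts
parse_wqp_county <- function(code) {
  parts <- strsplit(code, ":", fixed = TRUE)
  data.frame(
    country = vapply(parts, `[`, character(1), 1),
    state   = vapply(parts, `[`, character(1), 2),
    county  = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

to_wqp_county(55, 25)        # "US:55:025"
to_wqp_county("02", "020")   # "US:02:020"
parse_wqp_county(c("US:55:025", "US:02:270"))
```

Resolving county names such as "Dane" would still need a lookup table, which is where the question of which domain list to use comes in.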