DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

renameNWISColumns does not rename parameter code 00480 #615

Closed bwmatherne closed 2 years ago

bwmatherne commented 2 years ago

Describe the bug

When downloading data, the salinity parameter code 00480 is not renamed by renameNWISColumns. This causes problems in downstream steps, such as using attr() to pull the variable details for labeling figures.

To Reproduce

pCode <- c("00010","00060","00480")
siteNo <- c("295124089542100","294925089532101","073745257","07374526")
start.date <- "2007-10-01"
end.date <- "2012-05-31"
sitesLA <- readNWISuv(siteNo, pCode, start.date, end.date)
sitesLA <- renameNWISColumns(sitesLA)
names(sitesLA)

> problem_query <- readNWISdv("a","b","c","d")
Error in constructNWISURL(siteNumbers, parameterCd, startDate, endDate,  : 
  The following pCodes appear mistyped:b

Expected behavior

All parameters should be properly renamed and work with other R code to make human-readable labels.

Screenshots

[Two screenshots attached: Screen Shot 2022-05-16 at 12 32 15 PM, Screen Shot 2022-05-16 at 12 33 45 PM]

Session Info

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.5        dataRetrieval_2.7.11

loaded via a namespace (and not attached):
 [1] xml2_1.3.3       magrittr_2.0.3   tidyselect_1.1.2 munsell_0.5.0   
 [5] colorspace_2.0-2 R6_2.5.1         rlang_1.0.2      fansi_1.0.3     
 [9] httr_1.4.3       dplyr_1.0.8      tools_4.1.2      grid_4.1.2      
[13] gtable_0.3.0     utf8_1.2.2       cli_3.3.0        DBI_1.1.2       
[17] withr_2.5.0      ellipsis_0.3.2   assertthat_0.2.1 tibble_3.1.7    
[21] lifecycle_1.0.1  crayon_1.5.1     purrr_0.3.4      vctrs_0.4.1     
[25] curl_4.3.2       glue_1.6.2       compiler_4.1.2   pillar_1.7.0    
[29] generics_0.1.2   scales_1.1.1     lubridate_1.8.0  jsonlite_1.8.0  
[33] pkgconfig_2.0.3 


ldecicco-USGS commented 2 years ago

Only a handful of pcodes have a default name set in the renameNWISColumns function. You can check out the help page here: http://usgs-r.github.io/dataRetrieval/reference/renameNWISColumns.html

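and see the preset names are: [image: https://user-images.githubusercontent.com/1105215/168653932-61f8c12a-8b66-48dc-a904-80bdac75f129.png]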

If you want to define 00480 as "Salinity", you can do this (the other defaults will remain):

sitesLA <- renameNWISColumns(sitesLA, p00480="Salinity")
names(sitesLA)
 [1] "agency_cd"        "site_no"          "dateTime"        
 [4] "Wtemp_Inst"       "Wtemp_Inst_cd"    "Flow_Inst"       
 [7] "Flow_Inst_cd"     "Salinity_Inst"    "Salinity_Inst_cd"
[10] "tz_cd"

I think you are then requesting that the variableInfo attribute be updated when this function is run. My instinct is to push back on that request because variableInfo is metadata taken directly from NWIS, while renameNWISColumns is a simple convenience function that changes some column headers. There are more complicated data sets that come back with multiple columns for the same parameter (for a variety of reasons: the samples may be at different depths, facing upstream vs. downstream, or use different statistic codes), which would make this request more complicated.
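To make the multiple-column case concrete, here is a minimal sketch using made-up column names (invented for illustration, not real NWIS output) in which one parameter code appears in two measurement columns. It only shows why a one-to-one mapping from parameter code to column name cannot be assumed.

```r
# Hypothetical column names, invented for illustration; real NWIS output
# may be formatted differently.
cols <- c("site_no", "dateTime",
          "X_Top_00010_00000", "X_Bottom_00010_00000", "X_00060_00000")

# Count how many columns mention each parameter code
codes <- c("00010", "00060")
matches <- sapply(codes, function(code) sum(grepl(code, cols, fixed = TRUE)))

# "00010" matches two columns, so a single label per code is ambiguous
matches
```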

If you wanted to join those short column names with the variableInfo attribute, you could do something like this:

variableInfo <- attr(sitesLA, "variableInfo")

variableInfo <- variableInfo |> 
  dplyr::left_join(data.frame(variableCode = c("00010","00060","00480"),
                              shortName = c("Wtemp", "Flow", "Salinity"),
                              columnNames = c("Wtemp_Inst", "Flow_Inst", "Salinity_Inst")),
                   by = "variableCode")
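Building on the join above, the following sketch shows one way the joined table could be turned into figure labels keyed by column name. The variableInfo data frame here is a mock stand-in with illustrative values (an assumption, so the example runs without a web request), and axis_labels is a hypothetical name. It uses base R merge() in place of dplyr::left_join.

```r
# Mock stand-in for attr(sitesLA, "variableInfo"); the real attribute comes
# from NWIS and may have more columns. Values here are illustrative only.
variableInfo <- data.frame(
  variableCode = c("00010", "00060", "00480"),
  variableDescription = c("Temperature, water, degrees Celsius",
                          "Discharge, cubic feet per second",
                          "Salinity, water, unfiltered, parts per thousand"),
  stringsAsFactors = FALSE
)

# Lookup table from the reply above: parameter code -> renamed column
lookup <- data.frame(
  variableCode = c("00010", "00060", "00480"),
  columnNames = c("Wtemp_Inst", "Flow_Inst", "Salinity_Inst"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of the left join (all.x = TRUE keeps every variableInfo row)
joined <- merge(variableInfo, lookup, by = "variableCode", all.x = TRUE)

# Named vector: column name -> human-readable label (hypothetical helper object)
axis_labels <- setNames(joined$variableDescription, joined$columnNames)
axis_labels[["Salinity_Inst"]]
```

A label looked up this way could then be passed to a plotting call, for example as the y-axis title for the Salinity_Inst column.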

I'm not sure what the problem_query was trying to show:

problem_query <- readNWISdv("a","b","c","d")

If you call readNWISdv with those inputs, what you are passing in is siteNumbers as "a", parameterCd as "b", startDate as "c", and endDate as "d". The first error message that stops the function is that the parameter code is not valid (it needs to be a 5-digit character string).
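As a rough illustration of the 5-digit rule, the sketch below checks candidate parameter codes before they are passed to a read function. looks_like_pcode is a hypothetical helper written for this example; it is not the check dataRetrieval itself performs.

```r
# Hypothetical helper (not part of dataRetrieval): TRUE when the input is
# a string of exactly five digits, the shape of a USGS parameter code.
looks_like_pcode <- function(x) {
  grepl("^[0-9]{5}$", x)
}

# "b" fails the check, matching the "appear mistyped" error above
looks_like_pcode(c("00010", "00480", "b"))
```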

Let me know if I misunderstood. Thanks!

bwmatherne commented 2 years ago

Laura,

Thank you for such a quick reply. I had been manually changing the labels, but I was worried that this would create other problems as I progress with the datasets that I’m working on. Everything you stated is quite clear and helpful.

Take care,

Brian

