DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

How to get metadata #602

Closed dyun33 closed 2 years ago

dyun33 commented 2 years ago

What is your question?

Thanks a lot for creating the package. I am working with gage height data at multiple nearby sites in the Everglades, Florida. So far, I have downloaded data from three sites using readNWISuv. At two of the three sites, the returned data do not include any metadata (except the time zone [tz column]), so I have to work out the units by downloading the data manually from the USGS website (https://sofia.usgs.gov/eden/stationlist.php). More problematic, one site (siteNo <- "262240080258001") reports gage height in "ft NGVD29", while another (siteNo <- "262258080273501") reports it in "ft NAVD88", so the two use different vertical references. A third nearby site (siteNo <- "262300080220001") does show its vertical reference in its column name (e.g., X_UPSTREAM..NGVD29_00065_00000), but not the unit.

To Reproduce

library(dataRetrieval)
# Run the query with one site number at a time; as written, the last assignment wins
siteNo <- "262240080258001" # SITE_17 (FT NGVD29)
siteNo <- "262258080273501" # EDEN11 (FT NAVD88)
siteNo <- "262300080220001" # S10D_H and _T (FT NGVD29)
pCode <- "00065"
start.date <- "2022-01-20"
end.date <- "2022-01-27"

gageheight <- readNWISuv(siteNumbers = siteNo,
                         parameterCd = pCode,
                         startDate = start.date,
                         endDate = end.date)

Session Info


R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dataRetrieval_2.7.10

ldecicco-USGS commented 2 years ago

Great question. There is some metadata attached to the returned data frame. Try this for example:

library(dataRetrieval)
siteNo1 <- "262240080258001" # SITE_17 (FT NGVD29)
siteNo2 <- "262258080273501" # EDEN11 (FT NAVD88)
siteNo3 <- "262300080220001" # S10D_H and _T (FT NGVD29)
pCode <- "00065"
start.date <- "2021-01-20"
end.date <- "2022-01-27"

gageheight <- readNWISuv(siteNumbers = c(siteNo1, siteNo2, siteNo3),
                         parameterCd = pCode,
                         startDate = start.date,
                         endDate = end.date)

station_info <- attr(gageheight, "siteInfo")
param_info <- attr(gageheight, "variableInfo")
attr(gageheight, "url")

The station_info and param_info data frames hold site-level and parameter-level metadata, respectively.
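To see everything that is attached to a returned data frame, it can help to list its attributes before pulling one out. The snippet below is a minimal sketch that runs offline on a stand-in data frame: get_meta is a hypothetical helper, and the variableCode and unit columns in the placeholder variableInfo table are assumptions about what the real attribute contains, so check names(param_info) on your own result.

```r
# Stand-in for a data frame returned by readNWISuv(); the attribute
# contents are illustrative placeholders, not real query results.
demo <- data.frame(site_no = "site_A", X_00065_00000 = 1.23)
attr(demo, "variableInfo") <- data.frame(
  variableCode = "00065",  # assumed column name
  unit = "ft",             # assumed column name
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one metadata attribute, or NULL (with a
# message) when the object does not carry an attribute of that name.
get_meta <- function(df, name) {
  meta <- attr(df, name, exact = TRUE)
  if (is.null(meta)) message("No attribute named '", name, "' on this object")
  meta
}

# Names of everything attached to the object (includes names, class, row.names)
attached <- names(attributes(demo))
param_meta <- get_meta(demo, "variableInfo")
```

On real data the same calls would be names(attributes(gageheight)) and get_meta(gageheight, "variableInfo").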

Looking at those sites more closely, I think you might want to add pcode "63160" (Stream level, NAVD88) for site 262300080220001:

gageheight2 <- readNWISuv(siteNumbers = c(siteNo1, siteNo2, siteNo3),
                          parameterCd = c("00065", "63160"),
                          startDate = start.date,
                          endDate = end.date)

station_info2 <- attr(gageheight2, "siteInfo")
param_info2 <- attr(gageheight2, "variableInfo")

param_info2 now includes two rows, one for each kind of stream level.
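Once more than one parameter code is requested, it can be handy to tell which value column belongs to which code. A rough sketch, assuming value columns end in a 5-digit parameter code followed by a 5-digit statistic code (as in X_UPSTREAM..NGVD29_00065_00000 above); pcode_from_colname is a hypothetical helper, and the X_63160_00000 column name is a guess used only for illustration:

```r
# Hypothetical helper: extract the 5-digit parameter code from value column
# names; anything that does not end in "_<5 digits>_<5 digits>" gives NA.
pcode_from_colname <- function(col_names) {
  pattern <- "_([0-9]{5})_[0-9]{5}$"
  hit <- grepl(pattern, col_names)
  out <- rep(NA_character_, length(col_names))
  out[hit] <- sub(paste0("^.*", pattern), "\\1", col_names[hit])
  out
}

# Illustrative column names; on real data use names(gageheight2)
cols <- c("agency_cd", "site_no", "dateTime",
          "X_00065_00000", "X_00065_00000_cd",
          "X_UPSTREAM..NGVD29_00065_00000", "X_63160_00000", "tz_cd")

col_map <- data.frame(column = cols,
                      variableCode = pcode_from_colname(cols),
                      stringsAsFactors = FALSE)
# Keep only the value columns (id, time, and "_cd" qualifier columns drop out)
col_map <- col_map[!is.na(col_map$variableCode), ]
```

If param_info2 has a variableCode column (an assumption worth checking with names(param_info2)), merge(col_map, param_info2, by = "variableCode") would put each column name next to its parameter description.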

Now, how could you have known to look for both pcodes? You can use the whatNWISdata function:

what_data <- whatNWISdata(siteNumber = c(siteNo1, siteNo2, siteNo3))

unique(what_data$parm_cd[what_data$data_type_cd == "uv"])
[1] "00065" "63160"
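
The unique() call above pools all three sites together. To see which "uv" parameter codes each site offers, the same table can be split by site. This is only a sketch on a placeholder inventory with made-up site labels; uv_params_by_site is a hypothetical helper, and it relies on the site_no, parm_cd, and data_type_cd columns of the whatNWISdata result.

```r
# Placeholder inventory with the three columns used here; on real data this
# would be the what_data data frame returned by whatNWISdata().
inventory <- data.frame(
  site_no      = c("site_A", "site_A", "site_B", "site_B"),
  parm_cd      = c("00065", "63160", "00065", "00060"),
  data_type_cd = c("uv", "uv", "uv", "dv"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the unique "uv" parameter codes for each site.
uv_params_by_site <- function(inv) {
  # which() drops rows where the comparison is NA
  keep <- which(inv$data_type_cd == "uv" & !is.na(inv$parm_cd))
  uv <- inv[keep, ]
  lapply(split(uv$parm_cd, uv$site_no), function(p) sort(unique(p)))
}

uv_by_site <- uv_params_by_site(inventory)
```

With the real result, uv_params_by_site(what_data) returns a named list with one character vector of parameter codes per site number.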