DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Adding bibliographic entry constructor for NWIS access date recording #604

Closed ghost closed 6 months ago

ghost commented 2 years ago

It would be useful to have a constructor function for creating NWIS bibliographic entries as required by the USGS publication series. The functionality could later be expanded to bibliographic styles for other publishers, and to manual LaTeX-style entries as well as BibTeX entries. For this posting, only a USGS style is implemented, as a basis for further consideration by the dataRetrieval developers. At the least, support for the USGS style reinforces the "policy" of reporting access dates and the mandatory use of the DOI URL for NWIS itself.

PS: A manual LaTeX entry in USGS style might look like this with the citation year being changed on the fly.

\bibitem[U.S. Geological Survey(2022)]{NWISdoi}
U.S. Geological Survey, 2022, USGS water data for the Nation: U.S. Geological Survey National Water Information System database, accessed February 2, 2022, at \url{https://doi.org/10.5066/F7P55KJN}.
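
As a rough illustration of changing the citation year "on the fly", the manual LaTeX entry above could be generated from an access date. This is only a sketch: `nwis_bibitem()` and its arguments are hypothetical names, not part of dataRetrieval.

```r
# Hypothetical helper (not part of dataRetrieval): build a manual LaTeX
# \bibitem in USGS style for a given access date.
nwis_bibitem <- function(access_date = Sys.Date(), key = "NWISdoi") {
  access_date <- as.Date(access_date)
  yr <- format(access_date, "%Y")
  # month.name avoids locale-dependent month names from format(..., "%B")
  mn <- month.name[as.integer(format(access_date, "%m"))]
  dd <- as.integer(format(access_date, "%d"))  # drops any leading zero
  paste0(
    "\\bibitem[U.S. Geological Survey(", yr, ")]{", key, "}\n",
    "U.S. Geological Survey, ", yr, ", USGS water data for the Nation: ",
    "U.S. Geological Survey National Water Information System database, ",
    "accessed ", mn, " ", dd, ", ", yr,
    ", at \\url{https://doi.org/10.5066/F7P55KJN}."
  )
}

# Print the entry for a fixed access date
cat(nwis_bibitem("2022-02-02"), "\n")
```

Passing the date as an argument, rather than always reading `Sys.Date()` inside the function, makes the entry reproducible when a document is rebuilt later.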

Code and documentation follow:

"constructNWISbiblio" <-
function(style=c("USGS"), make_attr=TRUE, show_access=TRUE, end_period=TRUE,
         mid_newline=TRUE, end_newline=FALSE, hangpara=2) {
  style <- match.arg(style)

  doi   <- "https://doi.org/10.5066/F7P55KJN"
  if(end_period) doi <- paste0(doi, ".")

  date <- unlist(strsplit(as.character(Sys.Date()), "-"))
  yr   <- date[1]
  mn   <- month.name[as.numeric(date[2])] # or like month.name[month.abb == format(Sys.time(), "%b")]
  dd   <- gsub("^0", "", date[3]) # day and strip leading zeros

  if(mid_newline) {
    mid_newline <- "\n"
  } else {
    mid_newline <- ""
    hangpara <- 0
  }
  hangpara <- paste(rep(" ", hangpara), collapse="")

  if(style == "USGS") {
    my_access <- ""
    if(show_access) my_access <- paste0("accessed ", mn, " ", dd, ", ", yr, ", at")

    txt <- paste0("U.S. Geological Survey, ", yr, ", USGS water data for the Nation: ", mid_newline, hangpara,
                  "U.S. Geological Survey National Water Information System database,", mid_newline, hangpara,
                  my_access, " ", doi)
  }

  if(end_newline) txt <- paste0(txt, "\n")

  if(make_attr) {
    nwis_access <- paste0("NWIS_ACCESS_", paste(date, collapse=""), '.txt')
    attr(txt, "nwis_access") <- nwis_access
  }

  return(txt)
}

Here is proposed Rd content (sorry, I write long-hand Rds). Please note that attaching an optional attribute to the string is very deliberate: it supports writing log files, with as much of the low-level string work for the file name as possible done within the function. I suggest that some type of output file naming argument could be added to replace my hard-wired NWIS_ACCESS_*.txt structure.
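
One possible shape for such a file naming argument is sketched below. The helper `nwis_access_file()` and its `prefix`, `date`, and `ext` arguments are hypothetical and only illustrate the idea. The default reproduces the hard-wired name built in the function above.

```r
# Hypothetical helper: build the access-log file name from a prefix and a date,
# instead of hard-wiring "NWIS_ACCESS_" inside constructNWISbiblio().
nwis_access_file <- function(prefix = "NWIS_ACCESS", date = Sys.Date(),
                             ext = ".txt") {
  # %Y%m%d gives the same compact date stamp as pasting the split date parts
  paste0(prefix, "_", format(as.Date(date), "%Y%m%d"), ext)
}

nwis_access_file(date = "2022-02-02")
# "NWIS_ACCESS_20220202.txt"
nwis_access_file("DVFLOWS_NWIS_ACCESS", "2022-02-02")
# "DVFLOWS_NWIS_ACCESS_20220202.txt"
```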

\encoding{utf8}
\name{constructNWISbiblio}
\alias{constructNWISbiblio}
\title{Construct Bibliographic Entry for NWIS Access Citing}
\description{
Construct a bibliographic entry string for citing and logging USGS National Water Information System (NWIS) access dates and optional attribution to the access. Minor features for formatting are provided to tune the string in various ways. The general USGS publication series style for NWIS citation follows:
\preformatted{
  U.S. Geological Survey, 2022, USGS water data for the Nation: U.S. Geological Survey
  National Water Information System database, accessed February 2, 2022,
  at https://doi.org/10.5066/F7P55KJN.
}
}
\usage{
constructNWISbiblio(style=c("USGS"), make_attr=TRUE, show_access=TRUE, end_period=TRUE,
                    mid_newline=TRUE, end_newline=FALSE, hangpara=2)
}
\arguments{
  \item{style}{The style of the bibliographic entry to produce;}
  \item{make_attr}{A logical switch on attributing (\code{"nwis_access"}) the returned object with a file name that might be useful for logging access;}
  \item{show_access}{A logical switch to insert the access date to NWIS. There could be some rare circumstances in which the user does not want to mislead a reader of a publication when citing NWIS in a generic sense;}
  \item{end_period}{A logical switch to insert a period at the end of the citation after the url;}
  \item{mid_newline}{A logical switch to insert newline characters at strategic locations within the bibliographic entry that mimic line wrapping for horizontal compression of the bibliographic entry, which might make it more \dQuote{readable} for some practical circumstances. If \code{mid_newline} is not set, then \code{hangpara} is internally set to zero;}
  \item{end_newline}{A logical switch to insert a newline character at the end of the string;}
  \item{hangpara}{The number of leading spaces inserted on subordinate lines to mimic a hanging-paragraph style; and}
  \item{...}{Additional arguments to pass if ever implemented.}
}
\value{
A character string of a bibliographic entry.
}
\author{someone}
\references{
U.S. Geological Survey, 2021, USGS water data for the Nation: U.S. Geological Survey National Water Information System database, accessed March 21, 2021, at https://doi.org/10.5066/F7P55KJN.
}
\examples{

constructNWISbiblio(make_attr=FALSE, mid_newline=FALSE)

\dontrun{
  # This example is disabled to avoid dependency on crayon package. This example shows
  # one use of writing a text file dated to the access with the bibliographic entry
  # within the file and messaging to the console in blue that such a file is made.
  DAILY_SWFLOW_CORE <- "DVFLOWS"
  bib <- constructNWISbiblio()
  nwis_access <- paste0(DAILY_SWFLOW_CORE, "_", attr(bib, "nwis_access"))
  txt  <- paste0("MANIFEST: logging USGS NWIS access date into '", nwis_access, "'")
  message(crayon::blue(txt))
  cat(bib, file=nwis_access)
}
}
ldecicco-USGS commented 2 years ago

Thanks for the submission! We're considering adding this or something like it eventually. I want to get a bug fix out to CRAN in the near term, but I'll have a look at this in the next couple of weeks.

My initial thought is that we can currently use the "queryTime" attribute that is attached to each dataset to produce the bibliography at some later point in the script:

library(dataRetrieval)
df <- readNWISdv("05212700", "00060", "2021-01-01","2021-01-07")
attr(df, "queryTime")
[1] "2022-02-10 14:39:15 CST"
# or
format(attr(df, "queryTime"), format = "%b %d, %Y")
[1] "Feb 10, 2022"
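
To illustrate that idea, the sketch below builds a USGS-style entry from the "queryTime" attribute at a later point in a script. `nwis_citation_from()` is a hypothetical helper, and the data frame is a hand-made stand-in for a `readNWISdv()` result so that the example runs without a web request.

```r
# Hypothetical helper: format a USGS-style NWIS citation from the
# "queryTime" attribute attached to a retrieved data frame.
nwis_citation_from <- function(df) {
  qt <- attr(df, "queryTime")
  if (is.null(qt)) stop("no 'queryTime' attribute found on 'df'")
  yr <- format(qt, "%Y")
  # month.name keeps the month in English regardless of locale
  mn <- month.name[as.integer(format(qt, "%m"))]
  dd <- as.integer(format(qt, "%d"))
  paste0("U.S. Geological Survey, ", yr, ", USGS water data for the Nation: ",
         "U.S. Geological Survey National Water Information System database, ",
         "accessed ", mn, " ", dd, ", ", yr,
         ", at https://doi.org/10.5066/F7P55KJN.")
}

# Stand-in for a readNWISdv() result (assumption: only the attribute matters here)
df <- data.frame(site_no = "05212700")
attr(df, "queryTime") <- as.POSIXct("2022-02-10 14:39:15", tz = "UTC")
nwis_citation_from(df)
```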

Another thought I had while reading this suggestion: maybe we should create output similar to the citation function. This would then allow R users to create BibTeX output or use other tools that are built around that output. So instead of:

cite_dataRetrieval <- citation("dataRetrieval")
toBibtex(cite_dataRetrieval)
@Manual{,
  author = {Laura A. {De Cicco} and David Lorenz and Robert M. Hirsch and William Watkins and Mike Johnson},
  title = {dataRetrieval: R packages for discovering and retrieving water data available from U.S. federal hydrologic web services},
  publisher = {U.S. Geological Survey},
  address = {Reston, VA},
  version = {2.7.9},
  institution = {U.S. Geological Survey},
  year = {2021},
  doi = {10.5066/P9X4L3GE},
  url = {https://code.usgs.gov/water/dataRetrieval},
}

We could do something like:

df <- readNWISdv("05212700", "00060", "2021-01-01","2021-01-07")
cite_NWIS <- citation(df)
toBibtex(cite_NWIS)
@Manual{,
get the right NWIS citation worked out here...
}

Anyway, great idea, thanks again.
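
A minimal sketch of what citation-like output could look like, using `utils::bibentry()` from base R. The function name `make_nwis_bibentry()`, the choice of fields, and the stand-in data frame are all assumptions for illustration. This is not how dataRetrieval implements it.

```r
# Hypothetical sketch: wrap NWIS citation fields in a utils::bibentry object
# so that toBibtex() and print(style = ...) work on it.
make_nwis_bibentry <- function(df) {
  qt <- attr(df, "queryTime")
  if (is.null(qt)) qt <- Sys.time()  # fall back to "now" if no query time is attached
  utils::bibentry(
    bibtype = "Manual",
    title   = paste("USGS water data for the Nation: U.S. Geological Survey",
                    "National Water Information System database"),
    author  = utils::person("U.S. Geological Survey"),
    year    = format(qt, "%Y"),
    doi     = "10.5066/F7P55KJN",
    url     = "https://doi.org/10.5066/F7P55KJN",
    note    = paste("Accessed", format(qt, "%Y-%m-%d"))
  )
}

# Stand-in for a readNWISdv() result so the sketch runs offline
df <- data.frame(site_no = "05212700")
attr(df, "queryTime") <- as.POSIXct("2022-02-10 14:39:15", tz = "UTC")
bib <- make_nwis_bibentry(df)
toBibtex(bib)
```

Returning a `bibentry` rather than a plain string leaves the formatting to existing tools, which is the main appeal of this approach.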

ldecicco-USGS commented 7 months ago

I didn't mean to close this...in fact I have a function coming out soon

ldecicco-USGS commented 6 months ago

So with the most recent updates you can now do this:

library(dataRetrieval)
WQPData <- readWQPqw("USGS-05288705",
                     parameterCd = "00300")
wqp_citation <- create_WQP_bib(WQPData)
wqp_citation
National Water Quality Monitoring Council (2024). _ Water
Quality Portal_. doi:10.5066/P9QRKUVJ
<https://doi.org/10.5066/P9QRKUVJ>, Accessed Feb 09, 2024,
<https://www.waterqualitydata.us/data/Result/search?siteid=USGS-05288705&pCode=00300&mimeType=tsv&zip=yes>.

print(wqp_citation, style = "Bibtex")
@Manual{,
  title = { Water Quality Portal},
  author = {{National Water Quality Monitoring Council}},
  doi = {10.5066/P9QRKUVJ},
  note = {Accessed Feb 09, 2024},
  year = {2024},
  url = {https://www.waterqualitydata.us/data/Result/search?siteid=USGS-05288705&pCode=00300&mimeType=tsv&zip=yes},
}
print(wqp_citation, style = "citation")
National Water Quality Monitoring Council, 2024, Water Quality
Portal, accessed 02, 09, 2024,
https://www.waterqualitydata.us/data/Result/search?siteid=USGS-05288705&pCode=00300&mimeType=tsv&zip=yes,
https://doi.org/10.5066/P9QRKUVJ.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = { Water Quality Portal},
    author = {{National Water Quality Monitoring Council}},
    doi = {10.5066/P9QRKUVJ},
    note = {Accessed Feb 09, 2024},
    year = {2024},
    url = {https://www.waterqualitydata.us/data/Result/search?siteid=USGS-05288705&pCode=00300&mimeType=tsv&zip=yes},
  }

nwisData <- readNWISdv("04085427", "00060", "2012-01-01", "2012-06-30")
nwis_citation <- create_NWIS_bib(nwisData)
nwis_citation
U.S. Geological Survey (2024). _National Water Information
System data available on the World Wide Web (USGS Water Data
for the Nation)_. doi:10.5066/F7P55KJN
<https://doi.org/10.5066/F7P55KJN>, Accessed Feb 09, 2024,
<https://waterservices.usgs.gov/nwis/dv/?site=04085427&format=waterml,1.1&ParameterCd=00060&StatCd=00003&startDT=2012-01-01&endDT=2012-06-30>.

print(nwis_citation, style = "Bibtex")
@Manual{,
  title = {National Water Information System data available on the World Wide Web (USGS Water Data for the Nation)},
  author = {{U.S. Geological Survey}},
  doi = {10.5066/F7P55KJN},
  note = {Accessed Feb 09, 2024},
  year = {2024},
  url = {https://waterservices.usgs.gov/nwis/dv/?site=04085427&format=waterml,1.1&ParameterCd=00060&StatCd=00003&startDT=2012-01-01&endDT=2012-06-30},
}
print(nwis_citation, style = "citation")
U.S. Geological Survey, 2024, National Water Information System
data available on the World Wide Web (USGS Water Data for the
Nation), accessed Feb 09, 2024, at
https://waterservices.usgs.gov/nwis/dv/?site=04085427&format=waterml,1.1&ParameterCd=00060&StatCd=00003&startDT=2012-01-01&endDT=2012-06-30,
http://dx.doi.org/10.5066/F7P55KJN

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {National Water Information System data available on the World Wide Web (USGS Water Data for the Nation)},
    author = {{U.S. Geological Survey}},
    doi = {10.5066/F7P55KJN},
    note = {Accessed Feb 09, 2024},
    year = {2024},
    url = {https://waterservices.usgs.gov/nwis/dv/?site=04085427&format=waterml,1.1&ParameterCd=00060&StatCd=00003&startDT=2012-01-01&endDT=2012-06-30},
  }