DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/
Other
260 stars 84 forks source link

retrieve water year summary statistics? #706

Closed mkoohafkan closed 2 months ago

mkoohafkan commented 4 months ago

Is it possible to use dataRetrieval to get the Water Year Summary statistics? For example, see this "Summary Statistics" table for Site 01010070 retrived via

https://waterdata.usgs.gov/nwis/wys_rpt?dv_ts_ids=63582&wys_water_yr=2023&site_no=01010070&agency_cd=USGS&adr_water_years=2006%2C2007%2C2008%2C2009%2C2010%2C2011%2C2012%2C2013%2C2014%2C2015%2C2016%2C2017%2C2018%2C2019%2C2020%2C2021%2C2022%2C2023&referred_module=.

image

I am hoping to get these stats for a large number of sites and hope I won't have to calculate these myself from daily timeseries, but I can't figure out how to use readNWISstat() or readNWISdata() to get this table.

ldecicco-USGS commented 4 months ago

I am not aware of a way to get that information directly. Messing around with the dv, stat, and peak services, I can get a fair amount using some dplyr:

library(dataRetrieval)
library(dplyr)

site <- "01010070"
daily <- readNWISdv(site, "00060",startDate = "1983-10-01")

daily_stats <- daily |>
  addWaterYear() |> 
  group_by(waterYear) |> 
  summarise(high = max(X_00060_00003),
            low = min(X_00060_00003),
            high_day = Date[which.max(X_00060_00003)],
            low_day = Date[which.min(X_00060_00003)]) |> 
  ungroup()

highest_daily_mean <- max(daily_stats$high, na.rm = TRUE)
highest_daily_mean
[1] 7000
highest_daily_mean_when <- daily_stats$high_day[which.max(daily_stats$high)]
highest_daily_mean_when
[1] "2008-04-30"
lowest_daily_mean <- min(daily_stats$low, na.rm = TRUE)
lowest_daily_mean
[1] 7.4
lowest_daily_mean_when <- daily_stats$low_day[which.min(daily_stats$low)]
lowest_daily_mean_when
[1] "1985-09-24"

highest_annual <- max(annual_stats$mean,na.rm = TRUE)
highest_annual
[1] 487.6066
highest_annual_when <- annual_stats$waterYear[which.max(annual_stats$mean)]
highest_annual_when
[1] 2006
lowest_annual <- min(annual_stats$mean,na.rm = TRUE)
lowest_annual
[1] 157.4496
lowest_annual_when <- annual_stats$waterYear[which.min(annual_stats$mean)]
lowest_annual_when
[1] 1985
annual_mean <- mean(annual_stats$mean, na.rm = TRUE)
annual_mean
[1] 341.6305

peak_flow <- readNWISpeak(site)
highest_peak <- max(peak_flow$peak_va)
highest_peak
[1] 8680
highest_peak_when <- peak_flow$peak_dt[which.max(peak_flow$peak_va)]
highest_peak_when
[1] "1987-04-01"
highest_peak_stage <- max(peak_flow$gage_ht, na.rm = TRUE)
highest_peak_stage
[1] 15.62
highest_peak_stage_when <- peak_flow$peak_dt[which.max(peak_flow$gage_ht)]
highest_peak_stage_when
[1] "1987-04-01"

site_info <- readNWISsite(site)
site_info$drain_area_va
[1] 171
# Runoff?
annual_mean/site_info$drain_area_va
[1] 1.997839

I don't know what the 10, 50, 90 exceed values mean.

But you could bundle up that code into a function to get the stats for you site. I'll ask around to see if there's a more official way to get that (and get those exceedance values).