USGS-R / usgs-r.github.com

Splash page for USGS-R organization

Frequency analysis on discharge #48

Closed ldecicco-USGS closed 8 years ago

ldecicco-USGS commented 9 years ago

Are these stats already calculated somewhere:

http://groundwaterwatch.usgs.gov/SitesSP.asp?S=02236160


jordansread commented 9 years ago

Aren't those computed on the fly from NWISWeb? Are any of those folks on GH?

dlorenz-usgs commented 9 years ago

I'm not sure I understand. If you retrieve daily values, there are lots of ways to summarize those data by day, month, year, water year, season, and so on. Pick your aggregating function; use baseDay (smwrBase), month (lubridate), year (lubridate), waterYear (smwrBase), or seasons (smwrBase) to define the period; and then pick the statistic you want.
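For example, a minimal sketch of that pattern (assuming dataRetrieval handles the retrieval, smwrBase and lubridate are installed, and column names come from renameNWISColumns):

library(dataRetrieval)
library(smwrBase)
library(lubridate)

# Daily mean discharge for the site in question
dv <- renameNWISColumns(readNWISdv("02236160", "00060"))

# Grouping function: water year; aggregating function + statistic: median
dv$wy <- waterYear(dv$Date)
medianByWaterYear <- tapply(dv$Flow, dv$wy, median, na.rm = TRUE)

# Same idea with a different period and statistic: monthly means
meanByMonth <- tapply(dv$Flow, month(dv$Date), mean, na.rm = TRUE)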


jordansread commented 9 years ago

I am not in person w/ Laura, but I was assuming the question was "are these stats available as services?" Or, are there public methods that would allow us to duplicate the same type of analysis that shows up on that viz (including QA/QC)?

dlorenz-usgs commented 9 years ago

I assumed she was asking about the summary functions in the smwr packages, since I assumed she was aware of the data being served. Things may be fixed now, but I abandoned these types of retrievals because I got inconsistent results back; for example, calendar-year stats were not computed if the water year was not complete in Oct-Dec (the previous calendar year).

The QA/QC samples are not publicly available.


jordansread commented 9 years ago

@ldecicco-USGS I think we need more context for this question.

ldecicco-USGS commented 9 years ago

Ha, sorry guys. Teaching the class...this came up...and I made the note mostly for myself. I (they) wanted to know if those stats were already directly computed in an R package (so I was going to poke around the smwr packages tonight). They are "easy" enough to just compute...but I'm trying to stress not re-inventing the wheel.

dlorenz-usgs commented 9 years ago

Sounds like that would be a great topic for a vignette in smwrStats. Questions like that often come up in the Intro classes I've taught. I always go back to the fundamental point in R: "The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software." (section 1.1 of An Introduction to R). Basically, I state what I wrote in my first response: pick your aggregation function, pick your grouping function, and pick the statistics. We should try very hard not to write programs that spit out only specific products, violating the spirit expressed in the Intro to R.


jordansread commented 9 years ago

Yep, makes sense to me.

dlorenz-usgs commented 9 years ago

Actually, there is a vignette in smwrBase that discusses how to compute summary stats for daily values. It is called SummaryStats. I was focused on stats, so I thought to look in smwrStats rather than smwrBase, but Base contains the grouping functions.

Man, there is just too much to track.
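
For anyone following along, that vignette should open with the standard vignette() call (assuming smwrBase is installed):

vignette("SummaryStats", package = "smwrBase")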


ldecicco-USGS commented 9 years ago

To tie this up... we worked through this together in class this morning and used a dplyr solution. We only used daily discharge data to simplify the problem (so we can't match the results exactly), but I think, from start to finish with discussion throughout, it took the class maybe 2 hours to come up with:

library(dataRetrieval)
library(dplyr)

siteNumber <- "02236160"
pCode <- "00060"

discharge <- readNWISdv(siteNumber = siteNumber, 
                        parameterCd = pCode)

discharge <- renameNWISColumns(discharge)
siteInfo <- attr(discharge, "siteInfo")

# Month and year columns used for the grouping below:
discharge$month <- as.POSIXlt(discharge$Date)$mon + 1
discharge$year <- as.POSIXlt(discharge$Date)$year + 1900

# Range of the monthly medians across years:
annualStats <- discharge %>%
  group_by(year,month) %>%
  summarize(median = median(Flow)) %>%
  group_by(month) %>%
  summarise(minMed = min(median, na.rm=TRUE),
            maxMed = max(median, na.rm=TRUE))

# Percentiles of all daily values, by month:
dischargeStats <- discharge %>%
  group_by(month) %>%
  summarise(p10 = quantile(Flow, probs=0.1),
            p25 = quantile(Flow, probs=0.25),
            p50 = quantile(Flow, probs=0.5),
            p75 = quantile(Flow, probs=0.75),
            p90 = quantile(Flow, probs=0.9),
            nYears = length(unique(year)))

finalStats <- merge(annualStats, dischargeStats, 
                     by="month") 

# To make the stacked bar chart:
finalStats <- finalStats[,c(1:2,4:8,3,9)]
finalStats$monthName <- month.abb[finalStats$month]
quantData <- finalStats[,c(-1,-9,-10)]
plotData <- quantData
# Convert cumulative percentiles to stacked segment heights:
for(i in 2:ncol(plotData)){
  plotData[,i] <- quantData[,i] - quantData[,i-1]
}

stats <- t(as.matrix(plotData))

bp <- barplot(stats, xlab="Month", 
              ylab="Discharge", 
              main=siteInfo$station_nm,
              ylim=c(50,170),xpd=FALSE,
              col=c("white","brown","orange",
                    "green","green","lightblue",
                    "darkblue"))
axis(1,labels = month.abb, at = bp)
box()
points(bp, quantData$p50, pch=17)

(attached image: resulting monthly discharge bar chart)

With a little futzing, we could probably match the two graphs pretty well. Also, I wasn't going to introduce piping... but someone brought it up and it was too tempting... so we ended up using it and loving it.
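
For anyone new to it, the pipe is just another way of writing nested calls; for example, the monthly median piece of the summary above could be written either way (a small sketch using the same column names):

# Piped form, as used in the script:
monthlyMedians <- discharge %>%
  group_by(month) %>%
  summarise(p50 = quantile(Flow, probs = 0.5))

# Equivalent nested form, without %>%:
monthlyMedians <- summarise(group_by(discharge, month),
                            p50 = quantile(Flow, probs = 0.5))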

ldecicco-USGS commented 9 years ago

And with a smidgen more futzing with the plots:

library(dataRetrieval)

siteNumber <- "02236160"

pCode <- c("00060")
statCd <- c("00003")

discharge <- readNWISdv(siteNumber = siteNumber, 
                        parameterCd = pCode, statCd = statCd)

discharge <- renameNWISColumns(discharge)

summary(discharge)

names(attributes(discharge))

siteInfo <- attr(discharge, "siteInfo")
variableInfo <- attr(discharge, "variableInfo")
statisticInfo <- attr(discharge, "statisticInfo")
?as.POSIXlt
?DateTimeClasses

discharge$month <- as.POSIXlt(discharge$Date)$mon + 1
discharge$doy <- as.POSIXlt(discharge$Date)$yday + 1
discharge$year <- as.POSIXlt(discharge$Date)$year + 1900

library(dplyr)

annualStats <- discharge %>%
  group_by(year,month) %>%
  summarize(median = median(Flow)) %>%
  group_by(month) %>%
  summarise(minMed = min(median, na.rm=TRUE),
            maxMed = max(median, na.rm=TRUE))

?quantile

dischargeStats <- discharge %>%
  group_by(month) %>%
  summarise(p10 = quantile(Flow, probs=0.1),
            p25 = quantile(Flow, probs=0.25),
            p50 = quantile(Flow, probs=0.5),
            p75 = quantile(Flow, probs=0.75),
            p90 = quantile(Flow, probs=0.9),
            nYears = length(unique(year)))

finalStats <- merge(annualStats, dischargeStats, 
                     by="month") 
finalStats <- finalStats[,c(1:2,4:8,3,9)]
finalStats$monthName <- month.abb[finalStats$month]

quantData <- finalStats[,c(-1,-9,-10)]

plotData <- quantData

for(i in 2:ncol(plotData)){
  plotData[,i] <- quantData[,i] - quantData[,i-1]
}
plotData[,i+1] <- rep(10*max(discharge$Flow),nrow(plotData))

stats <- t(as.matrix(plotData))

currentMonth <- as.POSIXlt(Sys.time())$mon+1

# Rotate the months so the current month plots last
# (when it is December, calendar order is already right):
if(currentMonth != 12){
  monthsToPlot <- c((currentMonth+1):12,1:currentMonth)
} else {
  monthsToPlot <- 1:12
}

stats <- stats[,monthsToPlot]
quantData <- quantData[monthsToPlot,]
yLimits <- range(discharge$Flow)
bp <- barplot(stats, xlab="Month", 
              ylab="Discharge", 
              main=siteInfo$station_nm,
              ylim=yLimits,xpd=FALSE,
              col=c("white","brown","orange",
                    "green","green","lightblue",
                    "darkblue","white"),
              space=0)
axis(1,labels = month.abb[monthsToPlot], at = bp)
box()
points(bp, quantData$p50, pch=17)
par(new=TRUE)
plot(discharge$Flow[(nrow(discharge)-365):nrow(discharge)],
     ylim=yLimits, axes=FALSE,xlab="",ylab="", pch=18, col="red")

(attached image: refined bar chart with the most recent year of daily values overlaid in red)

ldecicco-USGS commented 9 years ago

And just to acknowledge... the script above is in no way fully vetted, nor is it necessarily the right way to do things; it's just what we came up with in our class as a group. (I know the red dots are off, for instance: we're plotting the last 365 points, and it should really match dates better.)
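
For that last point, a small sketch of one possible fix: select the most recent year of record by Date rather than by row position (this still doesn't align the x-axis with the rotated month bars; it just makes sure the red points cover the most recent 365 days of record):

# Most recent 365 days of record, by date rather than by row count:
lastDate <- max(discharge$Date)
lastYear <- discharge[discharge$Date > lastDate - 365, ]

par(new=TRUE)
plot(lastYear$Flow,
     ylim=yLimits, axes=FALSE, xlab="", ylab="", pch=18, col="red")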