USGS-R / smwrGraphs

Moved to: https://code.usgs.gov/water/analysis-tools/smwrGraphs
https://code.usgs.gov/water/analysis-tools/smwrGraphs
Other
18 stars 16 forks source link

overriding end date in time series plot #31

Open cpenn-usgs opened 8 years ago

cpenn-usgs commented 8 years ago

I plotted a time series plot with chemistry data spanning WY1979-2015, the xaxis.range hard coded as 1978-10-01 to 2015-10-01, major label was coded as "water year". The default tick span was plotting ever year and the tick labels were overlapping. I manually set tick span to 2, which caused the date axis (x-axis) to span from the start of WY1979 to the beginning of WY2015 (2014-10-10), which cut off all 2015 data points. I hardcoded the xaxis.range to then end at 2016-10-01. This caused the date axis to span from the start of WY1979 to the beginning of WY2017 (2016-10-01), which includes the 2015 data but extends the graphic longer then needed.

ldecicco-USGS commented 8 years ago

Any chance you could paste that code you used here? I can play with it using fake data.

cpenn-usgs commented 8 years ago

dateExtended_tick.span2.pdf dateOrig_tick.span2.pdf default_tick.span.pdf

Here's a sample of the graphs for comparison, using the default (tick.span not specified), setting it to 2 with the original WY15 end date, and setting it to 2 with the extended end date. I'll paste the code next.

cpenn-usgs commented 8 years ago
### R Code - Ft. Carson time-series plotting
library(smwrGraphs)
library(dataRetrieval)
library(reshape2)
library(dplyr)
library(xlsx)

#Get the data
##Get site numbers and parameters
#siteParms <- read.csv("SiteList_short_TESTING.csv",header=TRUE,colClasses="character",na.strings="")
#siteParms <- melt(siteParms,id.vars="site_no") %>%
#  rename(parm_cd = value)
siteParms <- read.xlsx("Site_parameters_stds_TESTING.xlsx",sheetIndex = 1,startRow = 1,endRow = 35,header = TRUE) %>%
  rename(parm_cd = ParameterCode) %>%
  rename(site_no = SiteNumber)
siteParms$site_no <- as.character(siteParms$site_no)
siteParms$parm_cd <- as.character(siteParms$parm_cd)

###Get NWIS data
qwData <- readNWISqw(siteNumbers = unique(siteParms$site_no),
                   parameterCd = "All",
                   expanded=TRUE)
paramINFO <- readNWISpCode(na.omit(unique(qwData$parm_cd))) %>%
  rename(parm_cd = parameter_cd)
siteINFO <- readNWISsite(unique(qwData$site_no))

qwData <- left_join(qwData,paramINFO,by="parm_cd")
qwData <- left_join(qwData,siteINFO,by="site_no")

###Subset to only wanted data - and only WS medium_cd
qwData <- left_join(siteParms,qwData,by=c("site_no","parm_cd"))
qwData <- subset.data.frame(qwData,qwData$medium_cd=="WS")

sites <- unique(qwData$site_no)

for(i in 1:length(sites))
{
  subData <- qwData[qwData$site_no == sites[i],c("parm_cd","parameter_nm","Standard1","Standard2","site_no","station_nm","sample_dt","result_va","remark_cd")]
  subData <- subData[!is.na(subData$sample_dt),]
  subData <- subData[!is.na(subData$result_va),]
  analytes <- unique(subData$parm_cd)

  ###Min/Max years for plot consistancy between analytes, set time plot/tick span based on years
  minyr <- min(year(subData$sample_dt))
  maxyr <- max(year(subData$sample_dt))
  if(minyr == maxyr)
  {
    xlab <- c("months")
    tspan <- 1
  } else
  {
    xlab <- c("water year")
    #maxyr = maxyr+1
    if ((maxyr-minyr)>10)
    {
      tspan <- 2
    }
    else
      {
      tspan <- 1
      }
  }
  minyr = minyr-1
  mindt <-paste(minyr,"10","01",sep = "-")
  maxdt <-paste(maxyr,"10","01",sep = "-")  

  for(j in 1:length(analytes))
  {

    plotData <- subData[subData$parm_cd %in% analytes[j],]
    std1 <- unique(plotData$Standard1)
    std2 <- unique(plotData$Standard2)
    if(nrow(plotData) > 0)
    {
    setSweave(paste0(unique(plotData$site_no),"_",unique(plotData$parm_cd)), 6 ,3)
    p1<-with(plotData, timePlot(sample_dt, result_va, Plot=list(what="points",symbol="circle",filled=TRUE,size=0.03,color="black"),
                            xaxis.range = as.Date(c(mindt,maxdt)),
                            xlabels = list(major=xlab,tick.span=tspan,style=c("at")),
                            ytitle = unique(parameter_nm),
                            caption=paste0(unique(site_no),": ",unique(station_nm))
                            ))
    addMinorTicks("y",p1)
    addMinorTicks("x",p1,3)
    #Add MCL/numeric standard lines
    if(is.na(std1)==FALSE)
    {
      refLine(horizontal = std1,Plot = list(type = "solid",size = 0.03,color = "black"),current = p1)
      addAnnotation(as.Date(maxdt),std1,annotation = "Numeric standard",justification = "right",size = 5,current = p1)
    } else{}
    if(is.na(std2)==FALSE)
    {
      refLine(horizontal = std2,Plot = list(type = "solid",size = 0.03,color = "black"),current = p1)
      addAnnotation(as.Date(maxdt),std2,annotation = "Numeric standard 2",justification = "right",size = 5,current = p1)
    } else{}
    # Required call to close PDF output graphics
    graphics.off()
    } else {}

    }

}

#move all plots to folder
system("mv *.pdf ./plots")
ldecicco-USGS commented 8 years ago

(side note, I put 3 backticks ` and the letter r before your code, and three backticks after to get it formatted)

ldecicco-USGS commented 8 years ago

Here's a little dataRetrieval trick that could save you a little time if you are making tons of queries:

###Get NWIS data
qwData <- readNWISqw(siteNumbers = unique(siteParms$site_no),
                     parameterCd = "All",
                     expanded=TRUE)
paramINFO <- attr(qwData, "variableInfo")
siteINFO <- attr(qwData, "siteInfo")

qwData <- left_join(qwData,paramINFO,by=c("parm_cd"="parameter_cd"))
qwData2 <- left_join(qwData,siteINFO,by=c("site_no","agency_cd"))
ldecicco-USGS commented 8 years ago

Just so I'm clear (and I probably won't get a chance to look at this today)...are you pointing out that the default is bad (which it is....) and suggesting your fix (which looks pretty nice)?

Or asking a question about the end date?

cpenn-usgs commented 8 years ago

I mainly wanted to bring attention to the issue that even if you hardcode an end date, your data may be cut off if you also input a tick span. I'm running this script on 22 sites, ~2-10 parameters per site, and each site varying from 1-30 years of data. It wasn't immediately obvious to me that data was being cut off for some sites so I figured noting the issue would be good for anyone also running a script that does large data pulls producing lots of plots and may not be intimately familiar with the sites or their data. I was just talking with Joe Mills and he mentioned USGS publishing standards state that the beginning and ending of an axis must have a label, so I guess it does make sense to expand the end date, based on a hard coded end date, to the next possible major tick position, rather than cut it off at the previous tick position. I'm not much of a code developer though, so there's probably a more efficient way to do this in R than my clunky "if" statements!