USGS-R / wateRuse

Moved to: https://code.usgs.gov/water/water-use/wateruse
Creative Commons Zero v1.0 Universal

Add function to plot multiple data elements for one area over time #25

Closed cadieter-usgs closed 8 years ago

cadieter-usgs commented 8 years ago

Request an additional function to plot multiple data elements for the same area (county) over time. This function is similar to issue #8.

jshourds-usgs commented 8 years ago

Okay so I have been playing around with R Studio trying to make this work. I was able to create an example using made up data and came up with this code:

library(ggplot2)
library(reshape2)

# Made-up example data
Year <- c(1980, 1985, 1990, 1995)
TotPop <- c(100, 200, 300, 400)
SWPop <- c(50, 100, 150, 200)
# Combine the vectors into a data frame
df2 <- data.frame(Year, TotPop, SWPop)

# Reshape to long format: one row per year and data element
mdf <- melt(df2, id.vars = "Year", value.name = "value", variable.name = "DataElement")

# One line per data element, with open points at each year
ggplot(data = mdf, aes(x = Year, y = value, group = DataElement, colour = DataElement)) +
  geom_line() +
  geom_point(size = 4, shape = 21, fill = "white") +
  ggtitle("Kent County")

The next step is to code this up using our data and make it happen that way but baby steps! ;-)

jshourds-usgs commented 8 years ago

It makes this: [image: rplot]

dblodgett-usgs commented 8 years ago

@grrmartin-USGS will help out. Note you should use tidyr's spread_ and gather_ rather than reshape's 'melt' and 'cast'.
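
As a rough sketch of that suggestion, the made-up example from earlier in this thread could be reshaped with tidyr instead of reshape2. This assumes the tidyr package is installed and reuses the toy column names from that example; `gather()` is the non-underscore form of `gather_()`, and newer tidyr releases recommend `pivot_longer()` for the same job.

```r
library(tidyr)

# Made-up data, matching the toy example earlier in the thread
df2 <- data.frame(
  Year   = c(1980, 1985, 1990, 1995),
  TotPop = c(100, 200, 300, 400),
  SWPop  = c(50, 100, 150, 200)
)

# Wide -> long: every column except Year becomes a (DataElement, value) pair
mdf <- gather(df2, key = "DataElement", value = "value", -Year)

# The same reshape with the newer tidyr interface
mdf2 <- pivot_longer(df2, cols = -Year,
                     names_to = "DataElement", values_to = "value")
```

Either result can be passed to ggplot() in place of the melt() output, since the long-format column names are the same.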

grrmartin-USGS commented 8 years ago

This looks promising. Playing with the code in the function we have in time_series_data, changing color = COUNTYNAME to color = "dataElement" and faceting on COUNTYNAME instead of dataElement now gives multiple data elements plotted over time for single counties. (time_series_data gives multiple counties plotted over time for single data elements.) We almost had this figured out at the end of the 'sprint'.

df <- w.use[,c("YEAR",area.column,data.elements)]
df <- gather_(df, "dataElement", "value", c(data.elements))
ts.object <- ggplot(data = df)

# plot points
# default is to include legend
ts.object <- ts.object + geom_point(aes_string(x = "YEAR", y = "value", color = "dataElement"))
ts.object <- ts.object + facet_grid(COUNTYNAME ~ .) + ylab("")
ts.object

[image: multipledataelementtimeseriesdcapture]

Then the bar graph works also with these substitutions:

ts.object <- ggplot(data = df)
ts.object <- ts.object + geom_bar(aes_string(x = "YEAR", y = "value", fill = "dataElement"),
                                  position = "dodge", stat = "identity")
ts.object <- ts.object + facet_grid(COUNTYNAME ~ .) + ylab("")
ts.object

[image: multipledataelementtimeseriesecapture]

jshourds-usgs commented 8 years ago

@grrmartin-USGS Cool, Gary! Looks great-- :+1: I worked your code up into a function/R_Script and think it's going to work. I might change points into points connected with a line-- is that ok? I'll post the code here in a few.

jshourds-usgs commented 8 years ago
#' multi_element_data
#'
#' Plot multiple data elements for one or more areas over time
#' 
#' @param data.elements character vector of data element names, within available categories by year for state
#' @param years vector of integers giving the range of years to show. Defaults to NA which shows all years in dataset.
#' @param w.use is a subset of the datafile wUseSample that includes all areas in all data elements for state
#' @param areas is a geographical area as defined in your datafile such as county, HUC, or aquifer
#' @param area.column character that defines which column to use to specify area
#' @param y.scale numeric vector of length 2 giving the y-axis limits. Defaults to NA which lets R set the scale based on dataset values.
#' @param log logical; TRUE plots the y-axis on a log scale, default is FALSE
#' @param plot.points logical; TRUE shows data elements as points, FALSE as a clustered bar graph
#' @param legend logical; whether to include a legend of data elements, default is TRUE
#'
#' 
#' @export
#' @import ggplot2
#' @importFrom tidyr gather_
#' 
#' @examples 
#' df <- wUseSample
#' areas <- c("Kent County","Sussex County")
#' area.column = "COUNTYNAME"
#' data.elements <- c("PS.GWPop","TP.TotPop")
#' w.use <- subset_wuse(df, data.elements,area.column,areas)
#' year1 <- 2005
#' year2 <- 2010
#' years <- c(year1, year2)
#' multi_element_data(w.use, data.elements, area.column = area.column,areas = areas)
#' multi_element_data(w.use, data.elements, plot.points = FALSE,
#'        area.column = area.column,areas = areas)
#' multi_element_data(w.use, data.elements, plot.points = FALSE,
#'        area.column = area.column,areas = areas, legend=FALSE)
#' multi_element_data(w.use, data.elements, area.column)
#' multi_element_data(w.use, data.elements, area.column, y.scale = c(0,1000))
#' multi_element_data(w.use, data.elements, area.column, 
#'        y.scale = c(0,100), years = c(1990,2005))
multi_element_data <- function(w.use, data.elements, area.column, plot.points = TRUE,
                             years= NA, areas= NA, y.scale=NA, log= FALSE, legend= TRUE){

  if(!all(is.na(areas))){
    w.use <- w.use[w.use[[area.column]] %in% areas,]
  }

  df <- w.use[,c("YEAR",area.column,data.elements)]

  df <- gather_(df, "dataElement", "value", c(data.elements))

  me.object <- ggplot(data = df) 

  if(plot.points){
    me.object <- me.object + geom_point(aes_string(x = "YEAR", y = "value", color = "dataElement"), show.legend = legend)
  } else {
    me.object <- me.object + geom_bar(aes_string(x = "YEAR", y = "value", 
                                                 fill = "dataElement"), 
                                      position = "dodge",stat="identity",show.legend = legend)
  }

  # Facet rows by the column named in area.column
  me.object <- me.object + facet_grid(as.formula(paste(area.column, "~ ."))) +
    ylab("") 

  if(!all(is.na(y.scale))){
    me.object <- me.object + ylim(y.scale)
  }

  if(!all(is.na(years))){
    me.object <- me.object + xlim(years)
  }

  if(log){
    me.object <- me.object + scale_y_log10()
  }

  return(me.object)
}
grrmartin-USGS commented 8 years ago

Sure a line is fine.

Gary Martin (grmartin@usgs.gov)

Hydrologist

U.S. Geological Survey

Kentucky Water Science Center

9818 Bluegrass Parkway

Louisville, Kentucky

40299-1906

(502) 609-1383 (cell)

(502) 493-1914 (phone)

(502) 493-1909 (fax)

grrmartin-USGS commented 8 years ago

🖐

grrmartin-USGS commented 8 years ago

Seems you would likely generally need a legend to distinguish data.elements on same plot. If you omit the legend clause, I think it defaults to include legend.
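
As a small sketch of that default: ggplot2 layers such as geom_point() take show.legend = NA unless told otherwise, which includes the layer in the legend whenever an aesthetic such as color is mapped. The data below are made up for illustration, and the object names are hypothetical.

```r
library(ggplot2)

# Made-up data: two data elements over three years
df <- data.frame(
  YEAR        = rep(c(2000, 2005, 2010), times = 2),
  dataElement = rep(c("PS.GWPop", "TP.TotPop"), each = 3),
  value       = c(10, 12, 15, 100, 110, 125)
)

# No show.legend argument: the color mapping gets a legend automatically
with.legend <- ggplot(df) +
  geom_point(aes(x = YEAR, y = value, color = dataElement))

# show.legend = FALSE drops this layer from the legend
no.legend <- ggplot(df) +
  geom_point(aes(x = YEAR, y = value, color = dataElement),
             show.legend = FALSE)
```

Printing with.legend and no.legend side by side shows the difference; with several data elements on one panel, the first form is usually the readable one.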

ldecicco-USGS commented 8 years ago

@jshourds-usgs ....do you want to put that function in a file in the "R" folder? Then, we could give it a test and maybe I could get @rwdudley-usgs to hook it up to the shiny app

jshourds-usgs commented 8 years ago

@ldecicco-USGS just created a pull request with the function. Couldn't figure out how to add a geom_line() so I left it out...
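
For reference, one possible way to connect the points with lines is sketched below. It is not the code from the pull request: the data are made up, and only the column names (YEAR, COUNTYNAME, dataElement, value) are taken from the function posted in this thread. The key assumption is that mapping group = dataElement gives one line per data element within each facet.

```r
library(ggplot2)

# Made-up long-format data with the same column names the function uses
df <- data.frame(
  YEAR        = rep(c(2000, 2005, 2010), times = 4),
  COUNTYNAME  = rep(c("Kent County", "Sussex County"), each = 6),
  dataElement = rep(rep(c("PS.GWPop", "TP.TotPop"), each = 3), times = 2),
  value       = c(10, 12, 15, 100, 110, 125, 20, 22, 27, 150, 160, 180)
)

# Map the aesthetics once so the line and point layers share them;
# group = dataElement draws one line per data element within each facet
me.object <- ggplot(df, aes(x = YEAR, y = value,
                            color = dataElement, group = dataElement)) +
  geom_line() +
  geom_point() +
  facet_grid(COUNTYNAME ~ .) +
  ylab("")

me.object
```

Putting the mapping in ggplot() rather than in each geom avoids repeating it, which is one reason adding geom_line() to a plot built with per-layer aes_string() calls can be awkward.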

ldecicco-USGS commented 8 years ago

cool, I'll have a look!

jshourds-usgs commented 8 years ago

@grrmartin-USGS You're right so I got rid of the legend feature

ldecicco-USGS commented 8 years ago

I'm about to merge a pull request that adds in this feature to the shiny app. You guys should all pull from upstream and play around with more data to test.

I added a button to change areas....this means that when you get to this new tab, you might want to majorly reduce the number of areas to view (instead of 88 counties for example, you might just want 3). So, click on/off the boxes you want, and then click the button that says "Click Here to Switch Areas"