USGS-R / wateRuse

Moved to: https://code.usgs.gov/water/water-use/wateruse
https://code.usgs.gov/water/water-use/wateruse
Creative Commons Zero v1.0 Universal
7 stars 11 forks source link

Summary bar charts. #76

Closed dblodgett-usgs closed 8 years ago

dblodgett-usgs commented 8 years ago

@grrmartin-USGS summarized in #46. Will work up a new function to do stacked bar charts. @rwdudley-usgs will help review code and @jshourds-usgs will make sure water use domain details are good to go.

grrmartin-USGS commented 8 years ago

A deluxe version of summary bar charts could look like what Molly used in the last compilation report using clustered bars with the total use overlaid as a line plot:

bargraph_mollycategorytotalscapture

A similar bar chart showing showing groundwater and surface-water totals with total population overlaid was done too:

bargraphgwswtpcapture

The stacked bar charts may be simplest /default first step. The way I did it with category totals for IN pulled the category totals by year from wide to long format using gather_, which ggplot2 read:

reformat the data to year, category, and value

inwucategorytotalsbyyearcapture

categories <- c("PS", "DO", "IN", "TE", "MI", "LS", "IR", "AQ") library(tidyr) test<-gather_(INWU, "TYPE", "VALUE", c(categories), na.rm = FALSE, convert = TRUE, factor_key = FALSE)

testgatheredfilecapture

ggplot(test) + geom_bar(aes_string(x = "YEAR", y = "VALUE", fill = "TYPE"), position = "stack", stat="identity")

ggplot(test) + geom_bar(aes_string(x = "YEAR", y = "VALUE", fill = "TYPE"), position = "dodge", stat="identity")

in_stacked_capture

in_clustercapture

....

Are the calculated totals by use category and GW/SW available for the sample dataset?

If these calculated totals are available by area.column (county, huc, aquifer, State, US), then coding from some of the other functions in wateRuse may be adapted to make these plots at the various levels of aggregation.

ldecicco-USGS commented 8 years ago

There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"

But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs or @twaugens-usgs could weigh in on that.

grrmartin-USGS commented 8 years ago

I don't think I have a dataset with CATEGORYCODE. I have a rough function working that pulls selected totals (PS.WTotl, etc) that I see in wUseSample. Quite a lot of the values are NAs but those are dropped and the others plotted. I get this stacked barchart that is faceted on areas (COUNTYNAME). Code follows. The check indicated no errors.

barchart_sums_stackedcapture

' barchart_sums

'

' barchart_sums

'

' @param data.elements character name of data element within available categories by year for state

' @param years vector of integers specifying range of years to graph. Defaults to NA which shows all years in dataset.

' @param w.use is a subset of the datafile wUseSample that includes all areas in all data elements for state

' @param areas is a geographical area as defined in your datafile such as county, HUC, or aquifer

' @param area.column character that defines which column to use to specify area

' @param y.scale allows R to set the y-axis scale given available data range. Defaults to NA which lets R set the scale based on dataset values.

' @param plot.stack is a logical function to show graph as optionally stacked or clustered bar graph

'

'

' @export

' @import ggplot2

' @importFrom tidyr gather_

'

' @examples

' df <- wUseSample

' areas <- c("New Castle County", "Kent County")

' area.column = "COUNTYNAME"

' data.elements <- c("PS.WTotl","CO.WTotl","DO.WTotl","IN.WTotl","PF.WTotl")

' w.use <- subset_wuse(df, data.elements,area.column,areas)

' year1 <- 2005

' year2 <- 2010

' years <- c(year1, year2)

' barchart_sums(w.use, data.elements, area.column = area.column,areas = areas)

' barchart_sums(w.use, data.elements, plot.stack = FALSE,

' area.column = area.column,areas = areas)

' barchart_sums(w.use, data.elements, area.column)

' barchart_sums(w.use, data.elements, area.column, y.scale = c(0,500))

' barchart_sums(w.use, data.elements, area.column,

' y.scale = c(0,100), years = c(1990,2005))

barchart_sums <- function(w.use, data.elements, area.column, plot.stack=TRUE, years=NA, areas=NA, y.scale=NA){

if(!all(is.na(areas))){ w.use <- w.use[w.use[[area.column]] %in% areas,] }

df2 <- w.use[,c("YEAR",area.column,data.elements)]

df3 <- gather_(df2, "dataElement", "value", c(data.elements))

bc.object <- ggplot(data = df3)

if(plot.stack){ bc.object <- bc.object + geom_bar(aes_string(x = "YEAR", y = "value", fill = "dataElement"), position = "stack",stat="identity") } else { bc.object <- bc.object + geom_bar(aes_string(x = "YEAR", y = "value", fill = "dataElement"), position = "dodge",stat="identity") }

facet if totals available for multiple areas (counties, etc)

bc.object <- bc.object + facet_grid(COUNTYNAME ~ .) + ylab("")

if(!all(is.na(y.scale))){ bc.object <- bc.object + ylim(y.scale) }

if(!all(is.na(years))){ bc.object <- bc.object + xlim(years) }

bc.object

return(bc.object) }

twaugens-usgs commented 8 years ago

The first two letters are the category code:

PS.WTotl

PS is public supply.

Attached is a master list of category codes with official long names.

On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco notifications@github.com wrote:

There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"

But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820

grrmartin-USGS commented 8 years ago

I'm not seeing an attached list.

On Thu, May 19, 2016 at 7:36 AM, twaugens-usgs notifications@github.com wrote:

The first two letters are the category code:

PS.WTotl

PS is public supply.

Attached is a master list of category codes with official long names.

On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco notifications@github.com wrote:

There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"

But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220299344

twaugens-usgs commented 8 years ago

maybe git is not getting the attachment, here it is in text, its small:

CODE,DURING,NO,NAME,SORTORDER TP,"[1800,NOW]",1,Total Population,10 PS,"[1800,NOW]",2,Public Supply,20 CO,"[1985,NOW]",3,Commercial,35 DO,"[1800,NOW]",4,Domestic,25 IN,"[1985,NOW]",5,Industrial,50 PT,"[1800,NOW]",6,Total Thermoelectric Power,60 PF,"[1985,NOW]",7,Fossil-fuel Thermoelectric Power,70 PG,"[1985,NOW]",8,Geothermal Thermoelectric Power,80 PN,"[1985,NOW]",9,Nuclear Thermoelectric Power,90 PO,"[1985,NOW]",10,Thermoelectric Power (Once-through cooling),100 PC,"[1985,NOW]",11,Thermoelectric Power (Closed-loop cooling),110 MI,"[1985,NOW]",12,Mining,120 LV,"[1985,1999]",13,Total Livestock,130 LS,"[1800,1999]",14,Livestock (Stock),140 LA,"[1985,1999]",15,Livestock (Animal Specialties),150 IT,"[1800,NOW]",16,"Irrigation, Total",160 IR,"[1800,NOW]",17,Irrigation,170 IC,"[1985,NOW]",18,"Irrigation, Crop",180 IG,"[1985,NOW]",19,"Irrigation, Golf Courses",190 HY,"[1800,NOW]",20,Hydroelectric Power,200 WW,"[1985,NOW]",21,Wastewater Treatment,210 RE,"[1985,NOW]",22,Reservoir Evaporation,220 TO,"[1800,NOW]",23,"Totals, Overall",230 TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260 AQ,"[1985,NOW]",53,Aquaculture,155 LI,"[1985,NOW]",54,Livestock,135 OI,"[1800,1984]",55,Other Industrial,55

On Thu, May 19, 2016 at 9:13 AM, grrmartin-USGS notifications@github.com wrote:

I'm not seeing an attached list.

On Thu, May 19, 2016 at 7:36 AM, twaugens-usgs notifications@github.com wrote:

The first two letters are the category code:

PS.WTotl

PS is public supply.

Attached is a master list of category codes with official long names.

On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco <notifications@github.com

wrote:

There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"

But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220299344

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220319987

ldecicco-USGS commented 8 years ago

OK, so if I understand this right, what @grrmartin-USGS wants to plot are unique "dataElements", so this function should be pretty easy to incorporate.

@grrmartin-USGS ....push that code up to your fork (in the "R" folder), and make a pull request! Let's kick the tires on it a bit.

grrmartin-USGS commented 8 years ago

Submitted barchart_sums and boxplot_wu. Sorry should have put boxplots under issue #81.

ldecicco-USGS commented 8 years ago

Added to the shiny app. Play around with it, and re-open this issue if you have any problems. Good job @grrmartin-USGS !

grrmartin-USGS commented 8 years ago

I'm getting a warning/error in shiny: Warning: Error in layoutbase: At least one layer must contain all variables used for facetting. Not sure exactly what that means. The facetting is on areas (county, etc) assuming totals will be calculated by areas within states. The code is working fine in RStudio--it justs omits categories with missing values.

This is a draft set of all the 'totals' in the list--these need to be combined into desired major categories. (There are several categories of thermoelectric, and categories used varied over the years). Exactly what totals will generally be available, where to read them, and how best to combine totals, I'm not clear. In the IN example above, I combined totals to the 8 major categories (PS,DO, IN, TE, MI, LS, IR, AQ). In Molly's report (above), totals were combined to 5 categories ("other" was a category).

I see 3 totals that look like they might have category totals? If these are major category totals, then maybe doing the math in R can be avoided. (A GW/SW chart can be made also.)

TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260

Another question--I'm not clear on how I would get all my data loaded to create the plots. We have annual data back to 1985 in IN--most is 'Non-Compilation' data in between the 5-year national 'Compilation' data. Will Non-Compilation data be in the NWIS dumps?

twaugens-usgs commented 8 years ago

On Mon, May 23, 2016 at 4:15 PM, grrmartin-USGS notifications@github.com wrote:

I'm getting a warning/error in shiny: Warning: Error in layoutbase: At least one layer must contain all variables used for facetting. Not sure exactly what that means. The facetting is on areas (county, etc) assuming totals will be calculated by areas within states. The code is working fine in RStudio--it justs omits categories with missing values.

This is a draft set of all the 'totals' in the list--these need to be combined into desired major categories. (There are several categories of thermoelectric, and categories used varied over the years). Exactly what totals will generally be available, where to read them, and how best to combine totals, I'm not clear. In the IN example above, I combined totals to the 8 major categories (PS,DO, IN, TE, MI, LS, IR, AQ). In Molly's report (above), totals were combined to 5 categories ("other" was a category).

I see 3 totals that look like they might have category totals? If these are major category totals, then maybe doing the math in R can be avoided. (A GW/SW chart can be made also.)

TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260

Another question--I'm not clear on how I would get all my data loaded to create the plots. We have annual data back to 1985 in IN--most is 'Non-Compilation' data in between the 5-year national 'Compilation' data. Will Non-Compilation data be in the NWIS dumps?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221082858

ldecicco-USGS commented 8 years ago

I learned this today, the "Choose Files" I think only works in a browser (I've only tested on IE or Chrome), not just the RStudio shiny window. Not sure if that helps.

I'm not sure I follow the rest of the issue, but here's the code that the shiny app uses:

w.use <- wUseSample
w.use <- caluculate_values(w.use)
data.elements <- c("PS.WTotl","CO.WTotl","DO.WTotl")
area.column <- "STATECOUNTYCODE"
areas <- c("10001","10003")
plot.stack <- TRUE
barchart_sums(w.use, data.elements, area.column = area.column,
              areas = areas,plot.stack=plot.stack)
ldecicco-USGS commented 8 years ago

Notice the function calculate_values is what does the calculations

twaugens-usgs commented 8 years ago

Just wondering since I missed the training.

Is shiny app an application written by OWI or is it an R package ?

On Mon, May 23, 2016 at 4:28 PM, Laura DeCicco notifications@github.com wrote:

Notice the function calculate_values is what does the calculations

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221086225

ldecicco-USGS commented 8 years ago

It's been developed in parallel with the package. There's a function in there that actual runs it. So....if you want to give it a whirl (this may work, if it doesn't let me know):

install.packages(c("devtools","readxl","ggplot2","tidyr","sp","shiny","shinydashboard","DT","data.table","rgeos","dplyr","scales"))

devtools::install_github("USGS-R/wateRuse")

library(wateRuse)

 explore_wateRuse()

If your default browser is IE, copy the address in the browser window and open it in Chrome or Firefox.

Again, let me know if that works or doesn't work.

ldecicco-USGS commented 8 years ago

From the "Choose Files" button on the app, you should be able to either choose 1 dump (.txt) file, a set of Excel files, or a zip file. The one thing I'm still not clear on is the Compilation stuff...

grrmartin-USGS commented 8 years ago

Ideally I’d like to plot the sums for these 8 major categories, for example,

(Public, Domestic, Industrial, Thermoelectric, Mining, Livestock, Aquaculture, Irrigation) PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl

But often it appears these totals are missing: such as here in DE for PS.WTotl for 1985-90 and 2000: publicsupplytotaldelaware_nascapture

I assume it won't compute because of missing values. It seems many of these 8 totals could be determined if the na's were replaced by zeros, then run the calculations. Is this plausible?

I see the same thing in the 85-90 thermoelectric fossil in HI. PF-WTotl = PF-WGWTo + PF-WSWTo [During 1985-95, PT=PF+PG+PN (FOSSIL+GEOTHERMAL+NUCLEAR), PT-WTotl = PF-WTotl + PG-WTotl + PN-WTotl ]

pf_hi_nascapture

It looks similar for irrigation total: IT-WTotl=IT-WFrTo+ IT-WSaTo

irrigation_nascapture

Am I missing something in the calculation output?

ldecicco-USGS commented 8 years ago

We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.

grrmartin-USGS commented 8 years ago

The replacement with zeros would be in only the data set used to create the bar chart of the totals, if it''s feasible to rerun the calculations on that set. I think in a lot of cases the NA's are zero''s or deemed negligible.

On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:

We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.

— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435

twaugens-usgs commented 8 years ago

NA means that nothing was measured.

AWUDS 3.0 has the ability to treat nulls as zero so that calculations can be preformed.

But the option is only available for the basic tables by category report, BTBC. The export and dump formats do not have that option.

On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:

We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435

twaugens-usgs commented 8 years ago

NA really means that nothing was measured, usually because a WSC was given an option to not compile the data. AWUDS has mandatory and non mandatory data compilation requirements. A lot of the NA are non mandatory data. The values could be significant if they were collected.

Data elements that are mandatory should not have NA in any of the values. If the value was negligible they were asked to enter zero.

On Wed, May 25, 2016 at 10:50 AM, Augenstein, Todd twaugens@usgs.gov wrote:

NA means that nothing was measured.

AWUDS 3.0 has the ability to treat nulls as zero so that calculations can be preformed.

But the option is only available for the basic tables by category report, BTBC. The export and dump formats do not have that option.

On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:

We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435

ldecicco-USGS commented 8 years ago

OK, so what I think I'm hearing is just for the stacked bar graph, treat NA's as 0 for the columns: PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl

Yes?

grrmartin-USGS commented 8 years ago

If possible, in the fields used to calculate these 8 major categories, substitute zeros for NA's, and rerun calculations to update these totals. It should give many more values to include.

(It sounds like from Todd's note that a new version 3 of AWUDS does this also.)

Probably the bar chart of the major (8) categories would be most useful for the state, or huc, or aquifer totals. (The plots for individual areas (counties) are probably less useful.)

On Wed, May 25, 2016 at 11:06 AM, Laura DeCicco notifications@github.com wrote:

OK, so what I think I'm hearing is just for the stacked bar graph, treat NA's as 0 for the columns: PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl

Yes?

— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221605380

ldecicco-USGS commented 8 years ago

So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?

twaugens-usgs commented 8 years ago

The sum of counties for a state should be equal to the sum of huc for a state. There is a QA program in AWUDS which does state totals for county and huc and it compares the values. Values are output to Excel.

On Wed, May 25, 2016 at 11:53 AM, Laura DeCicco notifications@github.com wrote:

So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221617486

grrmartin-USGS commented 8 years ago

Yes by summing the county values.

On Wed, May 25, 2016 at 11:53 AM, Laura DeCicco notifications@github.com wrote:

So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?

— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221617486

grrmartin-USGS commented 8 years ago

Noticed a clever approach for stacked bar charts in Bob Smail's WU Open Forum talk--he put the thermoelectric on a separate right-side scale with a point plot so it does not dwarf the other categories. Another approach if such a refinement is wanted later.

wi_barscapture

To plot the groundwater and surface water sums in similar bar charts (like Molly did above), the code would be about the same. (Total population could go on right scale with point plot. ) I looked for these totals in the w.use.new file from the calculation outputs, but all are NA's for DE and HI. gwsw.use.sums<-subset(w.use.new,select=c(TP.TotPop,TO.WGWTo, TO.WSWTo))

ldecicco-USGS commented 8 years ago

Unfortunately, one of the drawbacks of ggplot2 is the inability to make an independent secondary y axis. It's not to bad to do in base R, but you won't be able to easily modify the ggplot2 code.

grrmartin-USGS commented 8 years ago

If the groundwater/surface-water totals by state and huc (aquifer being all GW) are available in the R summations output after zeroing the NA's, the second plot above by Molly could be added. I think the code would be almost the same as for the category totals, except using only TO.WGWTo and TO.WSWTo data elements. Not sure maybe this plot could just be appended below the category totals plots.

ldecicco-USGS commented 8 years ago

Would adding TP.TotPop,TO.WGWTo, TO.WSWTo to the "Totals" options give you what you need?

grrmartin-USGS commented 8 years ago

Yes adding TO.WGWTo and TO.WSWTo data elements to the pick list of category totals can work. Showing these with the other categories might work OK for clustered bar charts, but stacking these together with the category may amount to 'double counting.' We could just tell users to select the GW/SW totals separately for that breakdown. Can you specify dark green for GW and dark blue for SW?

grrmartin-USGS commented 8 years ago

I noticed a 'color-blind friendly palette' listed in R Graphics Cookbook. Maybe you are already using it?

grrmartin-USGS commented 8 years ago

Also I notice the colors change among categories when selecting/deselecting categories. Would there be a way in ggplot2 to specify a constant color for each category, regardless of which combination of categories is selected?

ldecicco-USGS commented 8 years ago

Color blind palette, we can add that. Matching a color to category...that's a little trickier.