Closed dblodgett-usgs closed 8 years ago
A deluxe version of summary bar charts could look like what Molly used in the last compilation report using clustered bars with the total use overlaid as a line plot:
A similar bar chart showing showing groundwater and surface-water totals with total population overlaid was done too:
The stacked bar charts may be simplest /default first step. The way I did it with category totals for IN pulled the category totals by year from wide to long format using gather_, which ggplot2 read:
categories <- c("PS", "DO", "IN", "TE", "MI", "LS", "IR", "AQ") library(tidyr) test<-gather_(INWU, "TYPE", "VALUE", c(categories), na.rm = FALSE, convert = TRUE, factor_key = FALSE)
ggplot(test) + geom_bar(aes_string(x = "YEAR", y = "VALUE", fill = "TYPE"), position = "stack", stat="identity")
ggplot(test) + geom_bar(aes_string(x = "YEAR", y = "VALUE", fill = "TYPE"), position = "dodge", stat="identity")
....
Are the calculated totals by use category and GW/SW available for the sample dataset?
If these calculated totals are available by area.column (county, huc, aquifer, State, US), then coding from some of the other functions in wateRuse may be adapted to make these plots at the various levels of aggregation.
There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"
But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs or @twaugens-usgs could weigh in on that.
I don't think I have a dataset with CATEGORYCODE. I have a rough function working that pulls selected totals (PS.WTotl, etc) that I see in wUseSample. Quite a lot of the values are NAs but those are dropped and the others plotted. I get this stacked barchart that is faceted on areas (COUNTYNAME). Code follows. The check indicated no errors.
barchart_sums <- function(w.use, data.elements, area.column, plot.stack=TRUE, years=NA, areas=NA, y.scale=NA){
if(!all(is.na(areas))){ w.use <- w.use[w.use[[area.column]] %in% areas,] }
df2 <- w.use[,c("YEAR",area.column,data.elements)]
df3 <- gather_(df2, "dataElement", "value", c(data.elements))
bc.object <- ggplot(data = df3)
if(plot.stack){ bc.object <- bc.object + geom_bar(aes_string(x = "YEAR", y = "value", fill = "dataElement"), position = "stack",stat="identity") } else { bc.object <- bc.object + geom_bar(aes_string(x = "YEAR", y = "value", fill = "dataElement"), position = "dodge",stat="identity") }
bc.object <- bc.object + facet_grid(COUNTYNAME ~ .) + ylab("")
if(!all(is.na(y.scale))){ bc.object <- bc.object + ylim(y.scale) }
if(!all(is.na(years))){ bc.object <- bc.object + xlim(years) }
bc.object
return(bc.object) }
The first two letters are the category code:
PS.WTotl
PS is public supply.
Attached is a master list of category codes with official long names.
On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco notifications@github.com wrote:
There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"
But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820
I'm not seeing an attached list.
On Thu, May 19, 2016 at 7:36 AM, twaugens-usgs notifications@github.com wrote:
The first two letters are the category code:
PS.WTotl
PS is public supply.
Attached is a master list of category codes with official long names.
On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco notifications@github.com wrote:
There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"
But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220299344
maybe git is not getting the attachment, here it is in text, its small:
CODE,DURING,NO,NAME,SORTORDER TP,"[1800,NOW]",1,Total Population,10 PS,"[1800,NOW]",2,Public Supply,20 CO,"[1985,NOW]",3,Commercial,35 DO,"[1800,NOW]",4,Domestic,25 IN,"[1985,NOW]",5,Industrial,50 PT,"[1800,NOW]",6,Total Thermoelectric Power,60 PF,"[1985,NOW]",7,Fossil-fuel Thermoelectric Power,70 PG,"[1985,NOW]",8,Geothermal Thermoelectric Power,80 PN,"[1985,NOW]",9,Nuclear Thermoelectric Power,90 PO,"[1985,NOW]",10,Thermoelectric Power (Once-through cooling),100 PC,"[1985,NOW]",11,Thermoelectric Power (Closed-loop cooling),110 MI,"[1985,NOW]",12,Mining,120 LV,"[1985,1999]",13,Total Livestock,130 LS,"[1800,1999]",14,Livestock (Stock),140 LA,"[1985,1999]",15,Livestock (Animal Specialties),150 IT,"[1800,NOW]",16,"Irrigation, Total",160 IR,"[1800,NOW]",17,Irrigation,170 IC,"[1985,NOW]",18,"Irrigation, Crop",180 IG,"[1985,NOW]",19,"Irrigation, Golf Courses",190 HY,"[1800,NOW]",20,Hydroelectric Power,200 WW,"[1985,NOW]",21,Wastewater Treatment,210 RE,"[1985,NOW]",22,Reservoir Evaporation,220 TO,"[1800,NOW]",23,"Totals, Overall",230 TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260 AQ,"[1985,NOW]",53,Aquaculture,155 LI,"[1985,NOW]",54,Livestock,135 OI,"[1800,1984]",55,Other Industrial,55
On Thu, May 19, 2016 at 9:13 AM, grrmartin-USGS notifications@github.com wrote:
I'm not seeing an attached list.
On Thu, May 19, 2016 at 7:36 AM, twaugens-usgs notifications@github.com wrote:
The first two letters are the category code:
PS.WTotl
PS is public supply.
Attached is a master list of category codes with official long names.
On Wed, May 18, 2016 at 5:24 PM, Laura DeCicco <notifications@github.com
wrote:
There are some calculated values...like: deliveries to thermoelectric, in Mgal/d (PS-DelPT) is equal to "PF-PSDel + PG-PSDel + PN-PSDel"
But it looks like you want to plot by "CATEGORYCODE"...I don't know how to calculate those officially (just sum?). Maybe @mamaupin-usgs https://github.com/mamaupin-usgs or @twaugens-usgs https://github.com/twaugens-usgs could weigh in on that.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220162820
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220299344
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-220319987
OK, so if I understand this right, what @grrmartin-USGS wants to plot are unique "dataElements", so this function should be pretty easy to incorporate.
@grrmartin-USGS ....push that code up to your fork (in the "R" folder), and make a pull request! Let's kick the tires on it a bit.
Submitted barchart_sums and boxplot_wu. Sorry should have put boxplots under issue #81.
Added to the shiny app. Play around with it, and re-open this issue if you have any problems. Good job @grrmartin-USGS !
I'm getting a warning/error in shiny: Warning: Error in layoutbase: At least one layer must contain all variables used for facetting. Not sure exactly what that means. The facetting is on areas (county, etc) assuming totals will be calculated by areas within states. The code is working fine in RStudio--it justs omits categories with missing values.
This is a draft set of all the 'totals' in the list--these need to be combined into desired major categories. (There are several categories of thermoelectric, and categories used varied over the years). Exactly what totals will generally be available, where to read them, and how best to combine totals, I'm not clear. In the IN example above, I combined totals to the 8 major categories (PS,DO, IN, TE, MI, LS, IR, AQ). In Molly's report (above), totals were combined to 5 categories ("other" was a category).
I see 3 totals that look like they might have category totals? If these are major category totals, then maybe doing the math in R can be avoided. (A GW/SW chart can be made also.)
TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260
Another question--I'm not clear on how I would get all my data loaded to create the plots. We have annual data back to 1985 in IN--most is 'Non-Compilation' data in between the 5-year national 'Compilation' data. Will Non-Compilation data be in the NWIS dumps?
dump File names that begin with the word 'best' are "best available" datasets for earlier compilations plus the datasets named "Compilation" for the current compilation.
These only contain data for compilation years.
On Mon, May 23, 2016 at 4:15 PM, grrmartin-USGS notifications@github.com wrote:
I'm getting a warning/error in shiny: Warning: Error in layoutbase: At least one layer must contain all variables used for facetting. Not sure exactly what that means. The facetting is on areas (county, etc) assuming totals will be calculated by areas within states. The code is working fine in RStudio--it justs omits categories with missing values.
This is a draft set of all the 'totals' in the list--these need to be combined into desired major categories. (There are several categories of thermoelectric, and categories used varied over the years). Exactly what totals will generally be available, where to read them, and how best to combine totals, I'm not clear. In the IN example above, I combined totals to the 8 major categories (PS,DO, IN, TE, MI, LS, IR, AQ). In Molly's report (above), totals were combined to 5 categories ("other" was a category).
I see 3 totals that look like they might have category totals? If these are major category totals, then maybe doing the math in R can be avoided. (A GW/SW chart can be made also.)
TS,"[1800,NOW]",50,"Totals, SW by categories",240 TG,"[1800,NOW]",51,"Totals, GW by categories",250 TW,"[1800,NOW]",52,"Totals, Overall by categories",260
Another question--I'm not clear on how I would get all my data loaded to create the plots. We have annual data back to 1985 in IN--most is 'Non-Compilation' data in between the 5-year national 'Compilation' data. Will Non-Compilation data be in the NWIS dumps?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221082858
I learned this today, the "Choose Files" I think only works in a browser (I've only tested on IE or Chrome), not just the RStudio shiny window. Not sure if that helps.
I'm not sure I follow the rest of the issue, but here's the code that the shiny app uses:
w.use <- wUseSample
w.use <- caluculate_values(w.use)
data.elements <- c("PS.WTotl","CO.WTotl","DO.WTotl")
area.column <- "STATECOUNTYCODE"
areas <- c("10001","10003")
plot.stack <- TRUE
barchart_sums(w.use, data.elements, area.column = area.column,
areas = areas,plot.stack=plot.stack)
Notice the function calculate_values
is what does the calculations
Just wondering since I missed the training.
Is shiny app an application written by OWI or is it an R package ?
On Mon, May 23, 2016 at 4:28 PM, Laura DeCicco notifications@github.com wrote:
Notice the function calculate_values is what does the calculations
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221086225
It's been developed in parallel with the package. There's a function in there that actual runs it. So....if you want to give it a whirl (this may work, if it doesn't let me know):
install.packages(c("devtools","readxl","ggplot2","tidyr","sp","shiny","shinydashboard","DT","data.table","rgeos","dplyr","scales"))
devtools::install_github("USGS-R/wateRuse")
library(wateRuse)
explore_wateRuse()
If your default browser is IE, copy the address in the browser window and open it in Chrome or Firefox.
Again, let me know if that works or doesn't work.
From the "Choose Files" button on the app, you should be able to either choose 1 dump (.txt) file, a set of Excel files, or a zip file. The one thing I'm still not clear on is the Compilation stuff...
Ideally I’d like to plot the sums for these 8 major categories, for example,
(Public, Domestic, Industrial, Thermoelectric, Mining, Livestock, Aquaculture, Irrigation) PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl
But often it appears these totals are missing: such as here in DE for PS.WTotl for 1985-90 and 2000:
I assume it won't compute because of missing values. It seems many of these 8 totals could be determined if the na's were replaced by zeros, then run the calculations. Is this plausible?
I see the same thing in the 85-90 thermoelectric fossil in HI. PF-WTotl = PF-WGWTo + PF-WSWTo [During 1985-95, PT=PF+PG+PN (FOSSIL+GEOTHERMAL+NUCLEAR), PT-WTotl = PF-WTotl + PG-WTotl + PN-WTotl ]
It looks similar for irrigation total: IT-WTotl=IT-WFrTo+ IT-WSaTo
Am I missing something in the calculation output?
We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.
The replacement with zeros would be in only the data set used to create the bar chart of the totals, if it''s feasible to rerun the calculations on that set. I think in a lot of cases the NA's are zero''s or deemed negligible.
On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:
We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.
— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435
NA means that nothing was measured.
AWUDS 3.0 has the ability to treat nulls as zero so that calculations can be preformed.
But the option is only available for the basic tables by category report, BTBC. The export and dump formats do not have that option.
On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:
We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435
NA really means that nothing was measured, usually because a WSC was given an option to not compile the data. AWUDS has mandatory and non mandatory data compilation requirements. A lot of the NA are non mandatory data. The values could be significant if they were collected.
Data elements that are mandatory should not have NA in any of the values. If the value was negligible they were asked to enter zero.
On Wed, May 25, 2016 at 10:50 AM, Augenstein, Todd twaugens@usgs.gov wrote:
NA means that nothing was measured.
AWUDS 3.0 has the ability to treat nulls as zero so that calculations can be preformed.
But the option is only available for the basic tables by category report, BTBC. The export and dump formats do not have that option.
On Wed, May 25, 2016 at 10:30 AM, Laura DeCicco notifications@github.com wrote:
We could convert the NA's to 0....I guess the question I would have is should we convert the NA to 0. I assume NA means that nothing was measured...so the number could very well be significant, but we just don't know.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221594435
OK, so what I think I'm hearing is just for the stacked bar graph, treat NA's as 0 for the columns: PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl
Yes?
If possible, in the fields used to calculate these 8 major categories, substitute zeros for NA's, and rerun calculations to update these totals. It should give many more values to include.
(It sounds like from Todd's note that a new version 3 of AWUDS does this also.)
Probably the bar chart of the major (8) categories would be most useful for the state, or huc, or aquifer totals. (The plots for individual areas (counties) are probably less useful.)
On Wed, May 25, 2016 at 11:06 AM, Laura DeCicco notifications@github.com wrote:
OK, so what I think I'm hearing is just for the stacked bar graph, treat NA's as 0 for the columns: PS.WTotl,DO.WTotl,IN.WTotl, PT.WTotl, MI.WTotl, LS.WTotl, AQ.WTotl,IT.WTotl
Yes?
— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221605380
So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?
The sum of counties for a state should be equal to the sum of huc for a state. There is a QA program in AWUDS which does state totals for county and huc and it compares the values. Values are output to Excel.
On Wed, May 25, 2016 at 11:53 AM, Laura DeCicco notifications@github.com wrote:
So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221617486
Yes by summing the county values.
On Wed, May 25, 2016 at 11:53 AM, Laura DeCicco notifications@github.com wrote:
So that takes me to the next question. HUC and aquifer totals come in line-by-line in the HUC or aquifer Excel files, but how does one calculate state totals? Sum the counties?
— You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub https://github.com/USGS-R/wateRuse/issues/76#issuecomment-221617486
Noticed a clever approach for stacked bar charts in Bob Smail's WU Open Forum talk--he put the thermoelectric on a separate right-side scale with a point plot so it does not dwarf the other categories. Another approach if such a refinement is wanted later.
To plot the groundwater and surface water sums in similar bar charts (like Molly did above), the code would be about the same. (Total population could go on right scale with point plot. ) I looked for these totals in the w.use.new file from the calculation outputs, but all are NA's for DE and HI. gwsw.use.sums<-subset(w.use.new,select=c(TP.TotPop,TO.WGWTo, TO.WSWTo))
Unfortunately, one of the drawbacks of ggplot2 is the inability to make an independent secondary y axis. It's not to bad to do in base R, but you won't be able to easily modify the ggplot2 code.
If the groundwater/surface-water totals by state and huc (aquifer being all GW) are available in the R summations output after zeroing the NA's, the second plot above by Molly could be added. I think the code would be almost the same as for the category totals, except using only TO.WGWTo and TO.WSWTo data elements. Not sure maybe this plot could just be appended below the category totals plots.
Would adding TP.TotPop,TO.WGWTo, TO.WSWTo to the "Totals" options give you what you need?
Yes adding TO.WGWTo and TO.WSWTo data elements to the pick list of category totals can work. Showing these with the other categories might work OK for clustered bar charts, but stacking these together with the category may amount to 'double counting.' We could just tell users to select the GW/SW totals separately for that breakdown. Can you specify dark green for GW and dark blue for SW?
I noticed a 'color-blind friendly palette' listed in R Graphics Cookbook. Maybe you are already using it?
Also I notice the colors change among categories when selecting/deselecting categories. Would there be a way in ggplot2 to specify a constant color for each category, regardless of which combination of categories is selected?
Color blind palette, we can add that. Matching a color to category...that's a little trickier.
@grrmartin-USGS summarized in #46. Will work up a new function to do stacked bar charts. @rwdudley-usgs will help review code and @jshourds-usgs will make sure water use domain details are good to go.