Closed john9631 closed 10 years ago
An extra one. I can't see a bar chart (bar representation of Point Chart). Biologists draw a lot of bar charts :)
They actually need a bar chart that has the characteristics of both a bar and an error bar. So you have x, y, ymin and ymax. Why? Apparently bar charts with confidence intervals are popular. Basically you extend a vertical line from ymin to ymax on top of the bar.
As an extension of that request, I wonder if this CI-type statistic could be built for other chart types as well, in the same sort of form as ggplot2's.
This is quite the issue! Let me see if I have everything:
x aesthetic optional for boxplots. This is a perfectly reasonable change. I'll do this.
Geom.smooth working with Geom.subplot_grid. I don't think I've tried this. If it doesn't work, it is definitely a bug.
Scale.x_discrete. Does that work better?
Lastly, I need to clean up the manual. Referring to using plot without a data frame as "heretical" was tongue-in-cheek, but people seem to think it's actually bad or wrong. I was really just making fun of myself for previously making plot too rigid.
Thanks Daniel.
I will try x_discrete. If it's good, I'll just add it to my document; otherwise I'll let you know of any issues.
I'm amused by the heretical view ... I will continue extending the Reference card. I am essentially building a complement to the manual that lets "a biologist" pick it up and build some charts. I'm a bit stuck at present because there is a point you reach where dataframes are needed (at least I think they are) and I think I have to go through a structure like:
I'm basing it partly on trying to answer the questions raised by following Hadley's Graphics Cheat Sheet http://had.co.nz/stat480/r/graphics.html
Finally a question: Can I build a chart with 4 lines on it without using a dataframe? At the moment, for a line for each of 4 different bird species say in 4 vectors, I would stack the vectors along with a repeating index and 4 categories in a dataframe and then plot that.
You can make that sort of plot without a data frame, but you'd still have to stack the vectors and make a new species vector of the same length as the stacked vectors to bind to color.
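For anyone following along, the stacking described above might look like the sketch below. The species names and counts are made up for illustration, and the commented-out plot call assumes Gadfly is loaded and accepts plain vectors for x, y and color, as described in this thread.

```julia
# Hypothetical counts for four bird species over the same five years.
years = 2001:2005
finch = [12, 15, 11, 18, 20]
wren  = [ 7,  9,  8, 10, 12]
robin = [20, 18, 22, 21, 25]
heron = [ 3,  4,  2,  5,  4]

species_names     = ["finch", "wren", "robin", "heron"]
counts_by_species = [finch, wren, robin, heron]

# Stack the four count vectors end to end.
y = vcat(counts_by_species...)

# Repeat the whole year index once per species so it lines up with y.
x = repeat(collect(years), outer=length(counts_by_species))

# Repeat each species name once per year; this is the vector bound to color.
species = repeat(species_names, inner=length(years))

@assert length(x) == length(y) == length(species)

# With Gadfly loaded (assumption), the line plot would then be something like:
# plot(x=x, y=y, color=species, Geom.line)
```

All three vectors end up with 20 entries, one per (species, year) pair, which is the same long layout a data frame would hold.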
Originally everything was through data frames though, so there may still be places where it breaks down without one. Titles for color keys was one that you pointed out.
That's cool. I will take them in the direction of dfs but simplify it so that it doesn't sound like something "difficult." Something I don't fully understand is the relationship between Winston and Gadfly. Crudely oversimplifying, I had thought that Winston was like "Pylab for Julia" and Gadfly was "ggplot2 for Julia", but I read a post about changes to Winston's syntax and it's not clear to me. Can you clarify that for me? Feel free to email me at john . lynch at iname . com
I tested number 5 again. x_discrete made no difference. Nor did y_discrete. It would be nice to have an option where the tails of the boxplot were no longer than the actual range.
Oh, I completely misread your original comment!
When you mentioned ranges, I thought you were referring to the first histogram plot you showed. It turns out the fences on boxplots were being incorrectly computed. That's fixed now. I think you should also be able to draw boxplots omitting the x aesthetic.
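As background on what a fence is, here is a minimal sketch using the conventional Tukey rule (1.5 times the interquartile range). That rule is an assumption on my part and not necessarily the exact computation Gadfly performs; whisker_ends is a hypothetical helper name and the ages are made up.

```julia
using Statistics

# Return the whisker ends for a boxplot: the most extreme data points
# that still lie inside the fences q1 - k*IQR and q3 + k*IQR.
function whisker_ends(values; k=1.5)
    q1, q3 = quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = filter(v -> lower_fence <= v <= upper_fence, values)
    return minimum(inside), maximum(inside)
end

ages = [21, 23, 24, 25, 27, 29, 30, 62]  # 62 is an outlier
lo, hi = whisker_ends(ages)
```

Because the whiskers stop at actual data points inside the fences, they never extend past the range of the data, and the outlier at 62 would be drawn as a separate point.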
Thanks Daniel.
Titles of color keys can now be explicitly set like:
plot(..., Guide.colorkey("Color Key Title"))
Subplot grid titles can be set with Guide.xlabel and Guide.ylabel.
I fixed a bug with Geom.smooth and Geom.subplot_grid, so those two should play nice now.
Error bars should work correctly with bar plots, if used explicitly like so:
using RDatasets, DataFrames, Gadfly
df = subset(data("plm", "Cigar"), :(state .== 1))
# silly fake error bars intervals
ymin = df["sales"] .- 20*rand()
ymax = df["sales"] .+ 20*rand()
plot(df, x="year", y="sales", ymin=ymin, ymax=ymax,
Geom.bar, Geom.errorbar)
I'm open to finding ways to make that easier, but I don't want it to be automatic. Estimating confidence or credible intervals typically involves some pretty big assumptions about the data. I don't want to make those assumptions for people.
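To make the "big assumptions" concrete, here is one way a user might compute ymin and ymax themselves before passing them to the plot. This is only a sketch: it assumes the sample mean is approximately normally distributed and uses the usual 1.96 multiplier for a 95% interval. normal_interval is a hypothetical helper and the sales figures are invented.

```julia
using Statistics

# Mean plus or minus z standard errors. Only meaningful if the normal
# approximation is reasonable for the data, which is the user's call.
function normal_interval(sample; z=1.96)
    m  = mean(sample)
    se = std(sample) / sqrt(length(sample))  # standard error of the mean
    return m - z * se, m + z * se
end

sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical repeated measurements
ymin, ymax = normal_interval(sales)
```

The resulting ymin and ymax can then be supplied explicitly alongside y, which keeps the choice of interval in the user's hands.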
Thanks for the thorough testing. I'll be happy to fix anything else you find.
All going well up to bar including error bar. Please excuse the test of "confidence interval", I know the distribution issues but I need to show biologists what they could do if they wanted :)
The bar plot isn't drawing correctly for me. Here is a simple example plotting a line with it drawn on top by hand
zz = DataFrame(ix=1:10, y=1:10)
plot(zz, x="ix", y="y", Geom.bar)
has the same effect. Have I got an earlier Gadfly (0.1.20)?
Pkg.status()
Warning: using Base.Stat in module Stat conflicts with an existing identifier. Required packages:
I forgot to tag a new version. After updating, you should be at 0.1.21 now.
Got it thanks.
Sorry, I forgot that I hadn't tested smooth.
It's OK; I had a glitch but I can no longer reproduce it, so it was probably a finger error.
I've been working on that draft and found a couple more issues.
Thanks.
The bar and error bar combine perfectly now. One thing that appears out of place is the AAABBB heading the color key. I assume Guide.colorkey just needs a "" default.
Hopefully the last one. With the new bar plot, when coloration by categorical (either number or string) is introduced then some bars are dramatically extended. Exactly the same with vector or dataframe. This is a link to the data used.
Subplotting the barcharts works fine but if color = xgroup the same problem as shown in the picture occurs.
Is there a pie chart option?
Yeah, that looks pretty broken. I'll see what's going on.
There aren't pie charts yet. I'll add them eventually, but stacked bar charts normalized to 100% are often more readable, and easier to add at this point, so I'm going to do that first.
I'll look forward to testing them and adding them to the reference and the tutorial.
Low priority ones.
LP1. In preparation for stacked bar charts I was looking at the others. Is this the behaviour you expect here? Adding color made no difference:
LP2. With boxplots I think a maximum width should be set (maybe 1.5x the span of the cross bars at high and low)
LP3. For standard line or point plots (others??) x could usefully default to 1:length(y) so that users don't have to figure it out.
plot(x=1:50, y=d_age) ===> plot( y=d_age)
LP4. I'm still getting min/max warnings if they're in your code.
plot(x=1:size(d_age,1), y=d_age, Guide.xlabel("Respondent"), Guide.ylabel("Age"), Geom.errorbar, ymin=d_age-1.96*std(d_age), ymax=d_age+1.96*std(d_age), color=collect(d_sex), Guide.colorkey("Sex"), Geom.smooth, Geom.point)
generates: WARNING: min(x) is deprecated, use minimum(x) instead.
The histogram selected its bin sizes poorly, which made me recall that you are typically encouraged to look at a number of either binwidth or bins settings to get a clearer perception of your data. Also, Hadley's paper calls for that. Can we have such a setting for histograms, please? My personal preference is to specify bins.
You can now manually set the number of bins, or put an upper or lower limit on the number of bins automatically selected. See the list of arguments here.
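To illustrate why trying several bin counts matters, the sketch below counts the same values into equal-width bins at two resolutions. It is plain Julia rather than Gadfly's own binning code; bin_counts is a hypothetical helper and the ages are made up.

```julia
# Count values into nbins equal-width bins spanning the data range.
# Assumes the values are not all identical (otherwise the width is zero).
function bin_counts(values, nbins)
    lo, hi = extrema(values)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in values
        # The maximum would land one past the last bin, so clamp it in.
        i = min(floor(Int, (v - lo) / width) + 1, nbins)
        counts[i] += 1
    end
    return counts
end

ages = [18, 19, 19, 20, 22, 25, 31, 33, 34, 40]
coarse = bin_counts(ages, 2)
fine   = bin_counts(ages, 4)
```

With two bins the data looks like one broad lump, while four bins reveal a cluster of young respondents and a second group in the early thirties.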
When it plots the error bars on points it uses zero as the lower bound, leaving a bit of spare white space. I can move the chart up and down with my cursor ... can I change its zoom?
To set the viewport manually you can now do something like this.
plot(x=rand(10), y=rand(10),
Scale.x_continuous(minvalue=-1, maxvalue=1),
Scale.y_continuous(minvalue=-5, maxvalue=5))
That's in the manual now as well under the scales section.
Hopefully the last one. With the new bar plot, when coloration by categorical (either number or string) is introduced then some bars are dramatically extended. Exactly the same with vector or dataframe. This is a link to the data used.
The problem here is that Geom.bar assumes the data is already summarized. Since there are multiple rows in your data with the same age, these bars get stacked on top of each other, hence the extended bars. That's pretty weird and counter-intuitive, but I need to figure out what the right thing to do is.
In the meantime, you can get better results by using Geom.histogram and adding Scale.discrete_color.
Ok. That makes sense. Box assumes that there is one y value for each x. This also causes an issue when you apply error bars (very nice by the way, and adjusting theme for zero width lets you match the style used in some articles) as you overlay one bar per point.
Maybe the solution is that the data has to be corrected first ... and the issue should be left exposed to remind the user that their data is richer than the method they're choosing. Sometimes an average would be right, other times a min or a max.
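A minimal sketch of that "correct the data first" step, in plain Julia: collapse the rows so there is one y per x, with the user choosing the reducer (mean, minimum, maximum). summarize_by is a hypothetical helper, not part of Gadfly or DataFrames, and the numbers are invented.

```julia
using Statistics

# Group ys by their x value and reduce each group to a single number.
function summarize_by(xs, ys, reducer)
    groups = Dict{eltype(xs), Vector{eltype(ys)}}()
    for (x, y) in zip(xs, ys)
        push!(get!(groups, x, Vector{eltype(ys)}()), y)
    end
    unique_xs = sort(collect(keys(groups)))
    return unique_xs, [reducer(groups[x]) for x in unique_xs]
end

age    = [20, 20, 21, 21, 21, 22]
income = [30.0, 34.0, 40.0, 44.0, 42.0, 50.0]

x_mean, y_mean = summarize_by(age, income, mean)     # one bar per age, average
x_max,  y_max  = summarize_by(age, income, maximum)  # one bar per age, maximum
```

The summarized vectors have one entry per distinct age, so a bar geometry that expects pre-summarized data would draw one bar per age instead of stacking the duplicates.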
There aren't pie charts yet. I'll add them eventually, but stacked bar charts normalized to 100% are often more readable, and easier to add at this point, so I'm going to do that first.
Suggestion: don't implement pie charts. See http://www.perceptualedge.com/articles/08-21-07.pdf
I'm quite sympathetic to the idea of banning pie charts; I agree that bars are better for most purposes.
However, a mildly-interesting counterpoint: recently I had a referee specifically request a pie chart. Sometimes arguing is not worth the trouble it could cause, and it's better to just give them what they're asking for.
If you want broad adoption you don't want to be the one persuading the customer that Beta is better than VHS. A specialist might buy the argument but your biologist will just go "what, no bar charts?"
If you want broad adoption you don't want to be the one persuading the customer that Beta is better than VHS. A specialist might buy the argument but your biologist will just go "what, no bar charts?"
I assume you meant "pie charts".
I work in a lab with biologists, and I'm forever attempting to get them to remove pie charts from their presentations... I've made progress, but there are some holdouts... ;-)
Yes. You assume right - it's the old "the customer may not be right; but he is the king" problem.
Daniel, after Muraveill's comments on the Plotting Thread I was going through the things a biologist might want to do in the heretical world. I discovered some things that are needed to make Gadfly work well for them.
The data I used is rubbish but it's at http://dropcanvas.com/gyk14 along with the notebook and my .png and .pdf outputs.