GiovineItalia / Gadfly.jl

Crafty statistical graphics for Julia.
http://gadflyjl.org/stable/

Geom.histogram confused by Cauchy #574

Open jiahao opened 9 years ago

jiahao commented 9 years ago

It appears that the plotting heuristics get very confused when histogramming Cauchy random variates. The problem appears to be that Geom.histogram tries to use information about the sample mean and variance, whereas the population mean and variance of the Cauchy distribution are undefined, so the sample estimates never settle down.

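using Distributions, Gadfly
# One million standard Cauchy variates
v = rand(Cauchy(0, 1), 10^6)
# The title reports the (unstable) sample mean and standard deviation
plot(x=v, Geom.histogram, Guide.title("μ = $(mean(v)), σ = $(std(v))"))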

[screenshot, 2015-03-31: output of the plot call above]
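The instability described above can be checked without Gadfly. The following is a minimal sketch, not a description of what Geom.histogram does internally: it draws standard Cauchy variates by inverse-CDF sampling and compares moment-based summaries with quantile-based ones. The helper names `cauchy_sample` and `summary_stats` and the seed are assumptions made for illustration only.

```julia
using Random, Statistics

# Standard Cauchy variates via the inverse CDF: tan(π(U - 1/2)) with U ~ Uniform(0, 1).
# (Hypothetical helper, used here to avoid depending on Distributions.)
cauchy_sample(rng, n) = tan.(π .* (rand(rng, n) .- 0.5))

# Moment-based and quantile-based summaries of a sample.
function summary_stats(v)
    q25, q50, q75 = quantile(v, [0.25, 0.5, 0.75])
    return (mean = mean(v), std = std(v), median = q50, iqr = q75 - q25)
end

rng = MersenneTwister(574)  # arbitrary seed, only for repeatability
for n in (10^3, 10^4, 10^5, 10^6)
    s = summary_stats(cauchy_sample(rng, n))
    println("n = $n: mean = $(s.mean), std = $(s.std), median = $(s.median), IQR = $(s.iqr)")
end
```

For a standard Cauchy distribution the median is 0 and the quartiles sit at ±1, so the median and IQR columns should approach 0 and 2 as n grows, while the mean and standard deviation columns have no limit to approach.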

Plotting the [normalized] histogram computed with Base.hist yields much nicer results:

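# Note: linspace and hist were Base functions when this was written (early 2015).
xmin = -5; xmax = +5
grid = linspace(xmin, xmax, 100)
_, yh = hist(v, grid)                  # bin counts over the edges in grid
yh /= length(v)*(grid[2]-grid[1])      # normalize counts to a density
plot(Coord.Cartesian(xmin=xmin, xmax=xmax),
    # exact Cauchy(0, 1) density
    layer(x=grid, y=map(x->pdf(Cauchy(0, 1), x), grid),
        Geom.line, Theme(default_color=color("darkgrey"), line_width=3px)),
    # normalized histogram of the samples
    layer(x=midpoints(grid), y=yh, Geom.bar, Theme(default_color=color("lightgrey"))),
)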

[screenshot, 2015-03-31: output of the plot call above]

I have to ask if it's worth maintaining your own histogramming code in Gadfly instead of just using Base.hist.

jiahao commented 9 years ago

Obviously this is an unfair comparison, but I suppose the real question is whether it's worth drawing a histogram that captures 100% of the density all the time.
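One way to make the trade-off in the last comment concrete is to histogram only a central quantile range and report how much of the sample that range covers. The sketch below is an illustration under stated assumptions, not a proposal for Gadfly's actual implementation: `central_histogram` and its `coverage` keyword are hypothetical names, and the bin width uses the Freedman-Diaconis rule (2·IQR·n^(-1/3)), swapped in here because it depends only on quantiles.

```julia
using Random, Statistics

# Hypothetical helper (not part of Gadfly): density histogram over the central
# `coverage` fraction of the sample, with a Freedman-Diaconis bin width.
function central_histogram(v; coverage=0.95)
    tail = (1 - coverage) / 2
    lo, hi = quantile(v, [tail, 1 - tail])       # plotting range from quantiles
    q25, q75 = quantile(v, [0.25, 0.75])
    width = 2 * (q75 - q25) / cbrt(length(v))    # Freedman-Diaconis bin width
    nbins = max(1, ceil(Int, (hi - lo) / width))
    edges = range(lo, stop=hi, length=nbins + 1)
    counts = zeros(Int, nbins)
    for x in v
        lo <= x <= hi || continue                # samples outside the range are dropped
        i = min(nbins, floor(Int, (x - lo) / step(edges)) + 1)
        counts[i] += 1
    end
    # Normalize by the full sample size so bar heights stay comparable to the pdf.
    density = counts ./ (length(v) * step(edges))
    return edges, density
end

rng = MersenneTwister(574)                        # arbitrary seed
v = tan.(π .* (rand(rng, 10^6) .- 0.5))           # standard Cauchy via inverse CDF
edges, density = central_histogram(v)
println("bins: ", length(density), ", mass shown: ", sum(density) * step(edges))
```

Because the heights are normalized by the full sample size, the bars integrate to roughly the chosen coverage rather than to 1, which makes explicit how much of the density the plot leaves out.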