GiovineItalia / Gadfly.jl

Crafty statistical graphics for Julia.
http://gadflyjl.org/stable/

Histogram issues: bar with height 1 is drawn in wrong place with stacked color, bincount acts confusingly #1472

Open ilyagr opened 4 years ago

ilyagr commented 4 years ago

I'm running Julia 1.5 and Gadfly v1.3.0.

I'm not completely sure this is a bug, but I think there should be distinct bars at x=1 and x=2 in the following bar chart:

    using Gadfly, DataFrames
    Gadfly.plot(DataFrame(c=[true, true, false], x=[1, 2, 1]), x="x", color="c",
                Geom.histogram)

[Image: bad]

For comparison, without color, it looks better:

    Gadfly.plot(DataFrame(c = [true, true, false], x=[1, 2, 1]), x="x", Geom.histogram)

[Image: ok]

One possible issue is that the bars are too wide. I tried to compensate by changing the x value 2 to 5 and increasing bincount, but that also had an unexpected effect (possibly another bug): instead of the bars getting narrower, the x-axis was needlessly extended to the right.

    Gadfly.plot(DataFrame(c=[true, true, false], x=[1, 5, 1]), x="x", color="c",
                Geom.histogram(bincount=10))

[Image: xaxis]

The best workaround I found so far is to abandon histograms and stacking, and use Stat.histogram with point geometry.

Mattriks commented 4 years ago

Try

    using Compose  # for cx, cy units
    df = DataFrame(c=[true, true, false], x=[1, 2, 1])
    plot(df, x=:x, color=:c, Geom.histogram, Scale.x_discrete, Theme(bar_spacing=0.5cx))

bar_spacing can be given in relative units (e.g. 0.1w), absolute units (e.g. 5mm), or plot-context units (e.g. 0.5cx).

ilyagr commented 4 years ago

Thanks for the suggestion; it helps. Is there any way to make it work with continuous scales? My actual data set is continuous.

Also, something like the following looks wrong: the bars are ordered 1, 5, 3.

    Gadfly.plot(DataFrame(c=[true, true, false, false], x=[1, 5, 1, 3]),
                x="x", color="c", Geom.histogram, Scale.x_discrete,
                Theme(bar_spacing=0.5 * Gadfly.Compose.cx))

Mattriks commented 4 years ago

Use, e.g., Scale.x_discrete(levels=[1,3,4,5]). See the Scales section in the Tutorial.
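
The 1, 5, 3 ordering is consistent with the discrete scale taking its levels in order of first appearance in the data; that is an assumption about Gadfly's default, not something confirmed in this thread. The plain-Julia sketch below contrasts that order with a sorted one. The names appearance_order and sorted_levels are introduced here for illustration only.

```julia
# x values from the example above, in the order they occur in the DataFrame
x = [1, 5, 1, 3]

# `unique` keeps first-appearance order; this matches the 1, 5, 3 bars
# if (as assumed) the discrete scale uses first-appearance order by default
appearance_order = unique(x)

# Sorting the distinct values gives the ascending order most readers expect
sorted_levels = sort(unique(x))

println("appearance order: ", appearance_order)
println("sorted levels:    ", sorted_levels)
```

A vector like sorted_levels is the kind of value one could pass explicitly as Scale.x_discrete(levels=...), so the bar order no longer depends on row order.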

If your scale is really continuous, you can set e.g. Geom.histogram(limits=(min=0, max=5), bincount=5); see the histogram examples in the plot gallery.
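
To see what limits=(min=0, max=5) with bincount=5 implies, here is a plain-Julia sketch of equal-width binning (no Gadfly involved): the bin width is (max - min) / bincount = 1. The helpers bin_edges and bin_counts are hypothetical, and the half-open [a, b) convention, with the last bin also closed on the right, is an assumption rather than Gadfly's documented rule.

```julia
# Hypothetical helper: equal-width bin edges from fixed limits and a bin count
function bin_edges(lo::Real, hi::Real, bincount::Integer)
    bincount > 0 || throw(ArgumentError("bincount must be positive"))
    return collect(range(lo, hi; length = bincount + 1))
end

# Hypothetical helper: count values per bin, assuming half-open bins [a, b)
# except the last one, which also includes its right edge
function bin_counts(values, edges)
    nbins = length(edges) - 1
    counts = zeros(Int, nbins)
    for v in values
        # index of the last edge that is <= v (0 if v is below the first edge)
        i = searchsortedlast(edges, v)
        # a value exactly on the upper limit goes into the last bin
        if i == nbins + 1 && v == edges[end]
            i = nbins
        end
        # values outside [first edge, last edge] are ignored
        if 1 <= i <= nbins
            counts[i] += 1
        end
    end
    return counts
end

edges = bin_edges(0, 5, 5)                 # width (5 - 0) / 5 = 1
counts = bin_counts([1, 5, 1, 3], edges)   # data from the example above

println("edges:  ", edges)
println("counts: ", counts)
```

With integer data and integer edges, every value sits exactly on a bin boundary, so which bar a value lands in depends entirely on the boundary convention.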

ilyagr commented 4 years ago

That works, thank you very much! I'm not sure if it'd be easy, but it'd be nice if setting bincount didn't affect the limits, and if the defaults were better.

There is one more bug. On the log scale, bars of height 1 disappear, even if I force the y axis to extend below 1:

    Gadfly.plot(DataFrame(c=[true, true, false], x=[1, 3, 1]), x="x",
                Gadfly.Scale.y_log10(minvalue=0.5),
                Geom.histogram(minbincount=5, limits=(min=0, max=4)))

[Image: logscale]

(My actual example has both a log scale and colors, so in my mind all of these issues are related, but perhaps that should be a separate issue.)

Update: I had pasted the wrong code before (without minvalue); this is now fixed.

Mattriks commented 4 years ago

What's in your original post isn't a bug: Gadfly is simply choosing bins automatically (you can set them manually as shown above; I'd suggest using bincount rather than minbincount in Geom.histogram).

The second issue here, using Geom.histogram with Scale.y_log10, is tricky because a histogram's y-axis typically starts from zero. Perhaps try Scale.y_sqrt instead.
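
A quick plain-Julia check of why the two scales behave differently at a histogram's zero baseline. The counts below are made up for illustration; only Base functions are used.

```julia
# A histogram bar conventionally runs from y = 0 up to its count
baseline = 0.0
counts = [1.0, 2.0, 10.0]   # illustrative counts, not data from this thread

# log10 sends the zero baseline to -Inf, so there is no finite place to
# start the bar; a plotting library must pick some positive value instead
log_bottom = log10(baseline)

# sqrt keeps zero at zero, so every bar has a finite bottom and top
sqrt_bottom = sqrt(baseline)
sqrt_tops = sqrt.(counts)

println("log10 baseline: ", log_bottom)
println("sqrt baseline:  ", sqrt_bottom)
```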

ilyagr commented 4 years ago

Currently, it seems that histograms are hard-coded to bottom out at 1.0 when drawn on a log scale. Perhaps changing them to bottom out at 0.8 (and the default scale to start at 0.8) would work as at least a temporary workaround?

It's not quite perfect, as it doesn't help when density = true.
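
The arithmetic behind the 0.8 suggestion and the density caveat can be checked in plain Julia. Here log_bar_height is a hypothetical helper giving a bar's drawn length in log10 units between an assumed baseline and the bar's top; the density value 0.25 is illustrative, not taken from the thread.

```julia
# Hypothetical helper: drawn length of a bar on a log10 axis, i.e. the
# distance between the transformed top (the bar's value) and the baseline
log_bar_height(value, baseline) = log10(value) - log10(baseline)

# Baseline at 1.0: a bar of height 1 collapses to zero length and vanishes
h_at_1 = log_bar_height(1.0, 1.0)

# Baseline at 0.8, as proposed: the same bar gets a small positive length (about 0.097)
h_at_08 = log_bar_height(1.0, 0.8)

# With density = true, bar values can fall below 0.8, so the same fixed
# baseline gives a negative length (the bar would point downward)
h_density = log_bar_height(0.25, 0.8)

println((h_at_1, h_at_08, h_density))
```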

Thank you again for the help.

Mattriks commented 4 years ago

Limits issue noted on Discourse: https://discourse.julialang.org/t/unexpected-behaviour-for-custom-histogram-limits-in-gadfly/