JuliaPlots / StatsMakie.jl

Statistical visualizations based on high performance plotting package Makie

Legend for grouped data #76

Open grero opened 5 years ago

grero commented 5 years ago

I am trying to add a legend for grouped data. This example works for simple line plots:

https://simondanisch.github.io/ReferenceImages/gallery//legend_1/index.html

but I am not sure how to modify this to display legends for a plot generated e.g. like this:

using StatsMakie, Makie, Random
x = randn(1000);
g = rand(1:3, 1000);
scene = plot(density, Group(g), x)

[Image: grouped_density_test]

piever commented 5 years ago

Legend support is one of the big TODOs of StatsMakie. The snippet you link to is actually very helpful. I imagine one should create some helper function in StatsMakie that, after grouping, returns the settings needed for the legend. If you inspect scene[end] you should be able to find the colors that were used:

julia> [scene[end].plots[i].color[] for i in 1:3]
3-element Array{RGB{Float64},1} with eltype ColorTypes.RGB{Float64}:
 RGB{Float64}(0.9019607843137255,0.6235294117647059,0.0)                
 RGB{Float64}(0.33725490196078434,0.7058823529411765,0.9137254901960784)
 RGB{Float64}(0.0,0.6196078431372549,0.45098039215686275)  

So I suspect we need a helper function that inspects these plots and creates a legend from the colors it finds (and from the other attributes that were changed by the grouping). One open question: when grouping by two variables, should we have two separate legends or a common one with all the combinations?
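To make the two options concrete, here is a minimal sketch in plain Julia (no Makie calls) of which entries each kind of legend would list. The data `g1`, `g2` and the helper names `separate_entries` and `combined_entries` are hypothetical and not part of StatsMakie.

```julia
# Hypothetical grouping variables (illustrative data only)
g1 = ["a", "b", "a", "b", "c"]
g2 = ["x", "x", "y", "y", "x"]

# Option 1: one legend per grouping variable, each listing that variable's levels
separate_entries(groups...) = [sort(unique(g)) for g in groups]

# Option 2: one common legend listing every combination that occurs in the data
combined_entries(groups...) = sort(unique(collect(zip(groups...))))

separate = separate_entries(g1, g2)
combined = combined_entries(g1, g2)
```

With n1 and n2 levels, the separate legends hold n1 + n2 entries in total, while a common legend can need up to n1 * n2 entries.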

asinghvi17 commented 5 years ago

I'd say you should probably have a common legend; that seems like the easiest to implement.

mkborregaard commented 5 years ago

I'd say two separate, much clearer

asinghvi17 commented 5 years ago

Fair enough - the question is how that can be implemented with Makie's legend interface.

greimel commented 5 years ago

I played around with this a little bit and would like to share my attempt for anybody who wants to continue working on it. It works for simple examples, but it is very hacky. I just don't know the internals well enough to improve the code.

First, generate some data.

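using StatsMakie, Makie

## sizes (not given in the original post; the values are illustrative assumptions)
NG1 = 3          # levels per grouping variable
NpG = 10         # observations per x-value
NG = NG1^2       # number of distinct x-values, so that x, g1 and g2 have equal length
N = NG * NpG     # total number of observations

## grouped cross sectional data
x = [i for i in 1:NG for j in 1:NpG]
y = randn(N)
g1 = ["foo $i" for i in 1:NG1 for j in 1:NpG * NG1]
g2 = ["bar $i" for j in 1:NG1 for i in 1:NG1 for k in 1:NpG]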

## grouped time series data
using Random
ts = vec(cumsum(randn(N, 9), dims=1))
g01 = ["foo $i" for i in 1:3 for j in 1:N for k in 1:3]
g02 = ["bar $i" for j in 1:3 for i in 1:3 for k in 1:N]

For simple grouping, everything works out of the box.

scn1 = scatter(Group(g1), x, y, markersize=0.5)
leg1 = legend(scn1[end].plots, unique(g1), camera=campixel!, raw=true)
vbox(scn1, leg1)

[Image: Auswahl_001]

For more than one grouping variable, use this hacky function.

_is_group(x) = x.val isa Group
_is_xyz(x) = x.val isa Array

function grouped_legend!(scene)
  input_args = scene.plots[end].input_args
  ## Get group info
  i_grp = findfirst(_is_group.(input_args))
  grp = input_args[i_grp].val.columns
  ## Get data
  xyz = input_args[collect(_is_xyz.(input_args))]

  # For each grouping variable generate a separate legend
  # and combine it using hbox
  leg = mapreduce(hbox, zip(keys(grp), grp)) do n_grp
    nt = NamedTuple{(n_grp[1],)}((n_grp[2],))
    # Generate plot with just one grouping variable
    # (Case distinction for lines vs scatter)
    plt_type = typeof(scene[end].plots[1])
    if plt_type <: Lines
      scn = lines(Group(nt), xyz...)
    elseif plt_type <: Scatter
      scn = scatter(Group(nt), xyz...)
    else
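      # throw instead of logging; otherwise `scn` would be undefined below
      error("TODO: $plt_type not yet covered")
    end
    labels = unique(nt[1])
    # Generate legend (returned as the value of the do-block)
    legend(scn[end].plots, labels, camera=campixel!, raw=true)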
  end

  vbox(scene, leg)
end
scn2 = scatter(Group(marker=g1, color=g2), x, y, markersize=0.5)
grouped_legend!(scn2)

[Image: Auswahl_002]

scene = lines(Group(color=g01, linestyle=g02), ts)
grouped_legend!(scene)

[Image: Auswahl_003]

Todo

asinghvi17 commented 5 years ago

This raises an interesting question - currently, the legend recipe ingests plot objects natively. Should we make it more generic, so that you can create a legend from something like a LegendEntry struct?

A potential structure for that is here:

struct LegendEntry 
    plottype::Type{<: AbstractPlot} # lines & scatter implemented now - each plottype can overload this.
    label # label text
    padding # some tuple / struct
    attributes::Attributes # everything else
end

Then Legend could plot a list of LegendEntry values, and we could let argument conversion decompose plots into such a list.
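As a rough illustration of that decomposition step, here is a simplified sketch that runs without Makie. `SimpleLegendEntry` and `to_entries` are hypothetical names; a `Symbol` and a `NamedTuple` stand in for the plot type and `Attributes` fields of the proposed struct, and the `padding` field is left out.

```julia
# Simplified stand-in for the proposed struct (assumption: Symbol and NamedTuple
# replace Makie's plot and Attributes types so the sketch runs on its own)
struct SimpleLegendEntry
    plottype::Symbol
    label::String
    attributes::NamedTuple
end

# Hypothetical decomposition of grouped series into one entry per group
to_entries(plottype, labels, colors) =
    [SimpleLegendEntry(plottype, string(l), (color = c,)) for (l, c) in zip(labels, colors)]

entries = to_entries(:scatter, [1, 3, 6], ["orange", "skyblue", "green"])
```

A legend recipe would then only need to iterate over `entries`, independently of how the plot that produced them was grouped.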

Tokazama commented 4 years ago

I'm just following up on this from a comment on Slack. My concern was making a legend that actually maps the correct values to their respective labels.


using StatsMakie, Makie
N = 1000
a = rand((1, 3, 6), N) # a discrete variable
x = randn(N) # a continuous variable
y = @. x + a + 0.8*randn() # a continuous variable
sc = Scene()
scatter!(Group(a), x, y, markersize = 0.2)
lgd = legend(sc.plots[2].plots, string.(unique(sort(a))))
vbox(sc, lgd)

[Image: legend_plot]

If you omit the sort call, the labels are mapped to the wrong groups. Having to sort manually seems pretty inefficient, given that this ordering should already be known internally when grouping. Perhaps the previously referenced bit of code

julia> [scene[end].plots[i].color[] for i in 1:3]
3-element Array{RGB{Float64},1} with eltype ColorTypes.RGB{Float64}:
 RGB{Float64}(0.9019607843137255,0.6235294117647059,0.0)                
 RGB{Float64}(0.33725490196078434,0.7058823529411765,0.9137254901960784)
 RGB{Float64}(0.0,0.6196078431372549,0.45098039215686275)  

should be a dictionary with the keys corresponding to the group labels.
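A minimal sketch of such a dictionary in plain Julia follows. It assumes, as the example above suggests, that the grouped plot emits its sub-plots in sorted order of the group levels; `colors` is a hypothetical stand-in for the colors read from the plot, and `label_to_color` is an illustrative name.

```julia
# Hypothetical stand-ins: `a` is the grouping variable, `colors` holds the
# per-group colors in the order the grouped plot produced them
# (assumption: that order follows the sorted group levels)
a = [3, 1, 6, 3, 1]
colors = ["orange", "skyblue", "green"]

levels = sort(unique(a))                    # sorted levels, matching the plot order
label_to_color = Dict(zip(levels, colors))  # group label => color

# First-appearance order differs from sorted order, which is why
# omitting the sort pairs labels with the wrong colors
wrong_order = unique(a)
```

With such a mapping, a legend can look up each label's color directly instead of relying on two vectors happening to share the same order.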

asinghvi17 commented 4 years ago

Legend isn't integrated at all with StatsMakie, so that makes sense. However, I think this can be solved in AbstractPlotting by enforcing a label attribute on all series.

asinghvi17 commented 4 years ago

Fixed partially by MakieLayout.