JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

How to unsort groupedbar categories #437

Open dmetivie opened 3 years ago

dmetivie commented 3 years ago

I am using groupedbar in a case with a lot of groups (more than ten). Here is the GitHub example with 12 groups:

a = 12
ctg = repeat(["Category 1", "Category 2"], inner = a)
nam = repeat("G" .* string.(1:a), outer = 2)

groupedbar(nam, rand(a, 2), group = ctg, xlabel = "Groups", ylabel = "Scores",
        title = "Scores by group and category", bar_width = 0.67,
        lw = 0, framestyle = :box)

[image]

As you can see, the groups appear to be sorted alphanumerically, which produces an unnatural order. How can we change that? In my real use case I have months instead of Gs, and the same thing happens: the bars are placed as Apr, Aug, Dec, Feb, Jan, ... instead of Jan, Feb, etc.

Is there an option like sort=:false?
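
The order described above is consistent with plain lexicographic string sorting, which can be checked in Base Julia without any plotting. The sketch below is only an illustration: it does not show where (or whether) StatsPlots sorts internally, and the zero-padding at the end is an assumed workaround that is not from this thread. It can only help for numbered labels, not for month names.

```julia
# Lexicographic sorting puts "G10" before "G2" and "Apr" before "Jan".
a = 12
labels = "G" .* string.(1:a)
sorted_labels = sort(labels)
println(sorted_labels)

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sorted_months = sort(months)
println(sorted_months)

# Assumed workaround for numbered labels only: zero-pad the number so that
# lexicographic order and numeric order coincide ("G01", "G02", ..., "G12").
padded = "G" .* lpad.(1:a, 2, '0')
println(issorted(padded))
```

Passing `padded` instead of `labels` as the group names should then keep G01 to G12 in numeric order even if the labels are sorted.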

eahenle commented 3 years ago

I just encountered this, too. Don't see any such option in the docs or the source. Also don't see where in the code things are being sorted. Would really like to have the ability to disable the alphanumeric sorting of groups.

leejm516 commented 3 years ago

In addition, sorting the categories causes inconsistencies between the values and the error bars.

ctg = repeat(["Acetate", "Ethanol", "BDO"], inner=7)
nam = repeat(["G1","G2","G3","G4","G5","G6","G7"], outer=3)

y_mean = [3*ones(7) 2*ones(7) ones(7)]
y_std = [1.5*ones(7) ones(7) 0.1*ones(7)]

groupedbar(nam, y_mean, group = ctg, yerr=y_std, fmt=:png)

[image: test]

We can see that the error bars for BDO went to Ethanol, and vice versa.
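
A swap between BDO and Ethanol is what one would expect if the category names are sorted alphabetically while the error columns keep their original order. The Base-only sketch below shows that permutation and then reorders the columns up front so that the given order already equals the sorted order. This is an assumed workaround, not a confirmed fix, and it has not been tested against StatsPlots here; the variable names with the `_sorted` suffix are introduced purely for illustration.

```julia
cats = ["Acetate", "Ethanol", "BDO"]

# Alphabetical order is Acetate, BDO, Ethanol: columns 2 and 3 trade places.
perm = sortperm(cats)
println(perm)

y_mean = [3*ones(7) 2*ones(7) ones(7)]
y_std  = [1.5*ones(7) ones(7) 0.1*ones(7)]

# Assumed workaround: apply the same permutation to the categories, the values
# and the errors, so that any later alphabetical sort leaves them unchanged.
cats_sorted   = cats[perm]
y_mean_sorted = y_mean[:, perm]
y_std_sorted  = y_std[:, perm]

ctg = repeat(cats_sorted, inner = 7)
nam = repeat(["G1", "G2", "G3", "G4", "G5", "G6", "G7"], outer = 3)
```

The reordered `nam`, `y_mean_sorted`, `ctg` and `y_std_sorted` would then be passed to groupedbar in place of the original inputs.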

jmtlawrie commented 2 years ago

Hello everyone - I also recently came across this aspect, but I found a work-around on Julia's Discourse forum which I thought it would be good to share here.


Credit for the workaround goes to @JonasIsensee (https://github.com/JonasIsensee) (I don't seem to be able to tag them directly).

All I have done is modify it to copy the order of arguments which StatsPlots.groupedbar usually accepts.


Please test out the MWE below and let me know if it works for you as well. For bonus points, I would be interested to hear thoughts on:

  1. Why the "names" on the x-axis are not positioned correctly when using the optional argument bar_position = :stack (although the sorting still works).
  2. What are the possible consequences of the required redefinition of Base.unique for inputs of type CategoricalArray?
    • Ideally there would be a way of temporarily, locally changing the definition, only within the scope of prepare_groupedbar_inputs!, but I am not sure how to do this.
  3. Whether it would be a good idea to add a note explaining this workaround to the relevant section of StatsPlots.jl's documentation.
  4. How the function could be improved. I'm all ears!

Thanks!


MWE ordered columns

  using CategoricalArrays
  using StatsPlots

  function prepare_groupedbar_inputs!(names::Vector{T1}, data_matrix::Matrix{T2},  group::Vector{T1}) where {T1<:AbstractString, T2<:Real}

      # Redefine unique for `CategoricalArray` types to return a categorical array, rather than a regular vector/array. 
      @eval function Base.unique(ctg::CategoricalArray) # can be run in REPL instead
          l = levels(ctg)
          newctg = CategoricalArray(l)
          levels!(newctg, l)
      end

      @assert size(data_matrix)[1] % length(group) == 0 "The number of rows in the data matrix must be a multiple of the number of data categories."
      @assert size(data_matrix)[2] % length(names) == 0 "The number of columns in the data matrix must be a multiple of the number of groups of bars."

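      # data_matrix[i, j] is meant to belong to group[i] and names[j]. Assuming groupedbar
      # reads the matrix in column-major order (as in the README example), each name is
      # repeated down one column and the groups cycle within it.
      plot_names = repeat(names, inner = size(data_matrix)[1])
      plot_groups = repeat(group, outer = size(data_matrix)[2])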

      plot_names = categorical(plot_names; levels = names)
      plot_groups = categorical(plot_groups; levels = group)

      return plot_names, data_matrix, plot_groups

  end

  names = ["Sample 2", "Sample 1", "Sample 3"];
  data_matrix = rand((1,2,3), 4, 3);
  group = ["Red tokens", "Blue tokens", "Green tokens", "Orange tokens"];

  names, data_matrix, group = prepare_groupedbar_inputs!(names, data_matrix, group)

  plot = StatsPlots.groupedbar(names, data_matrix, group=group,
                      title = "Sorted `groupedbar()` plot",
                      xlabel = "Sorted group names", ylabel = "Data values",
                      legendtitle = "Sorted data categories", legend_position = :outerright,
                      fillcolor = [:red :blue :green :orange],
                      # bar_position = :stack, ## ordering works, but names are displaced..?
  )

Something similar to this should be plotted (note that the ordering on the x-axis and in the legend matches the order in which names and group are defined).

[image: Screenshot 2022-08-31 at 00 00 30]
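
The reason the workaround can preserve the declared order is that a categorical vector carries its own level order, independent of alphabetical order. The following is a minimal sketch of just that property, assuming CategoricalArrays is installed. It leaves out the plotting call and the Base.unique redefinition from the MWE, and `cnames` is a name introduced here for illustration.

```julia
using CategoricalArrays

names = ["Sample 2", "Sample 1", "Sample 3"]

# Plain strings sort alphabetically...
println(sort(names))

# ...while a categorical vector with explicit levels sorts by its level order,
# which here is the order in which the names were declared.
cnames = categorical(names; levels = names)
println(levels(cnames))
println(sort(cnames))
```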


Tested using:

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: 4 × Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 1 on 2 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 

(@v1.8) pkg> st
Status `~/.julia/environments/v1.8/Project.toml`
  [324d7699] CategoricalArrays v0.10.6
  [f3b207a7] StatsPlots v0.15.2

Docstring for this function:

"""
    prepare_groupedbar_inputs!(names::Vector{T1}, data_matrix::Matrix{T2}, group::Vector{T1}) where {T1<:AbstractString, T2<:Real}

Uses CategoricalArray to prepare input data for the StatsPlots.groupedbar function, such that names and group are ordered as they were passed into prepare_groupedbar_inputs!.

Warning: this function redefines Base.unique for inputs of type CategoricalArray, which may cause changes to other functions!

Based on this Discourse post → https://discourse.julialang.org/t/statplots-groupedbar-order-x-axis/13912/18?u=jlawrie

See also StatsPlots.groupedbar, Plots.bar.

Example

  julia> names = ["Sample 2", "Sample 1", "Sample 3"];
  julia> data_matrix = rand((1,2,3), 4, 3);
  julia> group = ["Red tokens", "Blue tokens", "Green tokens", "Orange tokens"];

  julia> names, data_matrix, group = prepare_groupedbar_inputs!(names, data_matrix, group)

  julia> plot = StatsPlots.groupedbar(names, data_matrix, group=group,
                      title = "Sorted `groupedbar()` plot",
                      xlabel = "Sorted group names", ylabel = "Data values",
                      legendtitle = "Sorted data categories", legend_position = :outerright,
                      fillcolor = [:red :blue :green :orange],
                      # bar_position = :stack, ## ordering works, but names are displaced..?
         )

"""

pranshumalik14 commented 1 year ago

Any updates on this bug, particularly in the context of @leejm516's example? I do not see error bars on groupedbars now...

[image]

jmtlawrie commented 1 year ago

> I do not see error bars on groupedbars now...

Hi - I don't think there has been any progress on this issue.

In case it helps, I just ran the code from that example with the error bars and reproduced the same graph as in the original post.

Thanks! Joe

pranshumalik14 commented 1 year ago

@jmtlawrie I think that to fix this issue, you need the groups to be categorically sorted as well (in addition to the names); it will work then.