arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Visual improvements for forest plot #2083

Open sarinac opened 2 years ago

sarinac commented 2 years ago

Tell us about it

My critique is only about the visuals of this chart, not its content:

  1. The legend should list the models in the order they are plotted. In this example (left in the picture below), Centered is plotted before Non-Centered, but the legend lists them in reverse order. I put a mockup on the right, where the legend order is reversed and the chart is reformatted (see point 2).

    image
  2. var_names should be the axes titles, and each parameter should be in its own axes (instead of shared). I think this is cleaner because if you don't specify var_names you get a really long single chart, and breaking the sections into individual axes helps. Plus, a variable name is easier to notice in a separate title than mixed in with the school names on the left. With that, you would also have the ability to set the x-axis limits (ax.set_xlim) per var_name.

    image
  3. If point 2 works (multiple axes), then the legend should go at the very bottom (or very top) for a cleaner look. That way, a single legend can be shared across all axes.

  4. Not sure what the school names are technically called, but can the brackets ("[]") be removed?
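
A rough matplotlib-only sketch of points 2 and 3, in case it helps the discussion. It is not ArviZ code: the intervals are made-up numbers, and `models`, `coords`, and the layout values are placeholders chosen for illustration. It draws one axes per variable with the variable name as the title, and a single legend at the bottom that lists the models in plotting order.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Placeholder inputs (assumptions for illustration only)
models = ["Centered", "Non-Centered"]
coords = {"mu": ["mu"], "theta": ["Choate", "Deerfield", "Lawrenceville"]}

# One axes per variable; the x-axis is not shared, so each can have its own limits
fig, axes = plt.subplots(len(coords), 1, figsize=(5, 4))

for ax, (var_name, rows) in zip(axes, coords.items()):
    for i, _row in enumerate(rows):
        for j, model in enumerate(models):
            y = i + 0.25 * j           # offset the models within one row
            mean = rng.normal()        # made-up point estimate
            ax.plot([mean - 1, mean + 1], [y, y], color=f"C{j}")  # made-up interval
            # Label only the first row so each model appears once in the legend
            ax.plot(mean, y, "o", color=f"C{j}", label=model if i == 0 else None)
    ax.set_yticks([i + 0.125 for i in range(len(rows))])
    ax.set_yticklabels(rows)
    ax.set_title(var_name)             # variable name as axes title
    ax.invert_yaxis()                  # first row and first model at the top

# A single legend for the whole figure, in plotting order
handles, legend_labels = axes[0].get_legend_handles_labels()
fig.legend(handles, legend_labels, loc="lower center", ncol=len(models))
fig.tight_layout(rect=(0, 0.08, 1, 1))  # leave room for the legend at the bottom
fig.savefig("forest_mockup.png")
```

Because the handles are collected from one axes only, the legend order follows the order in which the models were drawn.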

canyon289 commented 2 years ago

  1. In the mockup for legends it seems they're in the same order?
  2. For point 2, if we're comparing different vars, doesn't the title get in the way? Often we want to see the differences between them, for example the magnitudes of different coefficients in a regression.
  3. No questions here pending answer to above
  4. The brackets are an indication that the var came from a hierarchical parameter. We don't necessarily need to use brackets but we do need to indicate hierarchy somehow

OriolAbril commented 2 years ago

In case it helps provide some extra context @sarinac, here is how I use plot_forest (I can't speak for @canyon289 or other arviz members, but from his comments I suspect @canyon289 uses it similarly).

90% of the time I use plot_forest as a "more visual version/complement" of summary. I mix scalar and multidimensional variables and default to combined=True, because for the way I use the plot one of its main features is "minimal" use of vertical space.

I like how the examples with separate axes look, but I think that layout would probably make me use the function less, especially in cases with multiple scalar variables.


  3. The legend at the bottom sounds good. I don't think this is affected by using one or multiple axes though. Either way, there should only be a single legend.
  4. Yes, the labels can be customized with the labeller argument. There is a guide on how to use those objects at https://python.arviz.org/en/latest/user_guide/label_guide.html. If using multiple axes, the brackets should not be used, because in that case the ylabels correspond to coordinate values only. They can already be removed by using one of these labeller objects, but they are currently needed to distinguish between variable names and coordinates (position identifiers within a single but multidimensional variable).

    Note1: plot_forest is one of the few plots (if not the only one) to use the make_label_flat method of the labeller instead of make_label_vert (the vertical one defaults to "variable_name\ncoordinate value"; see the plot_posterior axes titles for an example). I believe this special default (which predates my joining the project) is also an indicator of plot_forest being optimized for minimal use of vertical space. Ref: https://github.com/arviz-devs/arviz/blob/main/arviz/plots/backends/matplotlib/forestplot.py#L559

    Note2: plot_forest has different defaults for the labeller object depending on the value of legend: https://github.com/arviz-devs/arviz/blob/main/arviz/plots/forestplot.py#L230. If legend=False is used (not the default) then the model information is included in the ylabel of each variable (with a lot of repetition).
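
To make the flat versus vertical label formats concrete, here is a small sketch that calls the labeller classes from `arviz.labels` directly. The variable name `theta` and the `school` coordinate are example values rather than output from a fitted model, and the sketch assumes the `make_label_flat(var_name, sel, isel)` and `make_label_vert(var_name, sel, isel)` signatures, so treat it as illustrative.

```python
import arviz.labels as azl

# Example selection: coordinate value and its positional index (assumed values)
sel = {"school": "Choate"}
isel = {"school": 0}

base = azl.BaseLabeller()

# Flat label, the style plot_forest uses for its y tick labels
flat = base.make_label_flat("theta", sel, isel)

# Vertical label, the style used for axes titles in plots such as plot_posterior
vert = base.make_label_vert("theta", sel, isel)

# NoVarLabeller omits the variable name, leaving only the coordinate value
coord_only = azl.NoVarLabeller().make_label_flat("theta", sel, isel)

print(repr(flat))
print(repr(vert))
print(repr(coord_only))
```

A labeller instance such as `azl.NoVarLabeller()` can be passed to plot_forest through its `labeller` argument, which is one way to get y tick labels without the brackets.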