Add plot_forest and tests

OriolAbril commented 5 months ago

I have started with some improvements to PlotCollection, with the eventual goal to create plot_forest that supports multiple models too.

I will also add tests to ensure further modifications don't end up in unintended regressions.

📚 Documentation preview 📚: https://arviz-plots--34.org.readthedocs.build/en/34/

codecov-commenter commented 5 months ago

Codecov Report

Attention: Patch coverage is 89.92248% with 52 lines in your changes are missing coverage. Please review.

Project coverage is 76.34%. Comparing base (0f5a0db) to head (9492812).

Files	Patch %	Lines
src/arviz_plots/plots/forestplot.py	92.12%	13 Missing :warning:
src/arviz_plots/backend/bokeh/__init__.py	83.87%	10 Missing :warning:
src/arviz_plots/backend/matplotlib/__init__.py	82.50%	7 Missing :warning:
src/arviz_plots/plot_collection.py	93.51%	7 Missing :warning:
src/arviz_plots/visuals/__init__.py	88.00%	6 Missing :warning:
src/arviz_plots/backend/__init__.py	78.94%	4 Missing :warning:
src/arviz_plots/plots/utils.py	88.88%	3 Missing :warning:
src/arviz_plots/plots/distplot.py	91.66%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #34 +/- ## =========================================== + Coverage 61.29% 76.34% +15.05% =========================================== Files 14 15 +1 Lines 801 1260 +459 =========================================== + Hits 491 962 +471 + Misses 310 298 -12 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

aloctavodia commented 5 months ago

I love how this looks, very clean! Below, a few tests

This works, but the color for mu and tau (is the same as the first school). Not sure if this is some we should "fix", and not sure what the fix should be. Assign black if the variable does not have the coordinate????

pc = azp.plot_forest(
     non_centered,
     combined=True,
     var_names=["theta", "mu", "theta_t", "tau"],
     pc_kwargs={
         "aes": {"color": ["school"]},
     },
     shade_label="school",
)

This fails

pc = azp.plot_forest(
     non_centered,
     combined=True,
     var_names=["theta", "mu", "theta_t", "tau"],
     pc_kwargs={
         "aes": {"color": ["school"]},
         "plot_grid_kws": {"width_ratios": [1, 2], "layout": "none"}
     },
     aes_map={"labels": ["color"]}, # works if this line is commented out
     shade_label="school",
)

The last part of the error is

File [~/anaconda3/envs/aplots/lib/python3.11/site-packages/matplotlib/colors.py:246], in _check_color_like(**kwargs)
    [244] for k, v in kwargs.items():
    [245]     if not is_color_like(v):
--> [246]         raise ValueError(f"{v!r} is not a valid value for {k}")

ValueError: array(['#40a96d', '#a2157f', '#f62907', '#e4bf2b', '#32d2f9', '#9140f1',
       '#000000', '#6f6f6f'], dtype='<U7') is not a valid value for color

This is weird, the chains are still there...

pc = azp.plot_forest(
     non_centered,
     combined=True,
     var_names=["theta", "mu", "theta_t", "tau"],
     pc_kwargs={
         "aes": {"color": ["chain"]},
     },
     shade_label="__variable__",
)

output

OriolAbril commented 5 months ago

This works, but the color for mu and tau (is the same as the first school). Not sure if this is some we should "fix", and not sure what the fix should be. Assign black if the variable does not have the coordinate????

This is part of what I want to discuss in https://github.com/arviz-devs/arviz-plots/issues/33. Currently any subset (including empty subset) of the dims provided in aes dictionary gets as many values as needed from the given aesthetic. It could definitely be changed once we agree on a behaviour, but it is not particular to plot_forest and won't require any change to it.

This fails

aes_map={"labels": ["color"]}, # works if this line is commented out

Most aesthetics mappings won't be valid for the labels. We could try to be a bit more clever and ignore the mapping for color if we know it will fail if you think it is important. Hadn't considered that case, so I'm not even sure how difficult that would be right now. What is happening here is "theta" has the school dimension, so it could have any of the colors in that list, which ends up in invalid syntax when trying to plot a single text with the multiple plots.

This is weird, the chains are still there...

Yes, because you are saying you want each chain to have different colors, so they must have different lines for that to happen! We might want to add a check when combined is True to make sure no aesthetic is mapped to the "chain" dimension and either error out or emit a warning.

aloctavodia commented 5 months ago

I think there is a common theme here, the flexibility of the framework and how much do we want to prevent users from its own "creativity" (just to keep the discussion polite, haha).

You have created a set of rules that are very expressive and can lead to very customizable plots, some funny ones, and some non-sensical statements. I am not sure we want to restrict that flexibility so we should do nothing for the first and second points (Still I may need more time to think about #33). For the third point, we may want to do something because "combined" is not part of the "framework" but an argument of a plot created on top of the framework. I am just thinking out loud (not sure I fully agree with myself), and trying to come up with a general criterion for making this kind of decision. We should discuss this during the next meeting.

OriolAbril commented 5 months ago

I agree with the general idea. With combined I'd probably go with the error and add "use combined=False" in the error message, the fix is really easy and I don't think actually generating the plot benefits anyone, the warning might be missed or supressed even...

OriolAbril commented 5 months ago

Found the way to know beforehand which aesthetics will work, and tested it on the rugby_field model with color mapped to __variable__, field and team:

pc = plot_forest(
     rugby_field,
    var_names=["intercept", "atts_team", "defs", "sd_att"],
     combined=True,
     pc_kwargs={
         "aes": {"color": [...]},
         "plot_grid_kws": {"width_ratios": [1, 2], "layout": "none"}
     },
     aes_map={"labels": ["color"]},
)

imatge

The 2nd plot particularly I think highlights the issue #33. I think I'll merge this and then open multiple smaller PRs to work on #33, try to fix plot_trace_dist...

arviz-devs / arviz-plots

Add plot_forest and tests #34

Codecov Report