Adding New Rootogram Plot

imperorrp commented 2 months ago

Pushed a draft of the new rootogram implementation (#52).

This reuses the existing hist functions from the still-open plot_dist hist PR #47, as well as some code from the plot_ppc PR #55.
The docstring of the hist visual element backend interface was modified so that 'y' now refers explicitly to the 'top' y-coordinate of the bars and not to their height (as suggested by @OriolAbril). The matplotlib implementation of the hist visual element (which uses bar behind the scenes) was also updated internally with y=y-bottom to reflect this, since bar expects a height rather than a top coordinate. Otherwise the bars were not plotted in the expected positions. (This now has to be updated in #47 as well.)
Work in progress: appropriate binning for each faceted subset of the data to be plotted.
The observed data is used to compute the bins for each subset, and the observed bin heights are required to set the 'top's of the predictive bars, so they are always computed regardless of whether observed is True or False. (The flag only determines whether the observed data is plotted.)
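
As a rough illustration of the last two points, the sketch below bins observed count data, reuses the same bins for the predictive draws, and converts the resulting top/bottom coordinates into the height that matplotlib's bar expects. It is a simplified stand-in and not the code in this PR: the helper names (`hanging_bars`, `to_bar_kwargs`), the one-bin-per-integer binning, and the square-root scaling are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def hanging_bars(observed, predictive_draws):
    """Return (value, top, bottom) per bin; hypothetical helper, not the PR code."""
    # Bins come from the observed data only (one bin per integer value here).
    values = range(min(observed), max(observed) + 1)
    obs_counts = Counter(observed)
    n_draws = len(predictive_draws)
    bars = []
    for value in values:
        # Average predictive count in this bin across draws.
        pred = sum(draw.count(value) for draw in predictive_draws) / n_draws
        top = sqrt(obs_counts[value])  # always needed, even if observed is not drawn
        bottom = top - sqrt(pred)      # the predictive bar hangs from the observed top
        bars.append((value, top, bottom))
    return bars


def to_bar_kwargs(top, bottom):
    """Translate a top coordinate into the height/bottom pair used by Axes.bar."""
    return {"height": top - bottom, "bottom": bottom}


bars = hanging_bars([0, 0, 0, 0, 1], [[0, 1, 1, 1, 1]])
print(bars)                          # [(0, 2.0, 1.0), (1, 1.0, -1.0)]
print(to_bar_kwargs(*bars[1][1:]))   # {'height': 2.0, 'bottom': -1.0}
```

Passing the top coordinate directly as the bar height would draw the second bar from -1.0 up to 0.0 instead of from -1.0 up to 1.0, which matches the misplacement described above.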

Current plot output when the default ArviZ rugby datatree is passed (this should work with any datatree that has posterior predictive and observed groups, as with plot_ppc):

azp.plot_rootogram(data)

azp.plot_rootogram(data, plot_kwargs={"predictive": False})

azp.plot_rootogram(data, observed=False)

pc = azp.plot_rootogram(data, backend="bokeh")
pc.show()

azp.plot_rootogram(data, observed_rug=True)

^ In case we also want a rug plot for rootograms, as in plot_ppc.

📚 Documentation preview 📚: https://arviz-plots--81.org.readthedocs.build/en/81/

imperorrp commented 2 months ago

Todo: Shift rug lower, below the bottom of the bars

imperorrp commented 2 months ago

Rugplots now render below the bottom of the rootogram bars.

To leave a small gap for visibility, the rug position is currently computed as:

min_histogram_bottom = min(histogram_bottom)
min_bottom[var_name] = min_histogram_bottom - (0.2 * (0 - min_histogram_bottom))

(Subtracting a fixed number from min_histogram_bottom might be better, since that would keep the gap a constant number of data units, but the chart size may vary and the perceived gap would vary with it.)
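
To make the trade-off concrete, here is a small standalone sketch of the offset. `rug_y_relative` reproduces the formula quoted above; `rug_y_range_based` is a hypothetical alternative (not part of this PR) that scales the gap by the full vertical extent of the bars. The function names and the `frac` parameter are assumptions for the example.

```python
def rug_y_relative(histogram_bottom, frac=0.2):
    """Formula from the comment above: shift by a fraction of the distance to zero."""
    min_histogram_bottom = min(histogram_bottom)
    return min_histogram_bottom - frac * (0 - min_histogram_bottom)


def rug_y_range_based(histogram_bottom, histogram_top, frac=0.05):
    """Hypothetical alternative: shift by a fraction of the full bar range."""
    lowest = min(histogram_bottom)
    span = max(histogram_top) - lowest
    return lowest - frac * span


print(rug_y_relative([-1.0, -0.5, 0.25]))            # -1.2
print(rug_y_relative([0.5, 1.0]))                    # 0.6
print(rug_y_range_based([-1.0, 0.5], [1.0, 3.0]))    # -1.2
```

One thing the second call shows: if every bar bottom were above zero, the relative formula would place the rug above the lowest bottom rather than below it. Whether that case can occur in practice is not established in this thread.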

Output looks like this now when the rugplot is plotted:

imperorrp commented 2 months ago

Binning works as expected now using the 'get_bins' arviz-stats branch:

imperorrp commented 2 months ago

Tests are failing because plot_rootogram currently depends on the unmerged arviz-stats 'get_bins' branch, which requires a modification to plot_dist. This modification will be needed globally once that arviz-stats branch is merged.

imperorrp commented 1 month ago

Added tests for plot_rootogram

codecov-commenter commented 1 month ago

Codecov Report

Attention: Patch coverage is 93.60000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 85.39%. Comparing base (f4a39af) to head (a67bb2d).

Files with missing lines	Patch %	Lines
src/arviz_plots/plots/rootogramplot.py	92.72%	8 Missing :warning:

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   84.80%   85.39%   +0.59%
==========================================
  Files          21       22       +1
  Lines        2336     2451     +115
==========================================
+ Hits         1981     2093     +112
- Misses        355      358       +3
```


aloctavodia commented 1 month ago

Here is an example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions: https://github.com/arviz-devs/Exploratory-Analysis-of-Bayesian-Models/pull/60. We do not necessarily need to follow those examples; they are just suggestions. In fact, once we have a working rootogram, that example will use whatever ArviZ implements.

imperorrp commented 1 month ago

> Here is an example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions: arviz-devs/Exploratory-Analysis-of-Bayesian-Models#60. We do not necessarily need to follow those examples; they are just suggestions. In fact, once we have a working rootogram, that example will use whatever ArviZ implements.

If we follow those examples, the line_y visual element function from #85 can be used to draw the thin lines in the hanging and suspended variants. For the hanging case we can keep the existing logic; for the suspended case we could modify this portion of the code:

# bottom of each bar: observed top minus the predictive bar length
histogram_bottom = new_obs_hist.sel(plot_axis="y") - pp_hist.sel(plot_axis="histogram")
histogram_bottom = histogram_bottom.expand_dims(plot_axis=["histogram_bottom"])

new_pp_hist = xr.concat(
    (
        new_obs_hist.sel(plot_axis="y"),  # getting tops of histogram (observed values)
        pp_hist.sel(plot_axis="left_edges"),
        pp_hist.sel(plot_axis="right_edges"),
        histogram_bottom,
    ),
    dim="plot_axis",
).assign_coords(plot_axis=["histogram", "left_edges", "right_edges", "histogram_bottom"])

The suspended version would instead pass a histogram_top as the "histogram" coordinate (root of the predictive count minus the observed count) and 0 as the "histogram_bottom" coordinate:

histogram_top = pp_hist.sel(plot_axis="histogram") - new_obs_hist.sel(plot_axis="y")
histogram_top = histogram_top.expand_dims(plot_axis=["histogram"])
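
The difference between the two variants can be sketched for a single bin without xarray. This is only an illustration of the idea described above: `bar_coords` is a hypothetical helper, and taking the square root of each count before subtracting is an assumption, since the thread does not spell out where the root is applied.

```python
from math import sqrt


def bar_coords(obs_count, pred_count, style="hanging"):
    """Return (top, bottom) for one bin; hypothetical helper for illustration."""
    obs_root, pred_root = sqrt(obs_count), sqrt(pred_count)
    if style == "hanging":
        # Existing logic: top at the observed value, bar length equal to the prediction.
        return obs_root, obs_root - pred_root
    if style == "suspended":
        # Proposed logic: only the discrepancy is drawn, starting from zero.
        return pred_root - obs_root, 0.0
    raise ValueError(f"unknown style: {style!r}")


print(bar_coords(4, 9, "hanging"))    # (2.0, -1.0)
print(bar_coords(4, 9, "suspended"))  # (1.0, 0.0)
```

In both cases a bin where prediction and observation agree ends up touching zero, either with its bottom (hanging) or with a zero-length bar (suspended).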

For the uncertainty representation of 94% HDI, what would the process be?
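
One possible approach, offered only as a sketch and not as something settled in this thread: compute the predictive count of each bin separately for every posterior predictive draw, then summarize the per-bin values with a 94% interval. The `hdi` function below is a simple narrowest-window implementation written for this example (it is not ArviZ's implementation), and `per_bin_hdi` and its input layout are hypothetical.

```python
from math import ceil, sqrt


def hdi(samples, prob=0.94):
    """Narrowest interval containing about `prob` of the samples (simple version)."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the epsilon guards against float noise.
    k = min(n, max(1, ceil(prob * n - 1e-9)))
    # Slide a window of k samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def per_bin_hdi(counts_per_draw, prob=0.94):
    """counts_per_draw[d][b] is the predictive count of bin b in draw d."""
    n_bins = len(counts_per_draw[0])
    return [
        hdi([sqrt(draw[b]) for draw in counts_per_draw], prob)
        for b in range(n_bins)
    ]


print(hdi([0, 1, 2, 3, 10], prob=0.8))  # (0, 3)
```

The resulting lower and upper values per bin could then be drawn as thin vertical lines on top of the bars.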
