arviz-devs / arviz-plots

ArviZ modular plotting
https://arviz-plots.readthedocs.io
Apache License 2.0

Adding New Rootogram Plot #81

Open imperorrp opened 2 months ago

imperorrp commented 2 months ago

Pushed a draft of the new rootogram implementation (#52).

Current plot output when the default ArviZ rugby datatree is passed (it should work with any datatree that has posterior predictive and observed groups, as plot_ppc does):

azp.plot_rootogram(data)

[image]

azp.plot_rootogram(data, plot_kwargs={"predictive": False})

[image]

azp.plot_rootogram(data, observed=False)

[image]

pc = azp.plot_rootogram(data, backend="bokeh")
pc.show()

[image]

azp.plot_rootogram(data, observed_rug=True)

[image]

^ In case we want a rug plot for rootograms as well, as in plot_ppc.


📚 Documentation preview 📚: https://arviz-plots--81.org.readthedocs.build/en/81/

imperorrp commented 2 months ago

Todo: Shift rug lower, below the bottom of the bars

imperorrp commented 2 months ago

Rugplots now render below the bottom of the rootogram bars.

To leave a small gap for visibility, the rug position is currently computed like this:

min_histogram_bottom = min(histogram_bottom)
min_bottom[var_name] = min_histogram_bottom - (0.2 * (0 - min_histogram_bottom))

(Subtracting a fixed number from min_histogram_bottom might be better, since the gap would then be a constant number of data units, but the chart size can vary and the perceived gap would vary with it.)
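To make the gap rule above concrete, here is a minimal pure-Python sketch. The function name `rug_baseline`, the `gap_fraction` parameter, and the toy values are illustrative assumptions, not code from the PR; only the formula itself comes from the snippet above.

```python
def rug_baseline(histogram_bottom, gap_fraction=0.2):
    """Return a y position for the rug, offset from the lowest bar bottom.

    The gap is `gap_fraction` times the distance between zero and the
    lowest bar bottom, so it scales with the data instead of being a
    fixed number of units. (Hypothetical helper, mirrors the PR formula.)
    """
    lowest = min(histogram_bottom)
    return lowest - gap_fraction * (0 - lowest)


# Toy bar bottoms (made-up values, not taken from the rugby dataset).
bottoms = [-0.5, 0.3, -1.5, 0.0]
rug_y = rug_baseline(bottoms)  # lowest bottom is -1.5, gap is 0.2 * 1.5
```

One property worth noting: the offset points downward only when the lowest bar bottom is negative. If every bar bottom were above zero, the same formula would move the rug up instead, so taking the absolute value of the distance may be a safer variant.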

Output looks like this now when the rugplot is plotted:

image

imperorrp commented 2 months ago

Binning works as expected now using the 'get_bins' arviz-stats branch:

image

imperorrp commented 2 months ago

Tests are failing because plot_rootogram currently depends on the unmerged arviz-stats 'get_bins' branch, which requires a modification to plot_dist. This modification will be needed globally once that arviz-stats branch is merged.

imperorrp commented 1 month ago

Added tests for plot_rootogram

codecov-commenter commented 1 month ago

Codecov Report

Attention: Patch coverage is 93.60000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 85.39%. Comparing base (f4a39af) to head (a67bb2d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/arviz_plots/plots/rootogramplot.py | 92.72% | 8 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   84.80%   85.39%   +0.59%
==========================================
  Files          21       22       +1
  Lines        2336     2451     +115
==========================================
+ Hits         1981     2093     +112
- Misses        355      358       +3
```


aloctavodia commented 1 month ago

An example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions: https://github.com/arviz-devs/Exploratory-Analysis-of-Bayesian-Models/pull/60. We do not necessarily need to follow those examples; they are just suggestions. Actually, once we have working rootograms, that example will use whatever ArviZ implements.

imperorrp commented 1 month ago

> An example of a "hanging" rootogram and a "suspended" rootogram, including the uncertainty for the predictions arviz-devs/Exploratory-Analysis-of-Bayesian-Models#60. We do not necessarily need to follow those examples, they are just suggestions. Actually, once we have a working rootograms that example will use whatever arviz implements.

If we follow those examples, the line_y visual element function from #85 can draw the thin lines for the hanging and suspended variants. For the hanging case we can keep the existing logic; for the suspended case we could modify this portion of the code:

# Bar bottoms: observed bar tops minus the predictive histogram heights
histogram_bottom = new_obs_hist.sel(plot_axis="y") - pp_hist.sel(plot_axis="histogram")
histogram_bottom = histogram_bottom.expand_dims(plot_axis=["histogram_bottom"])

new_pp_hist = xr.concat(
    (
        new_obs_hist.sel(plot_axis="y"),  # getting tops of histogram (observed values)
        pp_hist.sel(plot_axis="left_edges"),
        pp_hist.sel(plot_axis="right_edges"),
        histogram_bottom,
    ),
    dim="plot_axis",
).assign_coords(plot_axis=["histogram", "left_edges", "right_edges", "histogram_bottom"])

Instead, we would pass a histogram_top as the "histogram" coordinate of plot_axis (root of predictive count minus observed count) and 0 as the "histogram_bottom" coordinate:

histogram_top = pp_hist.sel(plot_axis="histogram") - new_obs_hist.sel(plot_axis="y")
histogram_top = histogram_top.expand_dims(plot_axis=["histogram"])
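The difference between the two layouts can be sketched on plain lists, without xarray. This is only an illustration of the arithmetic discussed above, assuming both counts are square-root transformed before differencing; `rootogram_bars` and the toy counts are hypothetical and not part of arviz-plots.

```python
import math


def rootogram_bars(observed, predicted, style="hanging"):
    """Return one (top, bottom) pair per bin on the square-root scale.

    hanging:   top = sqrt(obs),             bottom = sqrt(obs) - sqrt(pred)
    suspended: top = sqrt(pred) - sqrt(obs), bottom = 0
    (Hypothetical helper following the conventions in this thread.)
    """
    bars = []
    for obs, pred in zip(observed, predicted):
        root_obs, root_pred = math.sqrt(obs), math.sqrt(pred)
        if style == "hanging":
            bars.append((root_obs, root_obs - root_pred))
        elif style == "suspended":
            bars.append((root_pred - root_obs, 0.0))
        else:
            raise ValueError(f"unknown style: {style!r}")
    return bars


# Made-up counts per bin, chosen as perfect squares for readability.
observed_counts = [4, 9, 1]
predicted_counts = [9, 4, 1]

hanging = rootogram_bars(observed_counts, predicted_counts, "hanging")
suspended = rootogram_bars(observed_counts, predicted_counts, "suspended")
```

In this sketch the suspended top is just the negated hanging bottom, so the existing histogram_bottom computation could presumably be reused with a sign flip.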

For representing the uncertainty as a 94% HDI, what would the process be?
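One possible process, offered only as a sketch: compute the binned counts separately for every posterior predictive draw, then take the narrowest interval covering 94% of the root-scale counts within each bin. The helpers `hdi` and `per_bin_hdi` and the toy draws below are assumptions made for illustration; in practice an existing HDI routine from the ArviZ stack would likely replace the hand-rolled one.

```python
import math


def hdi(samples, prob=0.94):
    """Narrowest interval containing roughly `prob` of the samples.

    Sorts the samples and slides a fixed-size window over them, which
    assumes a unimodal distribution. (Hypothetical helper.)
    """
    ordered = sorted(samples)
    n = len(ordered)
    span = int(math.floor(prob * n))  # index distance between interval ends
    if span < 1 or span >= n:
        raise ValueError("need 0 < prob < 1 and enough samples")
    widths = [ordered[i + span] - ordered[i] for i in range(n - span)]
    start = min(range(len(widths)), key=widths.__getitem__)
    return ordered[start], ordered[start + span]


def per_bin_hdi(draw_counts, prob=0.94):
    """HDI of the square-root counts in each bin.

    `draw_counts` holds one list of per-bin counts for every draw.
    """
    intervals = []
    for bin_counts in zip(*draw_counts):  # regroup counts bin by bin
        roots = [math.sqrt(count) for count in bin_counts]
        intervals.append(hdi(roots, prob))
    return intervals


# Made-up predictive draws: 50 draws, 2 bins each.
toy_draws = [[i % 7, 4] for i in range(50)]
intervals = per_bin_hdi(toy_draws)
```

The resulting (low, high) pair per bin could then be drawn as a vertical segment around the predictive value, for instance with the line_y element mentioned above.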