Migrate `plot_gp_dist` as a generic band-plotting function

michaelosthege commented 2 years ago

Tell us about it

pymc.gp.util.plot_gp_dist is a little-known, but very generic plotting function that is useful for GPs, time series, regressions and many more.

Not only does it plot a smooth band based on the percentiles, but by default it also plots a few posterior draws as individual lines. Here's an example with a linear model:

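import numpy
import pymc as pm
from matplotlib import pyplot

x = numpy.arange(10)
# 1000 draws per parameter; intercepts lie between 1 and 10
intercept = numpy.random.uniform(1, 10, size=1000)
slope = numpy.random.normal(5, size=1000)
# Broadcast to shape (n_samples, length)
y = intercept[:, None] + slope[:, None] * x[None, :]

fig, ax = pyplot.subplots(dpi=200)
pm.gp.util.plot_gp_dist(
    ax=ax,
    samples=y,
    x=x,
    plot_samples=True,
)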

Thoughts on implementation

Basically copying from PyMC (so we can kick it out from the PyMC codebase).

While we're at it, here's a small wishlist of things to improve:

- More quantitative transparency/shades (see https://github.com/pymc-devs/pymc/issues/4591).
- An option to draw dashed lines at custom percentiles, or to switch to what is basically an overlay of arviz.plot_hdi, for multiple credible interval levels (ETI or HDI).
- Drawing shaded bands without transparency. This would be useful for academics who need to convert to EPS, which doesn't support transparency.
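To make the last two wishlist items concrete, here is a minimal sketch of what such a function could look like. It is not the existing plot_gp_dist implementation: the name plot_band and the parameters percentiles, color and opaque are hypothetical, and the opaque mode simply blends the color with white so that no alpha channel is involved.

```python
import numpy
from matplotlib import pyplot
from matplotlib.colors import to_rgb


def plot_band(ax, samples, x, percentiles=(5, 25, 75, 95), color="C0", opaque=False):
    """Hypothetical helper, not an existing ArviZ or PyMC function.

    samples: posterior draws with shape (n_samples, length)
    x: x-axis values with shape (length,)
    percentiles: an even number of levels, paired from the outside in
    """
    if len(percentiles) % 2 != 0:
        raise ValueError("percentiles must pair up into lower/upper bounds")
    levels = numpy.percentile(numpy.asarray(samples), sorted(percentiles), axis=0)
    n_bands = len(levels) // 2
    rgb = numpy.array(to_rgb(color))
    for i in range(n_bands):
        # Pair the i-th lowest level with the i-th highest level.
        lower, upper = levels[i], levels[-(i + 1)]
        weight = (i + 1) / n_bands
        if opaque:
            # Blend the color with white so that no alpha channel is needed.
            shade = tuple(1 - (1 - rgb) * weight)
            ax.fill_between(x, lower, upper, color=shade, linewidth=0)
        else:
            # Overlapping translucent bands get darker towards the center.
            ax.fill_between(x, lower, upper, color=color, alpha=0.3, linewidth=0)
        # Dashed outlines mark each percentile level explicitly.
        ax.plot(x, lower, color=color, linestyle="--", linewidth=0.8)
        ax.plot(x, upper, color=color, linestyle="--", linewidth=0.8)
    return levels


# Usage with synthetic draws from a linear model
rng = numpy.random.default_rng(0)
x = numpy.arange(10)
intercept = rng.uniform(1, 10, size=(1000, 1))
slope = rng.normal(5, size=(1000, 1))
y = intercept + slope * x[None, :]

fig, ax = pyplot.subplots()
levels = plot_band(ax, samples=y, x=x, opaque=True)
```

The outermost band is drawn first and the innermost last, so the darker inner shades sit on top of the lighter outer ones in both modes.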

OriolAbril commented 2 years ago

soma2000-lang commented 2 years ago

Working on this

tomicapretto commented 1 year ago

My opinion is that this kind of visualization should not be part of ArviZ. Not because I think it's impossible in all cases, but because I think ArviZ doesn't know enough about the model structure to be able to generate this kind of plot generically. I'm open to other opinions, of course.

michaelosthege commented 1 year ago

I'm not sure which model structures you'd like to adapt it to?

In my experience plot_gp_dist is very widely applicable. All you need are posterior draws (n_samples, length) and a vector for the x-axis (length,). I'm using it all the time and never had to adapt it to particular model structures. After all it's just a helper function to plot one variable, and I wouldn't expect it to swallow idata.
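As a rough illustration of that input contract, the band itself can be computed from nothing more than the two arrays. The sketch below is an assumption about how one might write it, not code from PyMC: the function name eti_band and its prob parameter are hypothetical, and it computes an equal-tailed interval (ETI) rather than an HDI.

```python
import numpy


def eti_band(samples, x, prob=0.9):
    """Hypothetical helper: equal-tailed band from posterior draws.

    samples: shape (n_samples, length)
    x: shape (length,)
    Returns (lower, median, upper), each with shape (length,).
    """
    samples = numpy.asarray(samples)
    x = numpy.asarray(x)
    if samples.ndim != 2 or x.ndim != 1 or samples.shape[1] != x.shape[0]:
        raise ValueError("expected samples (n_samples, length) and x (length,)")
    tail = (1 - prob) / 2 * 100
    # Percentiles are taken across draws, separately for each x position.
    lower, median, upper = numpy.percentile(samples, [tail, 50, 100 - tail], axis=0)
    return lower, median, upper


# 101 evenly spaced "draws" per x position make the percentiles easy to check.
demo_x = numpy.arange(3)
demo_samples = numpy.arange(101)[:, None] * numpy.ones((1, 3))
lower, median, upper = eti_band(demo_samples, demo_x, prob=0.9)
```

Nothing here depends on how the draws were produced, which is why the same call works for GPs, time series and regressions alike.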

The only change I ever did to plot_gp_dist was adding dashed lines for HDIs, like in the figures in this notebook.

A different approach by the way would be a histogram instead of a band based on percentiles. It could more accurately reflect multimodal time series.
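A minimal sketch of that histogram idea, assuming the same (n_samples, length) layout: each x position gets its own histogram over shared bin edges. The function name column_histograms and the synthetic bimodal draws are made up for illustration.

```python
import numpy


def column_histograms(samples, bins=50):
    """Hypothetical helper: per-position histograms of posterior draws.

    samples: shape (n_samples, length)
    Returns (edges, density) where density has shape (bins, length) and
    each column holds the fraction of draws falling into each bin.
    """
    samples = numpy.asarray(samples)
    # Shared bin edges so that all columns are comparable on one y-axis.
    edges = numpy.linspace(samples.min(), samples.max(), bins + 1)
    counts = numpy.stack(
        [numpy.histogram(column, bins=edges)[0] for column in samples.T],
        axis=1,
    )
    return edges, counts / samples.shape[0]


# Synthetic bimodal draws: every series sits near -3 or near +3.
rng = numpy.random.default_rng(0)
mode = rng.choice([-3.0, 3.0], size=500)
bimodal = mode[:, None] + rng.normal(scale=0.5, size=(500, 20))
edges, density = column_histograms(bimodal, bins=40)
```

The density matrix could then be drawn as an image, for example with matplotlib's pcolormesh using the bin centers as y-coordinates. Unlike a percentile band, it keeps the gap between the two modes visible.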

tomicapretto commented 1 year ago

I think the function is good as it is for the specific case that you have a single group and as a user you're willing to pass the values of the predictor (in most cases it would require users to construct a grid). But, what if, for example, you had multiple groups? Would this function work automatically, or, would it require users to loop, slicing things appropriately before calling the function?

I don't have anything against these kinds of functions existing. But given the very particular context they apply to, I would leave them in PyMC.

michaelosthege commented 1 year ago

> But, what if, for example, you had multiple groups? Would this function work automatically, or, would it require users to loop, slicing things appropriately before calling the function?

I would see it as the user's responsibility
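For reference, the user-side loop would look roughly like the sketch below. The group names, the (n_samples, n_groups, length) layout and the synthetic draws are assumptions for illustration, and fill_between stands in for the band-plotting call so that the example only needs NumPy and Matplotlib.

```python
import numpy
from matplotlib import pyplot

rng = numpy.random.default_rng(1)
groups = ["a", "b", "c"]  # hypothetical group labels
x = numpy.arange(10)

# Synthetic draws with shape (n_samples, n_groups, length)
intercept = rng.normal(loc=[0.0, 5.0, 10.0], size=(1000, 3))
slope = rng.normal(loc=5.0, size=(1000, 3))
draws = intercept[:, :, None] + slope[:, :, None] * x[None, None, :]

fig, axs = pyplot.subplots(ncols=len(groups), sharey=True, squeeze=False)
for g, (name, ax) in enumerate(zip(groups, axs[0])):
    # Slice out one group to get the (n_samples, length) array the function expects.
    group_draws = draws[:, g, :]
    lower, upper = numpy.percentile(group_draws, [5, 95], axis=0)
    # Stand-in for a call such as plot_gp_dist(ax=ax, samples=group_draws, x=x)
    ax.fill_between(x, lower, upper, alpha=0.5)
    ax.set_title(name)
```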

The motivation behind opening this issue was 98 % because it's the only plotting function in PyMC (apart from model_to_graphviz) and IMO should have been migrated to ArviZ years ago.

arviz-devs / arviz

Migrate `plot_gp_dist` as a generic band-plotting function #1979
