arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Migrate `plot_gp_dist` as a generic band-plotting function #1979

Open michaelosthege opened 2 years ago

michaelosthege commented 2 years ago

Tell us about it

`pymc.gp.util.plot_gp_dist` is a little-known but very generic plotting function that is useful for GPs, time series, regressions and many other models.

Not only does it plot a smooth band based on percentiles, but by default it also plots a few posterior draws as individual lines. Here's an example with a linear model:

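import numpy
import pymc as pm
from matplotlib import pyplot

x = numpy.arange(10)  # 10 x-values
intercept = numpy.random.uniform(0, 10, size=1000)  # 1000 draws in [0, 10)
slope = numpy.random.normal(5, size=1000)
y = intercept[:, None] + slope[:, None] * x[None, :]  # shape (1000, 10)

fig, ax = pyplot.subplots(dpi=200)
pm.gp.util.plot_gp_dist(
    ax=ax,
    samples=y,  # (n_samples, len(x))
    x=x,
    plot_samples=True,  # overlay a few individual draws
)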
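For readers unfamiliar with the function: the band is, roughly, a stack of nested central percentile intervals computed per x-value across the draws. The sketch below reproduces only that computation in NumPy. The helper name `percentile_bands` and the choice of 40 levels between the 51st and 99th percentiles are assumptions for illustration, not a copy of PyMC's code.

```python
import numpy as np


def percentile_bands(samples, n_levels=40):
    """Nested central percentile intervals for each x-value.

    Hypothetical helper (not PyMC's API).
    samples: array of shape (n_samples, len(x)).
    Returns a list of (lower, upper) pairs, widest interval first,
    each of shape (len(x),).
    """
    samples = np.asarray(samples)
    if samples.ndim != 2:
        raise ValueError("samples must have shape (n_samples, len(x))")
    # Upper percentiles from 99 down to 51; each pairs with 100 - p as the lower edge,
    # so successive intervals shrink toward the median.
    upper_percs = np.linspace(99, 51, n_levels)
    bands = []
    for p in upper_percs:
        lower = np.percentile(samples, 100 - p, axis=0)
        upper = np.percentile(samples, p, axis=0)
        bands.append((lower, upper))
    return bands
```

Each pair can then be drawn with `ax.fill_between(x, lower, upper)`, using darker colors for the narrower inner intervals so the band looks smooth.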

Thoughts on implementation

Basically, copy it over from PyMC (so it can be removed from the PyMC codebase).

While we're at it, here's a small wishlist of things to improve:

OriolAbril commented 2 years ago

Related to https://arviz-devs.github.io/arviz/api/generated/arviz.plot_lm.html and https://arviz-devs.github.io/arviz/api/generated/arviz.plot_ts.html#arviz.plot_ts

soma2000-lang commented 2 years ago

Working on this

tomicapretto commented 1 year ago

My opinion is that this kind of visualization should not be part of ArviZ. Not because I think it's impossible in all cases, but because ArviZ doesn't know enough about the model structure to generate this kind of plot generically. I'm open to other opinions, of course.

michaelosthege commented 1 year ago

I'm not sure which model structures you would want to adapt it to.

In my experience, `plot_gp_dist` is very widely applicable. All you need are posterior draws of shape `(n_samples, length)` and an x-axis vector of shape `(length,)`. I use it all the time and have never had to adapt it to a particular model structure. After all, it's just a helper function for plotting one variable, and I wouldn't expect it to accept `idata` directly.
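Posteriors stored by PyMC/ArviZ usually carry separate `chain` and `draw` dimensions, so getting to the `(n_samples, length)` layout is a single reshape. A minimal NumPy sketch, assuming the variable's values are already an array ordered `(chain, draw, length)`; `flatten_draws` and `posterior_mu` are hypothetical names. (With xarray, `.stack(sample=("chain", "draw"))` does the same job, but it puts `sample` last, so the result would still need transposing.)

```python
import numpy as np


def flatten_draws(values):
    """Merge the leading (chain, draw) axes into one sample axis.

    Hypothetical helper.
    values: array of shape (n_chains, n_draws, length).
    Returns an array of shape (n_chains * n_draws, length), i.e. the
    (n_samples, length) layout that plot_gp_dist expects for `samples`.
    """
    values = np.asarray(values)
    if values.ndim != 3:
        raise ValueError("expected shape (n_chains, n_draws, length)")
    n_chains, n_draws, length = values.shape
    return values.reshape(n_chains * n_draws, length)


# Hypothetical posterior: 4 chains x 250 draws x 10 x-values
posterior_mu = np.zeros((4, 250, 10))
samples = flatten_draws(posterior_mu)
print(samples.shape)  # (1000, 10)
```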

The only change I ever made to `plot_gp_dist` was adding dashed lines for HDIs, like in the figures in this notebook.
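For the dashed HDI lines, one way to get the bounds per x-value is the usual rule of taking the narrowest window over the sorted draws that contains the requested fraction of them. The sketch below implements that in NumPy; `hdi_bounds` is a hypothetical name, and in practice `arviz.hdi` would be the natural choice. It assumes unimodal marginals, since a single interval cannot describe a multimodal one.

```python
import numpy as np


def hdi_bounds(samples, prob=0.94):
    """Narrowest interval holding ~`prob` of the draws, for each column.

    Hypothetical helper.
    samples: array of shape (n_samples, len(x)).
    Returns (lower, upper), each of shape (len(x),).
    """
    sorted_draws = np.sort(np.asarray(samples, dtype=float), axis=0)
    n = sorted_draws.shape[0]
    # Each candidate window spans sorted indices [i, i + n_in].
    n_in = int(np.floor(prob * n))
    if not 1 <= n_in < n:
        raise ValueError("prob must leave at least one candidate window")
    # Width of every candidate window along the sample axis.
    widths = sorted_draws[n_in:] - sorted_draws[: n - n_in]
    start = np.argmin(widths, axis=0)  # narrowest window per column
    cols = np.arange(sorted_draws.shape[1])
    lower = sorted_draws[start, cols]
    upper = sorted_draws[start + n_in, cols]
    return lower, upper
```

The bounds can then be overlaid on the band with e.g. `ax.plot(x, lower, "k--")` and `ax.plot(x, upper, "k--")`.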

By the way, a different approach would be to use a histogram instead of a percentile-based band. It could reflect multimodal time series more accurately.
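To make the histogram idea concrete: instead of percentiles, the draws at each x-value can be binned on a shared set of y-bins, giving a 2-D density that can be shown as an image, where two separated modes stay visible as two bright streaks. A minimal sketch under that assumption; `density_grid` is a hypothetical name, and the rendering step (e.g. with matplotlib's `pcolormesh`) is left out.

```python
import numpy as np


def density_grid(samples, n_bins=50):
    """Per-x histograms of the draws on a common y-grid.

    Hypothetical helper.
    samples: array of shape (n_samples, len(x)).
    Returns (edges, density): edges has shape (n_bins + 1,) and density
    has shape (n_bins, len(x)); each column integrates to 1.
    """
    samples = np.asarray(samples, dtype=float)
    # Shared bin edges so that columns are directly comparable.
    edges = np.linspace(samples.min(), samples.max(), n_bins + 1)
    density = np.empty((n_bins, samples.shape[1]))
    for j in range(samples.shape[1]):
        density[:, j], _ = np.histogram(samples[:, j], bins=edges, density=True)
    return edges, density
```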

tomicapretto commented 1 year ago

I think the function is good as it is for the specific case where you have a single group and, as a user, you're willing to pass the predictor values yourself (in most cases this would require constructing a grid). But what if, for example, you had multiple groups? Would this function work automatically, or would it require users to loop, slicing things appropriately before calling the function?
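For the multiple-groups case, the user-side looping and slicing might look roughly like this. The sketch assumes, hypothetically, that the draws cover all observations in one `(n_samples, n_obs)` array, with a per-observation integer `group` label and predictor `x`; each group's columns are selected and sorted by `x`, because a band plot needs a monotonic x-axis. `split_by_group` is a hypothetical name.

```python
import numpy as np


def split_by_group(samples, x, group):
    """Slice draws into per-group (x, samples) pairs, sorted by x.

    Hypothetical helper.
    samples: (n_samples, n_obs); x, group: (n_obs,).
    Returns {group_label: (x_sorted, samples_sorted)}.
    """
    samples, x, group = np.asarray(samples), np.asarray(x), np.asarray(group)
    out = {}
    for g in np.unique(group):
        idx = np.flatnonzero(group == g)  # columns belonging to group g
        order = idx[np.argsort(x[idx])]  # same columns, ordered by x
        out[g.item()] = (x[order], samples[:, order])
    return out
```

Each entry could then be passed to its own `plot_gp_dist(ax=..., samples=..., x=...)` call, which is the loop the question refers to.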

I have nothing against functions like this existing. But given the very particular context they apply to, I would leave them in PyMC.

michaelosthege commented 1 year ago

> But what if, for example, you had multiple groups? Would this function work automatically, or would it require users to loop, slicing things appropriately before calling the function?

I would see that as the user's responsibility.

98% of my motivation for opening this issue was that it's the only plotting function in PyMC (apart from `model_to_graphviz`), and IMO it should have been migrated to ArviZ years ago.