arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Add group keyword to plot_ppc #1002

Closed OriolAbril closed 4 years ago

OriolAbril commented 4 years ago

In addition to posterior predictive checks, prior predictive checks are also a way to check whether the desired regions of the parameter space are explored. Adding a group keyword (taking either posterior or prior as its value) would allow using data from the posterior predictive or prior predictive group without many changes to the plotting code.

nitishp25 commented 4 years ago

Hi @OriolAbril, do you mean that we add a group argument to plot_ppc that accepts either posterior or prior, and then, inside plot_ppc, check the value of group in an if block and assign the corresponding data? Or is there something else to be done?

OriolAbril commented 4 years ago

Exactly. The xarray dataset that is currently named posterior_predictive and hardcoded to be the posterior_predictive group (in the line posterior_predictive = data.posterior_predictive) should be renamed to predictive_dataset and contain the data from either the prior predictive or the posterior predictive group, depending on the group argument.
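A minimal sketch of the proposed selection logic. It uses types.SimpleNamespace as a stand-in for an InferenceData object (an assumption made so the example runs without ArviZ), and the helper name select_predictive_dataset is hypothetical, not part of ArviZ:

```python
from types import SimpleNamespace


def select_predictive_dataset(data, group="posterior"):
    """Return the {group}_predictive group of `data` (hypothetical helper)."""
    if group not in ("posterior", "prior"):
        raise TypeError("`group` argument must be either `posterior` or `prior`")
    # One variable that holds either the prior predictive or the
    # posterior predictive samples, chosen by the group argument.
    predictive_dataset = getattr(data, f"{group}_predictive")
    return predictive_dataset


# Stand-in for an InferenceData object; strings replace the xarray datasets.
idata = SimpleNamespace(
    posterior_predictive="posterior predictive samples",
    prior_predictive="prior predictive samples",
    observed_data="observed values",
)

print(select_predictive_dataset(idata, group="prior"))  # prior predictive samples
```

The rest of the plotting code would then only ever read predictive_dataset, so it does not need to know which group it came from.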

nitishp25 commented 4 years ago

Alright, I'll start working on it.

corriebar commented 4 years ago

I just tried to use plot_ppc for priors, and when running the following code I get an error:

[model]
pm_data = az.from_pymc3(prior = priors)

az.plot_ppc(pm_data, group="prior")

error:


TypeError                                 Traceback (most recent call last)
<ipython-input-32-d79576387f16> in <module>
      2 pm_data = az.from_pymc3(prior = priors)
      3 
----> 4 az.plot_ppc(pm_data, group="prior")

~/.local/share/virtualenvs/PyLadies-Bayesian-Tutorial-HLPPdyhP/src/arviz/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, ax, backend, backend_kwargs, group, show)
    170         if not hasattr(data, groups):
    171             raise TypeError(
--> 172                 '`data` argument must have the group "{group}" for ppcplot'.format(group=groups)
    173             )
    174 

TypeError: `data` argument must have the group "prior_predictive" for ppcplot

I believe the error occurs because, while the posterior predictive samples are stored in data.posterior_predictive, the prior samples are stored simply in data.prior and not in data.prior_predictive.

OriolAbril commented 4 years ago

There are actually two issues hidden here: one related to the observed data and the other to the prior predictive group.

plot_ppc compares the {prior/posterior}_predictive group with the observed_data group. Using pm_data = az.from_pymc3(prior = priors), you get an InferenceData without observed data. This should be easy to solve by using pm_data = az.from_pymc3(prior = priors, model=model) (or by calling from_pymc3 from within a model context).
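To make both requirements concrete, here is a sketch of the group check that raises the TypeError seen in this thread, adapted from the loop visible in the later traceback (ppcplot.py lines 225 to 227). SimpleNamespace objects stand in for InferenceData, and check_ppc_groups is a hypothetical name:

```python
from types import SimpleNamespace


def check_ppc_groups(data, group="posterior"):
    """Raise TypeError unless `data` has both groups that plot_ppc compares."""
    for groups in (f"{group}_predictive", "observed_data"):
        if not hasattr(data, groups):
            raise TypeError(f'`data` argument must have the group "{groups}" for ppcplot')


# Like from_pymc3(prior=priors) alone: only a prior group, no observed data.
prior_only = SimpleNamespace(prior="prior samples")
# Prior predictive samples and observed data both present.
complete = SimpleNamespace(
    prior_predictive="prior predictive samples", observed_data="observed values"
)

try:
    check_ppc_groups(prior_only, group="prior")
except TypeError as err:
    print(err)  # same message as in the first traceback
check_ppc_groups(complete, group="prior")  # passes silently
```

Note that a missing prior_predictive group is reported first, so fixing only the observed data would not make the error go away.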

Moreover, in PyMC3 it is hard to distinguish between prior and prior predictive samples (both are sampled by pm.sample_prior_predictive), which adds to the confusion. ArviZ splits the variables between the two groups in some cases. It should be possible to do so using only the model, but currently this is implemented only when the trace is present. This must be fixed in io_pymc3; otherwise it prevents prior predictive plots from being made before sampling, which would be their main use.
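A minimal sketch of such a split. It assumes the samples arrive as a dict mapping variable names to draws (the form pm.sample_prior_predictive returns in PyMC3) and that the names of the observed variables are known, for example from the model. split_prior_samples and the variable names mu and y are hypothetical, and the actual io_pymc3 code may differ:

```python
def split_prior_samples(samples, observed_names):
    """Split prior samples into prior and prior predictive dicts (hypothetical)."""
    # Variables tied to observed data are draws from the prior predictive;
    # every other variable is a draw from the prior itself.
    prior_predictive = {k: v for k, v in samples.items() if k in observed_names}
    prior = {k: v for k, v in samples.items() if k not in observed_names}
    return prior, prior_predictive


# Toy draws: "mu" is a model parameter, "y" is the observed variable.
samples = {
    "mu": [0.1, -0.3, 0.7],
    "y": [[1.2, 0.8], [0.5, 1.1], [0.9, 1.4]],
}
prior, prior_predictive = split_prior_samples(samples, observed_names={"y"})
print(sorted(prior), sorted(prior_predictive))  # ['mu'] ['y']
```

Only the observed variable names are needed for the split, which is why it should be doable from the model alone, without a trace.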

@corriebar, can you check whether the observed data can be retrieved when only the model is passed? I will then open an issue about the prior/prior_predictive distinction.

corriebar commented 4 years ago

I tried adding the model, but the error stays the same. It only works when I add the trace, and then it doesn't matter whether the model was provided or not.

Also, the InferenceData object only gets the prior_predictive group if I add a trace; whether or not the model is provided doesn't seem to make a difference. Providing only the prior (with or without the model) adds only the prior group to the InferenceData.

jankaWIS commented 1 month ago

Hi, I just ran into the same issue and got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[37], line 2
      1 prior_samples = model.prior_predictive(draws=1000)
----> 2 az.plot_ppc(prior_samples, kind="kde")
      3 plt.show()

File ~/.conda/envs/pymc_env_5/lib/python3.12/site-packages/arviz/plots/ppcplot.py:227, in plot_ppc(data, kind, alpha, mean, observed, observed_rug, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
    225 for groups in (f"{group}_predictive", "observed_data"):
    226     if not hasattr(data, groups):
--> 227         raise TypeError(f'`data` argument must have the group "{groups}" for ppcplot')
    229 if kind.lower() not in ("kde", "cumulative", "scatter"):
    230     raise TypeError("`kind` argument must be either `kde`, `cumulative`, or `scatter`")

TypeError: `data` argument must have the group "posterior_predictive" for ppcplot

I believe it's the same issue as in this thread and in this Discourse discussion: https://discourse.pymc.io/t/a-problem-interpreting-an-error-in-pymc/14550/6

Is there a fix for this issue yet?

Thanks!

OriolAbril commented 1 month ago

Change

az.plot_ppc(prior_samples, kind="kde")

to

az.plot_ppc(prior_samples, kind="kde", group="prior")

It would be nice to fall back to prior_predictive if posterior_predictive is missing, but that is not implemented; you need to be explicit if you want prior predictive checks.
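Until such a fallback exists, a user-side helper could choose the group before calling plot_ppc. This is a sketch under stated assumptions: choose_ppc_group is a hypothetical name, it checks for the groups with hasattr (as the plot_ppc check in the traceback above does), and SimpleNamespace stands in for an InferenceData object:

```python
from types import SimpleNamespace


def choose_ppc_group(data):
    """Return "posterior" if posterior_predictive exists, else "prior" (hypothetical)."""
    if hasattr(data, "posterior_predictive"):
        return "posterior"
    if hasattr(data, "prior_predictive"):
        return "prior"
    raise TypeError("`data` has neither a posterior_predictive nor a prior_predictive group")


# Stand-in for the output of prior sampling: no posterior_predictive group.
prior_samples = SimpleNamespace(prior_predictive="draws", observed_data="observed values")
print(choose_ppc_group(prior_samples))  # prior
```

With a real InferenceData object, the result would be passed to the existing group argument: az.plot_ppc(prior_samples, kind="kde", group=choose_ppc_group(prior_samples)).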