Closed OriolAbril closed 4 years ago
Hi @OriolAbril, do you mean that we add a `group` argument to `plot_ppc` which can accept `posterior` or `prior`, and then inside `plot_ppc` we check the value of `group` in an if block and assign the values to data accordingly? Or is there something else to be done?
Exactly, the xarray dataset which is currently named `posterior_predictive` and hardcoded to be the `posterior_predictive` group (in the line `posterior_predictive = data.posterior_predictive`) should be renamed to `predictive_dataset` and contain the data from either the prior predictive or the posterior predictive, depending on the `group` argument.
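A minimal sketch of the change described above, not the actual ArviZ patch. The helper name `select_predictive_dataset` is hypothetical, and a `SimpleNamespace` stands in for an InferenceData object so the snippet runs without ArviZ installed; the error message is copied from the traceback later in this thread.

```python
from types import SimpleNamespace


def select_predictive_dataset(data, group="posterior"):
    """Return the predictive group matching `group` (hypothetical helper)."""
    if group not in ("posterior", "prior"):
        raise ValueError(f'`group` must be "posterior" or "prior", got {group!r}')
    group_name = f"{group}_predictive"
    if not hasattr(data, group_name):
        raise TypeError(f'`data` argument must have the group "{group_name}" for ppcplot')
    return getattr(data, group_name)


# Stand-in for an InferenceData object; real groups would be xarray datasets.
data = SimpleNamespace(
    posterior_predictive={"y": [1.0, 2.0]},
    prior_predictive={"y": [0.5, 3.0]},
)
predictive_dataset = select_predictive_dataset(data, group="prior")
```

Building the group name from the argument keeps the rest of the plotting code unchanged, since it only ever sees `predictive_dataset`.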
Alright, I'll start working on it.
I just tried to use the ppc plot for priors and running the following code, I get an error:
[model]
pm_data = az.from_pymc3(prior = priors)
az.plot_ppc(pm_data, group="prior")
error:
```
TypeError Traceback (most recent call last)
<ipython-input-32-d79576387f16> in <module>
2 pm_data = az.from_pymc3(prior = priors)
3
----> 4 az.plot_ppc(pm_data, group="prior")
~/.local/share/virtualenvs/PyLadies-Bayesian-Tutorial-HLPPdyhP/src/arviz/arviz/plots/ppcplot.py in plot_ppc(data, kind, alpha, mean, figsize, textsize, data_pairs, var_names, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, ax, backend, backend_kwargs, group, show)
170 if not hasattr(data, groups):
171 raise TypeError(
--> 172 '`data` argument must have the group "{group}" for ppcplot'.format(group=groups)
173 )
174
TypeError: `data` argument must have the group "prior_predictive" for ppcplot
```
I believe the error is due to the fact that while the posterior predictive is data.posterior_predictive, for the prior this is simply data.prior and not data.prior_predictive
There are actually two issues hidden here, one related to observed data and the other to prior predictive.
`plot_ppc` compares the `{prior/posterior}_predictive` group with the `observed_data` group. Using `pm_data = az.from_pymc3(prior = priors)` you will get an InferenceData without observed data. This one should be easy to solve by using `pm_data = az.from_pymc3(prior = priors, model=model)` (or calling `from_pymc3` from within a model context).
Moreover, in PyMC3 it is hard to make the distinction between prior and prior predictive (they are both sampled by `pm.sample_prior_predictive`), so this also adds to the confusion. ArviZ divides the variables between the two groups in some cases. It should be possible to do this with the model alone, but it is currently implemented only when the trace is present. This must be fixed in `io_pymc3`, because otherwise prior predictive plots cannot be made before sampling, which would be their main use.
@corriebar can you check that observed data can be retrieved using only the model? I will then open an issue about the prior/prior_predictive distinction.
I tried by adding the model, but the error stays the same. It only works when adding the trace, then it also doesn't matter if the model was provided or not.
Also, the inference object only gets the prior_predictive object if I add a trace, having the model or not doesn't seem to make much of a difference. Providing only the prior (with or without model) only adds the prior object to InferenceData.
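To see in advance which groups are missing, the check shown in the traceback can be reproduced outside `plot_ppc`. This is only a sketch: `missing_ppc_groups` is a hypothetical helper, not part of ArviZ, and a `SimpleNamespace` stands in for the InferenceData that results from passing only the prior.

```python
from types import SimpleNamespace


def missing_ppc_groups(data, group="posterior"):
    """List the groups plot_ppc would look for that `data` lacks."""
    required = (f"{group}_predictive", "observed_data")
    return [name for name in required if not hasattr(data, name)]


# Stand-in for the InferenceData described above: only a `prior` group.
prior_only = SimpleNamespace(prior={"mu": [0.1, -0.2]})
missing = missing_ppc_groups(prior_only, group="prior")
print(missing)  # ['prior_predictive', 'observed_data']
```

The same function should work on a real InferenceData object, since it relies only on `hasattr`, which is what the check in the traceback uses.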
Hi, I just ran into the same issue getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[37], line 2
1 prior_samples = model.prior_predictive(draws=1000)
----> 2 az.plot_ppc(prior_samples, kind="kde")
3 plt.show()
File ~/.conda/envs/pymc_env_5/lib/python3.12/site-packages/arviz/plots/ppcplot.py:227, in plot_ppc(data, kind, alpha, mean, observed, observed_rug, color, colors, grid, figsize, textsize, data_pairs, var_names, filter_vars, coords, flatten, flatten_pp, num_pp_samples, random_seed, jitter, animated, animation_kwargs, legend, labeller, ax, backend, backend_kwargs, group, show)
225 for groups in (f"{group}_predictive", "observed_data"):
226 if not hasattr(data, groups):
--> 227 raise TypeError(f'`data` argument must have the group "{groups}" for ppcplot')
229 if kind.lower() not in ("kde", "cumulative", "scatter"):
230 raise TypeError("`kind` argument must be either `kde`, `cumulative`, or `scatter`")
TypeError: `data` argument must have the group "posterior_predictive" for ppcplot
I believe it's the same issue as in this thread and what is discussed in this discussion: https://discourse.pymc.io/t/a-problem-interpreting-an-error-in-pymc/14550/6
Is there any fix to this issue yet?
Thanks!
Change `az.plot_ppc(prior_samples, kind="kde")` to `az.plot_ppc(prior_samples, kind="kde", group="prior")`.
It would be nice to fall back to prior_predictive if posterior_predictive is missing, but that is not implemented. You need to be explicit if you want prior predictive checks.
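Until such a fallback exists, it can be approximated on the user side. The sketch below is an assumption about how one might do it, not ArviZ behaviour: `pick_ppc_group` is a hypothetical helper, and a `SimpleNamespace` stands in for the InferenceData returned by prior predictive sampling.

```python
from types import SimpleNamespace


def pick_ppc_group(data):
    """Prefer posterior predictive, fall back to prior predictive."""
    for group in ("posterior", "prior"):
        if hasattr(data, f"{group}_predictive"):
            return group
    raise TypeError("`data` has neither posterior_predictive nor prior_predictive")


# Stand-in for InferenceData holding only prior predictive samples.
prior_samples = SimpleNamespace(
    prior_predictive={"y": [0.2]},
    observed_data={"y": [0.3]},
)
group = pick_ppc_group(prior_samples)
# With ArviZ installed and a real InferenceData, this could then be passed on:
# az.plot_ppc(prior_samples, kind="kde", group=group)
```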
In addition to posterior predictive checks, prior predictive checks are also a way to check whether the desired regions of the parameter space are explored. Adding a `group` keyword (taking as values either `posterior` or `prior`) would allow using data from the posterior predictive or the prior predictive without many changes to the plotting code.