Closed williambeordo closed 9 months ago
If you do inspect the inferencedata object manually, is the prior predictive group there?
If you don't have any observed variables in the pymc model, you can still use sample_prior_predictive because it samples both prior and prior predictive, and the result will be valid and contain prior samples. It won't be usable for plot_ppc though
No, the prior predictive group is missing in the inferencedata object, only the prior is present.
Yes, I wanted to do some checks of the model before putting data, because I was trying to follow the notebook "Prior and Posterior Predictive Checks" in the PyMC website. So, are you saying that using the prior group is the same as using the prior predictive?
Thank you for the answer.
There are a lot of moving pieces and terminology which I think is a source of confusion. I'll try to clarify some of it.
We distinguish between the parameter and the observed space. The quantities stored in the prior (and then posterior) groups correspond to the parameter space, whereas the prior predictive and observed data (and posterior predictive or even predictions later) correspond to the observed space. You will see in the notebook you mentioned that even the models used for prior predictive checks have variables that use the observed
argument. But that doesn't mean the values provided are used, sample_prior_predictive does not use those values at all; it only registers if the argument is provided to be able to assign each variable to one of these two spaces/groups.
The two first figures in the notebook are doing checks on the observed space using regression lines, but without comparing with the observations, so they are done manually both transformation wise (e.g. y = prior["a"] + prior["b"] * x
) and plotting. That is because plot_ppc only takes care of a specific type of prior/posterior predictive checks: comparing the predictive distributions to the distribution of the observed data. So while you can do prior checks (i.e. imagine you knew the slope must be positive for that model, you might want to check the prior of b
to ensure this happens) or prior predictive checks (like the ones done in the notebook) without technically using the observed data nor the quantities stored in the prior predictive group, you can do them with plot_ppc
Thank you for the explanation, but I’m not sure where the real problem lies.
My model has observed variables, so I would like to see the prior prediction along with the observations.
Some examples like this one use explicitly plot_ppc
.
The question is: Why do I get error when I try to produce a similar plot? i.e. why pymc.sample_prior_predictive
doesn't store the prior_predictive
group?
Sorry for my limited knowledge on this topic. Thank you again.
Can you share your model and the output of printing the inferencedata result? If you have observed variables they should be stored in the prior_predictive group
Sure, here is my model:
with pm.Model() as self.model:
# DEFINING PRIORS FOR THE MODEL PARAMETERS
def set_prior(name, prior):
if prior[0] == 'Normal':
return pm.Normal(name=name, mu=prior[1], sigma=prior[2])
elif prior[0] == 'Uniform':
return pm.Uniform(name=name, lower=prior[1], upper=prior[2])
elif prior[0] == 'LeftTruncatedNormal':
return pm.TruncatedNormal(name=name, mu=prior[1], sigma=prior[2], lower=prior[3])
mb = set_prior(self.param_names[0], priors[0])
bb = set_prior(self.param_names[1], priors[1])
mdt = set_prior(self.param_names[2], priors[2])
adt = set_prior(self.param_names[3], priors[3])
bdt = set_prior(self.param_names[4], priors[4])
mdT = set_prior(self.param_names[5], priors[5])
adT = set_prior(self.param_names[6], priors[6])
bdT = set_prior(self.param_names[7], priors[7])
g0 = set_prior(self.param_names[8], priors[8])
obs_r = velocity_obs[0]
obs_r_err = velocity_obs[1]
obs_v = velocity_obs[2]
obs_v_err = velocity_obs[3]
# EXPECTED VALUES FOR VELOCITY AND DENSITY
model = MWmond(bulge_par=[mb, bb], thin_disk_par=[mdt, adt, bdt], thick_disk_par=[mdT, adT, bdT], g0=g0)
mu_vrot = model.vrot(r=obs_r)
if density_obs is not None:
mu_rho = model.rho_bar(r=density_obs[0], z=0.)
# LIKELIHOOD
pm.Normal(name='Velocity', mu=mu_vrot, sigma=obs_v_err, observed=obs_v)
if density_obs is not None:
pm.Normal(name='Density', mu=mu_rho, sigma=density_obs[2], observed=density_obs[1])
# STARTING THE CALCULATIONS
idata = pm.sample_prior_predictive(samples=n_samp, model=self.model, var_names=self.param_names)
import jax
import tensorflow_probability.substrates.jax as tfp
jax.scipy.special.erfcx = tfp.math.erfcx
idata.extend(pm.sampling_jax.sample_numpyro_nuts(draws=n_samp, **kwargs))
idata.extend(pm.sample_posterior_predictive(idata, model=self.model))
print(idata)
az.plot_ppc(idata, group="prior", num_pp_samples=100)
az.plot_ppc(idata, group="posterior", num_pp_samples=100)
While this is the output of print(idata)
:
Inference data with groups:
> posterior
> posterior_predictive
> sample_stats
> prior
> observed_data
/opt/anaconda3/envs/MilkyWay-Gaia/lib/python3.10/site-packages/arviz/stats/density_utils.py:487: UserWarning: Your data appears to have a single value or no finite values
warnings.warn("Your data appears to have a single value or no finite values")
(density_obs
has only one datum)
Thank you very much for the help.
idata = pm.sample_prior_predictive(samples=n_samp, model=self.model, var_names=self.param_names)
You are manually setting var_names
and it looks like "Velocity" and "Density" are not in self.param_names
, so they are not being sampled. If you use var_names
only the variables you provide + their dependencies will have their prior sampled. You should remove that argument if you want everything sampled. You'll see that before sample_prior_predictive, sample or sample_posterior_predictive start, the variables being sampled will be printed.
Side note, if you are inside the model context you can skip the model
argument too.
Oh I see, thank you very much for your time!
The following commands:
produce the error:
TypeError: "data" argument must have the group "prior_predictive" for ppcplot
PyMC version: 5.8.2 ArViz version: 0.16.1