arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Plots fail when variables added to model post-fitting #690

Closed fonnesbeck closed 4 years ago

fonnesbeck commented 5 years ago

When variables are added to a model post-fitting, plots fail even when the requested plot does not include the additional variable. This is relevant for plotting GP parameters, as the predictive mean is typically added to the model just prior to posterior predictive sampling, since it is not needed until then.

For example,

with model:
    # Add the GP predictive mean to the already-fitted model
    f_pred = gp.conditional('f_pred', Z)
    # Draw posterior predictive samples for the new variable only
    samples = pm.sample_posterior_predictive(tr, vars=[f_pred], samples=100)

But when I try to plot another variable in the model, ArviZ nonetheless looks for f_pred and fails:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-96-dba7c41ed83a> in <module>
----> 1 az.plot_forest(tr, var_names=['ls'])

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/plots/forestplot.py in plot_forest(data, kind, model_names, var_names, combined, credible_interval, rope, quartiles, ess, r_hat, colors, textsize, linewidth, markersize, ridgeplot_alpha, ridgeplot_overlap, figsize)
    101         data = [data]
    102 
--> 103     datasets = [convert_to_dataset(datum) for datum in reversed(data)]
    104 
    105     var_names = _var_names(var_names, datasets)

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/plots/forestplot.py in <listcomp>(.0)
    101         data = [data]
    102 
--> 103     datasets = [convert_to_dataset(datum) for datum in reversed(data)]
    104 
    105     var_names = _var_names(var_names, datasets)

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/converters.py in convert_to_dataset(obj, group, coords, dims)
    123     xarray.Dataset
    124     """
--> 125     inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
    126     dataset = getattr(inference_data, group, None)
    127     if dataset is None:

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/converters.py in convert_to_inference_data(obj, group, coords, dims, **kwargs)
     57         return from_pystan(posterior=obj, coords=coords, dims=dims, **kwargs)
     58     elif obj.__class__.__name__ == "MultiTrace":  # ugly, but doesn't make PyMC3 a requirement
---> 59         return from_pymc3(trace=obj, coords=coords, dims=dims, **kwargs)
     60     elif obj.__class__.__name__ == "EnsembleSampler":  # ugly, but doesn't make emcee a requirement
     61         return from_emcee(obj, coords=coords, dims=dims, **kwargs)

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, coords, dims)
    150         posterior_predictive=posterior_predictive,
    151         coords=coords,
--> 152         dims=dims,
    153     ).to_inference_data()

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in to_inference_data(self)
    135             **{
    136                 "posterior": self.posterior_to_xarray(),
--> 137                 "sample_stats": self.sample_stats_to_xarray(),
    138                 "posterior_predictive": self.posterior_predictive_to_xarray(),
    139                 "prior": self.prior_to_xarray(),

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     23                 if getattr(cls, prop) is None:
     24                     return None
---> 25             return func(cls, *args, **kwargs)
     26 
     27         return wrapped

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in sample_stats_to_xarray(self)
     76             name = rename_key.get(stat, stat)
     77             data[name] = np.array(self.trace.get_sampler_stats(stat, combine=False))
---> 78         log_likelihood, dims = self._extract_log_likelihood()
     79         if log_likelihood is not None:
     80             data["log_likelihood"] = log_likelihood

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/base.py in wrapped(cls, *args, **kwargs)
     23                 if getattr(cls, prop) is None:
     24                     return None
---> 25             return func(cls, *args, **kwargs)
     26 
     27         return wrapped

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in _extract_log_likelihood(self)
     54         for chain in self.trace.chains:
     55             log_like = (log_likelihood_vals_point(point) for point in self.trace.points([chain]))
---> 56             chain_likelihoods.append(np.stack(log_like))
     57         return np.stack(chain_likelihoods), coord_name
     58 

~/anaconda3/envs/dev/lib/python3.7/site-packages/numpy/core/shape_base.py in stack(arrays, axis, out)
    408     """
    409     _warn_for_nonsequence(arrays)
--> 410     arrays = [asanyarray(arr) for arr in arrays]
    411     if not arrays:
    412         raise ValueError('need at least one array to stack')

~/anaconda3/envs/dev/lib/python3.7/site-packages/numpy/core/shape_base.py in <listcomp>(.0)
    408     """
    409     _warn_for_nonsequence(arrays)
--> 410     arrays = [asanyarray(arr) for arr in arrays]
    411     if not arrays:
    412         raise ValueError('need at least one array to stack')

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in <genexpr>(.0)
     53         chain_likelihoods = []
     54         for chain in self.trace.chains:
---> 55             log_like = (log_likelihood_vals_point(point) for point in self.trace.points([chain]))
     56             chain_likelihoods.append(np.stack(log_like))
     57         return np.stack(chain_likelihoods), coord_name

~/anaconda3/envs/dev/lib/python3.7/site-packages/arviz/data/io_pymc3.py in log_likelihood_vals_point(point)
     45             log_like_vals = []
     46             for var, log_like in cached:
---> 47                 log_like_val = log_like(point)
     48                 if var.missing_values:
     49                     log_like_val = log_like_val[~var.observations.mask]

~/anaconda3/envs/dev/lib/python3.7/site-packages/pymc3/model.py in __call__(self, *args, **kwargs)
   1183     def __call__(self, *args, **kwargs):
   1184         point = Point(model=self.model, *args, **kwargs)
-> 1185         return self.f(**point)
   1186 
   1187 compilef = fastfn

~/anaconda3/envs/dev/lib/python3.7/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    884                     raise TypeError("Missing required input: %s" %
    885                                     getattr(self.inv_finder[c], 'variable',
--> 886                                             self.inv_finder[c]))
    887                 if c.provided > 1:
    888                     restore_defaults()

TypeError: Missing required input: f_pred

This occurs for both plot_trace and plot_forest. I'm not sure when this broke, but it worked in the pre-ArviZ days.

Currently running ArviZ 0.3.3 on Python 3.7.3.

ahartikainen commented 5 years ago

This is a bug in from_pymc3.

@ColCarroll @aloctavodia @canyon289 what should be the correct way to handle "missing" data?

OriolAbril commented 4 years ago

The issue looks like it is triggered when extracting the log likelihood data: the converter tries to evaluate logp_elemwise after the model has been modified, which results in this error. A possible workaround would be what I described in this other issue.
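
For illustration only, here is a sketch of one workaround along those lines (reusing the names model, gp, Z, and tr from the example above; it is not necessarily the exact approach from the linked issue): convert the trace to InferenceData before adding the predictive variable, so the log likelihood is extracted while the model still matches the trace.

import arviz as az
import pymc3 as pm

with model:
    # Convert while the model still contains only the sampled variables,
    # so the log likelihood extraction matches the trace
    idata = az.from_pymc3(tr)

    # Now add the GP predictive mean and draw posterior predictive samples
    f_pred = gp.conditional('f_pred', Z)
    samples = pm.sample_posterior_predictive(tr, vars=[f_pred], samples=100)

# Plot from the pre-built InferenceData instead of the raw trace
az.plot_forest(idata, var_names=['ls'])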

The solution should probably involve checking that logp_elemwise can be evaluated before calling it, and, if it cannot, emitting a warning that log likelihood values are not available.
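
As a rough sketch of that idea (a hypothetical helper, not ArviZ's actual implementation), the converter could wrap the extraction in a try/except and warn instead of raising, then simply skip the log_likelihood entry when nothing is returned:

import warnings

def extract_log_likelihood_or_warn(extract_fn):
    # Hypothetical helper: try to compute the pointwise log likelihood and
    # fall back to None with a warning if the model no longer matches the
    # trace (e.g. variables were added post-fitting).
    try:
        return extract_fn()
    except TypeError as err:
        warnings.warn(
            "Could not compute log likelihood, it will not be included: {}".format(err)
        )
        return None, None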

OriolAbril commented 4 years ago

Could you confirm this is solved in ArviZ master?

Running the code as-is should now print a warning. Converting explicitly with az.from_pymc3(trace, log_likelihood=False) should work fine and without warnings.
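
For example, reusing tr and 'ls' from the original report, the explicit conversion would look something like this:

import arviz as az

# Skip log likelihood extraction so the modified model is never evaluated
idata = az.from_pymc3(tr, log_likelihood=False)

# Plotting the originally requested variable now works without touching f_pred
az.plot_forest(idata, var_names=['ls'])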