bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Non-parent parameter not found in posterior when using fixed Data prior #850

Closed: ivanistheone closed this issue 1 week ago

ivanistheone commented 1 month ago

Hi all. I ran into an issue similar to https://github.com/bambinos/bambi/issues/750, where a variable required for posterior predictive sampling of the response variable is not included in the inference data object.

I'm trying to fit a Gaussian model with a known, fixed standard deviation sigma=15 and a custom prior norm(100,40) on the mean. This is for educational purposes, to show the simplest possible model. I found a way to add sigma as a constant by setting bmb.Prior("Data", value=15). The complete code example is:

# toy dataset
import pandas as pd
iqs = [ 82.6, 105.5,  96.7,  84.0, 127.2,  98.8,  94.3]
df = pd.DataFrame({"iq":iqs})

# Gaussian model with known standard deviation sigma=15 and norm(100,40) prior on mean
import bambi as bmb
priors = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    "sigma": bmb.Prior("Data", value=15),
}
mod = bmb.Model("iq ~ 1",
                priors=priors,
                family="gaussian",
                link="identity",
                data=df)
mod
#      Formula: iq ~ 1
#       Family: gaussian
#         Link: mu = identity
# Observations: 7
#       Priors: 
#   target = mu
#       Common-level effects
#           Intercept ~ Normal(mu: 100.0, sigma: 40.0)
#       Auxiliary parameters
#           sigma ~ Data(value: 15.0)

idata = mod.fit()
# WORKS OK

Here sigma is not included in vars_to_sample, but the sigma info is preserved in idata under constant_data:

list(idata["constant_data"].keys())
# ['sigma']

If I then try to sample the response variable, I get this error:

mod.predict(idata, kind="response")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 mod.predict(idata, kind="response")

File [.../bambi/models.py:877], in Model.predict(self, idata, kind, data, inplace, include_group_specific, sample_new_groups)
    874 required_kwargs = {"model": self, "posterior": idata.posterior}
    875 optional_kwargs = {"data": data}
--> 877 posterior_predictive = self.family.posterior_predictive(
    878     **required_kwargs, **optional_kwargs
    879 )
    880 posterior_predictive = posterior_predictive.to_dataset(name=response_aliased_name)
    882 if "posterior_predictive" in idata:

File [...bambi/families/family.py:149], in Family.posterior_predictive(self, model, posterior, **kwargs)
    147 response_dist = get_response_dist(model.family)
    148 response_term = model.response_component.term
--> 149 kwargs, coords = self._make_dist_kwargs_and_coords(model, posterior, **kwargs)
    151 # Handle constrained responses
    152 if response_term.is_constrained:
    153     # Bounds are scalars, we can safely pick them from the first row

File [... bambi/families/family.py:256], in Family._make_dist_kwargs_and_coords(self, model, posterior, **kwargs)
    254         kwargs[param] = np.asarray(component.prior)
    255     else:
--> 256         raise ValueError(
    257             "Non-parent parameter not found in posterior."
    258             "This error shouldn't have happened!"
    259         )
    261 # Determine the array with largest number of dimensions
    262 ndims_max = max(x.ndim for x in kwargs.values())

ValueError: Non-parent parameter not found in posterior.This error shouldn't have happened!

Is there some way to make _make_dist_kwargs_and_coords look for the sigma value in constant_data?

Am I doing something wrong or unexpected by setting the sigma prior using bmb.Prior("Data", value=15)? I'd be happy to use another approach.

Oh and the context is pymc.__version__ == '5.17.0' and bmb.__version__ == '0.14.0' on macOS.
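
One possible workaround while predict fails is to draw the response by hand: for each posterior draw of the mean, sample from a Normal with the fixed sigma. The sketch below is only an illustration under stated assumptions, not Bambi's implementation. `sample_response` is a hypothetical helper, and `intercept_draws` is a simulated stand-in with arbitrary values so the snippet runs without a fitted model. With a real fit, the draws would presumably come from `idata.posterior["Intercept"].values`.

```python
import numpy as np

def sample_response(intercept_draws, sigma, n_obs, seed=0):
    """Draw posterior predictive samples for an intercept-only Gaussian model.

    intercept_draws: array of shape (chain, draw) with posterior draws of the mean.
    sigma: known, fixed standard deviation of the response.
    n_obs: number of observations to simulate per posterior draw.
    Returns an array of shape (chain, draw, n_obs).
    """
    rng = np.random.default_rng(seed)
    # Add a trailing axis so each posterior draw broadcasts across observations.
    mu = np.asarray(intercept_draws)[..., np.newaxis]  # (chain, draw, 1)
    return rng.normal(loc=mu, scale=sigma, size=mu.shape[:-1] + (n_obs,))

# Stand-in for posterior draws of the mean (assumption: 4 chains x 1000 draws,
# arbitrary location and spread chosen only for illustration).
rng = np.random.default_rng(42)
intercept_draws = rng.normal(loc=98.5, scale=5.6, size=(4, 1000))

pps = sample_response(intercept_draws, sigma=15, n_obs=7)
print(pps.shape)  # (4, 1000, 7)
```

The spread of the simulated response combines the uncertainty in the mean with the fixed sigma, so its standard deviation should be somewhat larger than 15.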

tomicapretto commented 3 weeks ago

@ivanistheone thanks for reporting the issue. There are two things going on here.

The first one is that if you want to set a parameter to a constant value, you should simply use the constant value, not a Prior that calls pm.Data under the hood (although I have to say that was a good hack! I had not thought about it). So you should do:

import pandas as pd
import bambi as bmb

iqs = [ 82.6, 105.5,  96.7,  84.0, 127.2,  98.8,  94.3]
df = pd.DataFrame({"iq":iqs})

priors = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    "sigma": 15,
}

mod = bmb.Model(
    "iq ~ 1",
    priors=priors,
    family="gaussian",
    link="identity",
    data=df
)

idata = mod.fit()
mod.predict(idata, kind="response")

However, this still does not work, though for a different reason. I'm fixing that right now. I'll update you when it's on main.
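
Because sigma is fixed, this model is conjugate, so the posterior of the mean has a closed form that can serve as a sanity check on the sampled Intercept. The sketch below applies the standard Normal-Normal update with known standard deviation to the toy data from this thread. `normal_known_sigma_posterior` is a hypothetical helper name, not part of Bambi.

```python
import math

def normal_known_sigma_posterior(data, sigma, prior_mu, prior_sigma):
    """Posterior mean and sd of mu for a Normal likelihood with known sigma
    and a Normal(prior_mu, prior_sigma) prior on mu."""
    n = len(data)
    prior_prec = 1.0 / prior_sigma**2   # precision contributed by the prior
    data_prec = n / sigma**2            # precision contributed by the data
    post_prec = prior_prec + data_prec
    # Precision-weighted combination of the prior mean and the data.
    post_mu = (prior_prec * prior_mu + sum(data) / sigma**2) / post_prec
    return post_mu, math.sqrt(1.0 / post_prec)

iqs = [82.6, 105.5, 96.7, 84.0, 127.2, 98.8, 94.3]
post_mu, post_sd = normal_known_sigma_posterior(
    iqs, sigma=15, prior_mu=100, prior_sigma=40
)
print(round(post_mu, 2), round(post_sd, 2))
```

For this dataset the closed form gives a posterior mean of roughly 98.47 and a posterior standard deviation of roughly 5.61, which the sampled Intercept should approximately match.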

ivanistheone commented 1 week ago

I can confirm that the above code (with sigma as a float) now works using the Bambi version on main.

Thanks for looking into this and fixing it!