bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Non-Centered Reparameterization #785

Closed sambaths closed 7 months ago

sambaths commented 8 months ago

Sorry if this is a noob question, I'm still learning.

Shouldn't we add the mean here?

Before - https://github.com/bambinos/bambi/blob/ff685b75a0ef17a11c8399de6de0a5f9d388d231/bambi/backend/terms.py#L145-L148

After -

if self.noncentered and has_hyperprior(kwargs):
    sigma = kwargs["sigma"]
    mu = kwargs["mu"]
    offset = pm.Normal(label + "_offset", mu=0, sigma=1, dims=kwargs["dims"])
    return pm.Deterministic(label, offset * sigma + mu, dims=kwargs["dims"])

Also, this only happens when the prior is Normal (assuming I read the codebase correctly), or is that not the case? (I saw that this might be possible for other distributions too, based on a discussion here.)

Again, sorry for the noob questions.

Any and all replies are highly appreciated.

tomicapretto commented 8 months ago

Hi @sambaths! No noob questions here, thanks for asking!

Shouldn't we add the mean here?

We don't have to do that here because group-specific effects in Bambi resemble random effects in non-Bayesian libraries: they are deflections around a common effect, so a priori they are centered around zero (we don't need the mean).

For example, let's take a model with a hierarchical intercept (or random effects) where we have $J$ groups.

This is the centered parametrization

$$
\begin{aligned}
Y_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{j[i]} \\
\alpha_j &\sim \text{Normal}(\mu_\alpha, \sigma_\alpha) & \text{for all } j \in 1, \dots, J \\
\mu_\alpha &\sim \text{Normal} \\
\sigma_\alpha &\sim \text{HalfNormal} \\
\sigma &\sim \text{HalfNormal}
\end{aligned}
$$

import pymc as pm

# group_idx maps each observation to its group; y holds the observed values
coords = {"group": [1, 2, ... J]}
with pm.Model(coords=coords):
    alpha_sigma = pm.HalfNormal("alpha_sigma")
    alpha_mu = pm.Normal("alpha_mu")
    alpha = pm.Normal("alpha", mu=alpha_mu, sigma=alpha_sigma, dims="group")
    sigma = pm.HalfNormal("sigma")
    mu = alpha[group_idx]
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

And this is the non-centered one

$$
\begin{aligned}
Y_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{j[i]} \\
\alpha_j &= \mu_\alpha + \alpha_j^* \sigma_\alpha \\
\alpha_j^* &\sim \text{Normal}(0, 1) & \text{for all } j \in 1, \dots, J \\
\mu_\alpha &\sim \text{Normal} \\
\sigma_\alpha &\sim \text{HalfNormal} \\
\sigma &\sim \text{HalfNormal}
\end{aligned}
$$

import pymc as pm

# group_idx maps each observation to its group; y holds the observed values
coords = {"group": [1, 2, ... J]}
with pm.Model(coords=coords):
    alpha_sigma = pm.HalfNormal("alpha_sigma")
    alpha_mu = pm.Normal("alpha_mu")
    alpha_star = pm.Normal("alpha_star", mu=0, sigma=1, dims="group")
    alpha = alpha_mu + alpha_star * alpha_sigma
    sigma = pm.HalfNormal("sigma")
    mu = alpha[group_idx]
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

In Bambi, we use by default something that looks like the non-centered version. We have

$$ \alpha_j = \alpha_{\text{baseline}} + \alpha_{\text{deflection}, j} $$

The $\alpha_{\text{baseline}}$ represents the $\mu_\alpha$ above, and the $\alpha_{\text{deflection}, j}$ represents $\alpha_j^* \sigma_\alpha$. The latter is centered around zero, just like the group-specific effects in Bambi.
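To make this mapping concrete, here is a small standalone sketch (plain Python, no PyMC; `alpha_baseline` and `sigma_alpha` are made-up numbers for illustration) showing that the scaled offsets are centered at zero while the reconstructed $\alpha_j$ values are centered at the baseline:

```python
import random

random.seed(0)
alpha_baseline = 2.0  # plays the role of mu_alpha (the common effect)
sigma_alpha = 0.5     # plays the role of sigma_alpha (the hyperprior scale)

# Draw standard-normal offsets (the alpha_j^* above) and scale them
# into deflections; adding the baseline reconstructs alpha_j.
offsets = [random.gauss(0, 1) for _ in range(100_000)]
deflections = [sigma_alpha * z for z in offsets]    # centered at 0
alphas = [alpha_baseline + d for d in deflections]  # centered at the baseline

mean_deflection = sum(deflections) / len(deflections)
mean_alpha = sum(alphas) / len(alphas)
print(f"mean deflection: {mean_deflection:.3f}, mean alpha: {mean_alpha:.3f}")
```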


As for the second part of the question, about whether this only happens with Normal distributions, you're reading it correctly. We almost always use normal distributions for the deflections, so that is what was implemented. If you use another distribution, the non-centered approach doesn't apply in Bambi (it could be modified, but I don't know how hard it would be).

sambaths commented 8 months ago

Thanks @tomicapretto for that explanation. That clears the first part for me.

As for the second part of the question, about whether this only happens with Normal distributions, you're reading it correctly. We almost always use normal distributions for the deflections, so that is what was implemented. If you use another distribution, the non-centered approach doesn't apply in Bambi (it could be modified, but I don't know how hard it would be).

On the last part, I think I might have worded my question incorrectly above.

Suppose one of the parameters in my model has a LogNormal prior; would that also get non-centered using offsets from a Normal?

For a LogNormal, shouldn't the non-centered approach look something like this? (based on the discussions here):

import theano.tensor as tt
import pymc3 as pm

# mu and sd (the location and scale on the log scale) are assumed defined
# Centered
val = pm.Lognormal('val', mu=mu, sd=sd)
# Non-centered: standard-normal offset, shifted and scaled in log space
val_raw = pm.Normal('val_raw', mu=0, sd=1)
val = tt.exp(val_raw * sd + mu)

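As a quick sanity check on that transformation, here is a hypothetical PyMC-free sketch (with made-up `mu` and `sd` values) verifying that exponentiating a shifted and scaled standard normal reproduces the LogNormal mean $e^{\mu + \sigma^2/2}$:

```python
import math
import random

random.seed(1)
mu, sd = 0.3, 0.4  # hypothetical location and scale on the log scale

# Non-centered draw: standard-normal offset, shifted/scaled in log space,
# then exponentiated — distributionally the same as LogNormal(mu, sd).
draws = [math.exp(mu + sd * random.gauss(0, 1)) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
theoretical_mean = math.exp(mu + sd**2 / 2)  # mean of LogNormal(mu, sd)
print(f"sample: {sample_mean:.3f}, theory: {theoretical_mean:.3f}")
```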
tomicapretto commented 8 months ago

I think so. But notice the code says "non-centered Lognormal on log space", and I emphasize the "log space" part.

sambaths commented 7 months ago

I think so. But notice the code says "non-centered Lognormal on log space", and I emphasize the "log space" part.

Thanks @tomicapretto