bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Feature Request: Nested priors for common terms #584

Open · kpalin opened this issue 2 years ago

kpalin commented 2 years ago

Hi,

I'm trying to play around with sparse models and would like to place a NormalMixture prior with Dirichlet weights on the common terms. The code would look like:

import bambi as bmb

model = bmb.Model("response ~ 1 + d", data)
w_prior = bmb.Prior("Dirichlet", a=[1.0, 1.0])
subject_prior = bmb.Prior("NormalMixture", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5])
model.set_priors({"d": subject_prior})
model.build()

It crashes in aesara with NotImplementedError: Cannot convert Dirichlet(a: [1.0 1.0]) to a tensor variable. I think it would be a fairly simple fix to build the priors recursively for CommonTerms as well, as is already done for GroupSpecificTerms. (Apologies for this not being a pull request.)
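For illustration, here is a minimal sketch of the recursive idea (a hypothetical helper, not Bambi's actual internals): walk a Prior's arguments and turn any nested Prior into its own PyMC random variable before constructing the parent distribution.

import bambi as bmb
import pymc as pm

def build_prior(prior: bmb.Prior, label: str):
    # Hypothetical recursive builder, not Bambi's real API.
    kwargs = {}
    for key, value in prior.args.items():
        if isinstance(value, bmb.Prior):
            # A nested prior (e.g. the Dirichlet weights) becomes
            # a free random variable of its own.
            kwargs[key] = build_prior(value, f"{label}_{key}")
        else:
            kwargs[key] = value
    # Look up the distribution by name, e.g. pm.NormalMixture.
    return getattr(pm, prior.name)(label, **kwargs)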

tomicapretto commented 2 years ago

@kpalin could you provide the data you're using so I can reproduce the error? I would like to know what response and d are.

tomicapretto commented 2 years ago

Do you want to reproduce the following PyMC model?

import pymc as pm

with pm.Model():
    w_prior = pm.Dirichlet("w", a=[1, 1])
    y = pm.NormalMixture("y", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5], observed=response)

Or do you also want to include a predictor? What would the predictor be?

kpalin commented 2 years ago

Yes, I'd like to have a predictor, e.g.

import pymc as pm

with pm.Model():
    w_prior = pm.Dirichlet("w", a=[1, 1])
    # Mixture prior on the D regression coefficients, with shared weights.
    beta = pm.NormalMixture("beta", w=w_prior, mu=[0, 0], sigma=[0.1, 4.5], shape=D)

    sigma = pm.HalfStudentT("sigma", nu=4, sigma=2.0322)
    pm.Normal("y", mu=pm.math.dot(X, beta), sigma=sigma, observed=y)
    tr = pm.sample()

The full code is in https://gist.github.com/kpalin/5069b6de8eb0f9fb7efb2d5861ade53c and the output from a run (in the VS Code interactive Python window) is in random_effect_bambi.pdf (https://github.com/bambinos/bambi/files/9939632/random_effect_bambi.pdf).
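For readers without access to the gist: D, X, and y above are the number of predictors, the design matrix, and the response. A synthetic setup along these lines (my assumption, not the gist's exact code) makes the snippet self-contained:

import numpy as np

rng = np.random.default_rng(0)
D = 10                               # number of predictors
X = rng.normal(size=(200, D))        # design matrix
true_beta = np.zeros(D)
true_beta[:2] = [2.0, -1.5]          # only two nonzero effects (sparse)
y = X @ true_beta + rng.normal(size=200)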

zwelitunyiswa commented 2 years ago

Kimmo, this is some cool stuff. Thanks for sharing.


tomicapretto commented 2 years ago

@kpalin have a look at this branch in my fork https://github.com/tomicapretto/bambi/tree/experiment_prior_common

See the notebook play.ipynb.

It seems to work, but I can't promise this will make it into Bambi (at least not in this form) since I'm not entirely sure what's going on... To me, if the weights are shared among the priors for all of "d0" through "d9", it makes sense to see them as hierarchical. But I may be confused here.

Anyway, have a look at that and let me know if that works for you. Also, if you have references about this, I would love to have a read.
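As I read it, the model the branch builds is roughly the following (a sketch of my understanding, not necessarily the branch's exact graph):

import numpy as np
import pymc as pm

with pm.Model():
    # One Dirichlet weight vector shared by the mixture priors of all
    # ten slopes, which is why it arguably looks hierarchical.
    w = pm.Dirichlet("w", a=np.ones(2))
    d = pm.NormalMixture("d", w=w, mu=[0, 0], sigma=[0.1, 4.5], shape=10)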

kpalin commented 2 years ago

That seems to work, also with a formula like response ~ 0 + d0 + d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9. (I had never seen the c(d0, d1, ...) syntax before.)

I don't have proper references for this apart from the blog post https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html that I already linked. I guess sparsity is not really a thing with Bayes, when you can just sum over probability-zero events.

tomicapretto commented 2 years ago

I'm leaving this open because I don't have time right now to finish off this PR (and to understand how it works, too!). But it is definitely very interesting.

Edit

For the record: c(d0, d1, ..., d9) is not the same as d0 + d1 + ... + d9. The first creates a single model term that is ten-dimensional, while the second creates ten model terms, each of which is one-dimensional.
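A quick illustration, shortened to three columns for brevity (assuming data is the same data frame as above, with columns d0, d1, and d2):

import bambi as bmb

# One model term that is three-dimensional:
m1 = bmb.Model("response ~ 0 + c(d0, d1, d2)", data)

# Three model terms, each one-dimensional:
m2 = bmb.Model("response ~ 0 + d0 + d1 + d2", data)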