bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

PYMC3 backend missing data imputation #362

Open zwelitunyiswa opened 3 years ago

zwelitunyiswa commented 3 years ago

I know that PyMC3 will impute null data automatically when given a masked null value. With Bambi, we have to drop rows that contain nulls if those null cells are included in the model. Is it possible to use PyMC3's data imputation by applying a mask to null values, or by some other method?

Thanks for your kind consideration.
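
For reference, here is a minimal sketch of the PyMC3 behaviour being referred to: passing a NumPy masked array as observed makes PyMC3 treat the masked entries as latent variables and impute them during sampling. The variable names below are illustrative only.

# Minimal sketch of PyMC3's automatic imputation via masked arrays.
# Variable names (x, x_obs) are illustrative, not taken from the thread.
import numpy as np
import pymc3 as pm

x = np.array([1.0, 2.0, np.nan, 4.0, np.nan])
x_masked = np.ma.masked_invalid(x)  # mask the null values

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)
    # A masked array passed as `observed` turns the masked entries into
    # latent variables that PyMC3 imputes while sampling.
    pm.Normal("x_obs", mu=mu, sigma=sigma, observed=x_masked)
    idata = pm.sample(return_inferencedata=True)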

tomicapretto commented 3 years ago

Hi! Thanks for opening this issue!

Unfortunately, this is not available right now, and I think it wouldn't be very straightforward to implement.

Maybe you can take the PyMC model inside the Bambi model, add/remove/modify random variables, and then sample using the PyMC model instead of Bambi. But I'm not sure whether this can be done. Maybe @aloctavodia knows if this is possible or not.

zwelitunyiswa commented 3 years ago

Interesting suggestion. I am not great with the PyMC syntax, which is why Bambi is so amazing: the R-style syntax is much easier to learn and understand. Does Bambi have a way to pull out the translation it makes to PyMC3, so that one could build on it and run it within PyMC3 for these edge cases? Probably the answer is no, but I thought I would ask anyway.

tomicapretto commented 3 years ago

For example

# setup
import bambi as bmb
import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.DataFrame({
    "y" : np.random.normal(size=100),
    "x" : np.random.normal(size=100)
})

And if you do

model = bmb.Model("y ~ x", data)
model.build() # this is an intermediate step when you call model.fit()

you get a PyMC model in model.backend.model. This can be used like any other PyMC model to do things like

with model.backend.model:
    idata = pm.sample(return_inferencedata=True)

But yes, it unfortunately requires one to be familiar with PyMC. Let's wait for Osvaldo's input on this issue; he is much more familiar with PyMC3 than I am.
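
As a partial answer to the question above about pulling out the translation Bambi makes: continuing the snippet above (model and pm already defined), the backend model can be inspected with standard PyMC3 attributes. The variable names Bambi generates may differ between versions.

# Inspect the PyMC3 model that Bambi built (continues the snippet above).
pymc_model = model.backend.model
print(pymc_model.free_RVs)        # parameters Bambi created, e.g. Intercept, x, y_sigma
print(pymc_model.observed_RVs)    # the likelihood term for y
pm.model_to_graphviz(pymc_model)  # graphical view of the model structure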

zwelitunyiswa commented 3 years ago

I can do that! Nice. I did not know that.

That means you can also leverage PyMC3 with JAX/NumPyro as the sampler? If so, wow. That's killer.

Thank you!


aloctavodia commented 3 years ago

We may be able to implement this without changing too much code inside Bambi: record the missing observations prior to removing them, proceed as usual with all the Bambi machinery, and then, after inference, automatically compute and return the posterior predictive distribution for the missing observations.
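
A hedged sketch of that workflow, assuming missingness in the response and Bambi's Model.predict with kind="pps" (the exact predict API has changed across Bambi versions):

# Hedged sketch of the workflow described above: record the rows with a
# missing response, fit on the complete cases, then compute the posterior
# predictive distribution for the dropped rows.
import bambi as bmb
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "x": np.random.normal(size=100),
    "y": np.random.normal(size=100),
})
data.loc[::10, "y"] = np.nan       # introduce some missing responses

missing = data["y"].isna()         # record the missing observations
model = bmb.Model("y ~ x", data[~missing])
idata = model.fit()

# Posterior predictive for the rows that were dropped. kind="pps" is an
# assumption here; check the predict() signature of your Bambi version.
model.predict(idata, data=data[missing], kind="pps")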

@zwelitunyiswa yes, that means you can do something like this. Notice also that in upcoming versions of PyMC3 (v4 and later), PyMC3 will natively (and by default) use samplers with similar speed-ups to those shown in the example notebook. And eventually we will run Bambi on top of that.

zwelitunyiswa commented 3 years ago

@aloctavodia That would be great. I got the JAX sampling to work, but then JAX/NumPyro made a change and I did not get around to downgrading to get things working again.

However, it's good news that PyMC3 v4 will natively take care of it. That's amazing. I was getting 8-10x speedups with JAX on my MacBook. You guys on Bambi/PyMC3 are doing some amazing work. For business guys like myself, utilizing Bayes was painful, but these tools make it so much more convenient and accessible.

@tomicapretto Thanks again.

zwelitunyiswa commented 3 years ago

I am not sure if I should close this, or if there will be an attempt to implement @aloctavodia's solution. Let me know if you want me to close out this issue.

aloctavodia commented 3 years ago

@zwelitunyiswa we should leave this open. Thanks for opening this issue.

skulshreshtha commented 2 years ago


@aloctavodia The link you shared in your earlier comment (to the example notebook) does not work anymore. Can you please share it again? @zwelitunyiswa Can you share an example of how you used JAX sampling with PyMC3? Thanks in advance!
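
For reference, a hedged sketch of how JAX/NumPyro sampling was typically invoked with PyMC3 around the time of this thread, via the experimental pymc3.sampling_jax module (the module and its function signatures were experimental and may have changed since):

# Hedged sketch: JAX/NumPyro NUTS sampling on the PyMC3 model Bambi builds.
# Assumes PyMC3 3.11+ with the experimental sampling_jax module, plus jax and
# numpyro installed; this is not an official Bambi API.
import bambi as bmb
import numpy as np
import pandas as pd
import pymc3.sampling_jax

data = pd.DataFrame({
    "y": np.random.normal(size=100),
    "x": np.random.normal(size=100),
})

model = bmb.Model("y ~ x", data)
model.build()  # build the backend PyMC3 model without sampling

with model.backend.model:
    idata = pymc3.sampling_jax.sample_numpyro_nuts(draws=1000, tune=1000)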