bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Conducting more specific comparisons #751

Closed jt-lab closed 7 months ago

jt-lab commented 10 months ago

@GStechschulte,

we keep using your interpret submodule to its full extent :-D

We came across two types of comparisons we would like to make but are unsure if/how they are possible. So probably this is not really an issue but rather a feature request ...

There are two related (but slightly different) use cases:

1) We would like to conduct a kind of second-level comparison within a factor. I.e., in the example below we would like to compare the differences between different levels of factor1 (see red markings in the attached image).

2) And in a slightly different scenario, we would like to compare the difference for e.g. factor1=A in factor3=1 with the difference for factor1=B in factor3=2. That is, the levels would be different and the comparison would also run across the levels of one factor (see the attached image for clarification).

Are such comparisons possible somehow? If there was a way to get the predicted samples on which the table is based, we could perhaps calculate these differences ...

As always we are happy about any suggestions / input

Many thanks!

GStechschulte commented 10 months ago

Hey @jt-lab this is great to hear! You are right, this is currently not possible. I once opened PR #708 to return the data used to generate predictions and the arviz inference data all in a pandas dataframe. This would allow you to perform various group by and aggregations (and if you don't like pandas method chaining, I would suggest polars or tidy polars).
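To give a rough idea of the kind of aggregation such a dataframe would enable, here is a hypothetical sketch (the column names are made up, not the PR's actual output):

import pandas as pd

# hypothetical long dataframe: one row per (chain, draw, unit), with the
# covariate levels and the posterior estimate for that unit and draw
df = pd.DataFrame({
    "chain":    [0, 0, 0, 0, 1, 1, 1, 1],
    "draw":     [0, 1, 0, 1, 0, 1, 0, 1],
    "factor1":  ["A", "A", "B", "B", "A", "A", "B", "B"],
    "estimate": [0.10, 0.12, 0.20, 0.18, 0.11, 0.13, 0.21, 0.19],
})

# second-level summaries via a group by / aggregation over the draws
summary = df.groupby("factor1")["estimate"].agg(
    mean="mean",
    lower=lambda s: s.quantile(0.03),
    upper=lambda s: s.quantile(0.97),
)
print(summary)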

I'm not sure why I closed the PR, but I will reopen it and rebase with main.

Initially, I think returning such a data object would make the most sense without introducing a lot of new code. {marginaleffects} allows users to pass their own functions to compute differences, ratios, etc., but even that does not allow you to compute between-level and hierarchical comparisons.

This is interesting! While I think of the best solution, I would suggest you pull those changes (once I rebase that PR) and compute the desired comparisons yourself. Thanks for communicating with me about this! 😄

jt-lab commented 10 months ago

@GStechschulte, that sounds great!

This would allow you to perform various group by and aggregations (and if you don't like pandas method chaining, I would suggest polars or tidy polars).

I have no idea what (tidy)polars is... will google it. But chaining pandas methods sounds good! With many PPC samples, working on a pandas df might get slow, but maybe it's not really a problem. Happy to give it a try once the PR is through.

Many thanks!

tomicapretto commented 10 months ago

I just saw tidypolars and I'll drop this here https://tomicapretto.github.io/posts/2022-06-26_tidypolars/ haha

jt-lab commented 9 months ago

@GStechschulte,

We gave it a try with the return_idata=True argument. However, the kernel dies due to memory running full. Even with ~100 GB of swap space it won't run through on our data. When we thin the original idata trace (from inference) to only 10 draws per chain, it runs through (eating up to 8 GB of RAM). Curiously, the returned pandas DataFrame goes up to 3600 draws in the draw column. What is the correspondence between draws in the input idata and the output idata? Do you have any suggestions on how to deal with this?
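For reference, the thinning can be done roughly like this with ArviZ (a sketch, not necessarily exactly what we ran):

# keep every 200th posterior draw, so e.g. 2000 draws per chain become 10
# (positional isel is applied per group, wherever a 'draw' dimension exists)
idata_thin = idata.isel(draw=slice(None, None, 200))
print(idata_thin.posterior.sizes["draw"])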

Many thanks!

GStechschulte commented 9 months ago

@jt-lab wow! The number of draws in the draw column of the dataframe depends on the number of chains and number of draws specified when you fit the model. The correspondence between draws and observed data is such that the idata is merged with the observed data that produced that draw. E.g. row 5 of the observed data corresponds to chain 3 draw 500.

We realized the memory limitation of returning a DataFrame (and your comment confirms this). Therefore, I have been working on returning an az.InferenceData object with a group called observed_data that can store the observed data--the other groups of the object remain the same (posterior, posterior_predictive, sample_stats, etc.).

Then, since this object relies on xarray, you can use xarray for multidimensional indexing, aggregations, group by, etc. I should be able to commit some updates in the next day. I also have an example notebook on how you can use xarray to perform the cross comparisons you described above.
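As a rough illustration of the kind of xarray operations I mean (the variable and coordinate names below are placeholders, not the final API):

import numpy as np
import xarray as xr

# placeholder posterior: a variable "mu" with dims (chain, draw, obs)
mu = xr.DataArray(
    np.random.default_rng(0).normal(size=(2, 100, 6)),
    dims=("chain", "draw", "obs"),
    name="mu",
)
# placeholder covariates describing each observation/unit
covariates = xr.Dataset({"factor1": ("obs", ["A", "A", "B", "B", "C", "C"])})

# multidimensional indexing: keep only the units with factor1 == "A"
idx_a = np.where((covariates["factor1"] == "A").values)[0]
mu_a = mu.isel(obs=idx_a)

# aggregation: mean over chains, draws, and the selected units
print(float(mu_a.mean(("chain", "draw", "obs"))))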

jt-lab commented 9 months ago

@GStechschulte,

InferenceData with an additional group sounds great. It could just extend the idata object it receives as input. It's also great for having everything in one place so it can be dumped into one netCDF file.

You write the group would be called observed_data. Isn't that misleading? Shouldn't it rather be predicted_observed_data or something like this?

Concerning

The number of draws in the draw column of the dataframe depends on the number of chains and number of draws specified when you fit the model.

After thinning the trace to 10 draws per chain, the resulting df had 3600 draws. So apparently it did not consider the current draws in the idata object (but perhaps used some stored value of the original number of draws). Curiously, it was so much faster and only used <10% of the memory compared to using the unthinned idata object, so the thinning did have some effect. Also, the original trace contained 4000 samples... so I'm really not sure where the 3600 comes from ...

As always, many thanks!

GStechschulte commented 9 months ago

@jt-lab Ahhh yes. It is misleading. Thanks! Hmmm. Interesting. Constructing the dataframe required a couple of joins, so perhaps a bug is being introduced there. However, since I am pretty confident we will be going ahead with the InferenceData object, I won't pursue this (and you shouldn't worry too much about it 😄)

GStechschulte commented 9 months ago

@jt-lab I have pushed a change to the comparisons function in PR #758 that returns the inference data and the data used in model.predict() when return_idata=True. If you pull these changes and then use this Gist, you can experiment with how we would construct the inference data together with the data, and with how you can use xarray to compute different comparisons. I included a small demo on computing cross-comparisons 👍🏼

AylinH commented 9 months ago

@GStechschulte thanks a lot for including these changes. That's awesome!

I just tried to do some comparisons with our data and realized that the inference data that is returned contains twice as many observations as the dataset. Also, when I tried to calculate specific comparisons, I noticed that the values in the DataArray are duplicated. In my code, I used average_by with interpret.comparisons and no conditional variables. Maybe something is going wrong when using average_by to return the inference data, or am I missing something?

Many thanks!

GStechschulte commented 9 months ago

@AylinH thanks! Could you share the code and data where this is happening please?

Nonetheless, this behavior can happen. If you fit a model with 100 rows, this is the observed data used to fit the model. However, when calling comparisons (depending on the values you pass for contrast and conditional), a cross join (cartesian product) of the contrast and conditional values is computed, which generates a new dataset that is then passed to model.predict().

For example, the cartesian product of two sets A and B, written A x B, is the set of all possible ordered pairs where the first element comes from set A and the second element comes from set B. A x B can contain more elements than the observed data used to fit the model. Again, it depends on the values passed to conditional, the number of covariates in the model, etc.
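A toy illustration of that cross join (not the exact columns Bambi builds):

import pandas as pd

# 2 contrast values x 3 conditional values = 6 rows,
# independent of how many rows the original training data had
contrast_values = pd.DataFrame({"factor2": ["X", "Y"]})
conditional_values = pd.DataFrame({"factor1": ["A", "B", "C"]})

grid = contrast_values.merge(conditional_values, how="cross")
print(len(grid))  # 6
print(grid)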

AylinH commented 9 months ago

@GStechschulte thanks for your quick response!

I used the simulated data by @jt-lab. In the data set, we have two within-participants factors (factor2: X,Y and factor3: 1,2) and one between-participants factor (factor1: A, B, C). Within each level of factor1, there are 9 participants for each respective condition. This is the data set: simulated_data_comparisons.csv

This is the code I used (adapted from your example):

import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1  + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood':True})

data['report frequency (%)'] = data['correct'] / data['count']

df, idata = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
        return_idata=True
)

xr_df = xr.Dataset.from_dataframe(df)

del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})

def select_draws(idata, condition_idx):
    draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
    draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})   

    return draws 

idx_11 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
idx_41 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]

draws_1 = select_draws(idata, idx_11)
draws_4 = select_draws(idata, idx_41)

diff = (draws_4 - draws_1).mean(("chain", "draw")) 
diff

The returned inference data comprises 216 observations (in the data set there are 108).

This is the output of the comparison:

[Screenshot: comparisons output]

The values in the blue rectangle are the same as the values in the orange rectangle. Also, just based on the data set, I would have expected to see 9 values in the data array. Is this related to your explanation regarding the Cartesian product?

Thank you very much!

AylinH commented 9 months ago

@GStechschulte, for further testing, we have tried to manually recreate a comparison based on the idata that is also returned by interpret.comparisons. The mean was as expected, but the HDI boundaries were different. Is this perhaps due to how the random effect 'individual' is handled? (Note: We have removed redundant predictions, see the post above.)

[screenshot]

Do you have any idea what might go wrong? Many thanks!

Code:

import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np
import arviz as az

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1  + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood':True})

data['report frequency (%)'] = data['correct'] / data['count']

df, idata = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
        return_idata=True
)

# display(idata)

xr_df = xr.Dataset.from_dataframe(df)

del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})

def select_draws(idata, condition_idx):
    draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
    draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})   

    return draws 

subset1 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
subset2 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]

draws_1 = select_draws(idata, subset1)
draws_2 = select_draws(idata, subset2)

diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0,18,2)}).mean("p(correct, count)_obs") ## removing redundant predictions

az.plot_posterior(diff)

summary_df = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
)
display(summary_df)

GStechschulte commented 9 months ago

@AylinH thank you for the messages; I haven't ignored the first one. I have been busy as I am in the middle of switching jobs. I will get to these messages by the end of the weekend. Thanks! 👍🏼

GStechschulte commented 9 months ago

The values in the blue rectangle are the same as the values in the orange rectangle. Also, just based on the data set, I would have expected to see 9 values in the data array.

@AylinH This is because of a bug in the function that prepares the unit level data. The function was not dropping duplicate rows after creating the unit level data. If you do df.drop_duplicates(inplace=True) after calling comparisons, this will resolve the duplication of elements in the diff array.

I will open a PR to resolve that.

GStechschulte commented 9 months ago

The mean was as expected, but the HDI boundaries were different. Is this perhaps due to how the random effect 'individual' is handled? (Note: We have removed redundant predictions, see post above.)

@AylinH This "discrepancy" is happening because in the summary_df, you are specifying covariates to average by. In the backend, Bambi first computes the comparisons, then applies the average by.

When specifying return_idata=True, the inference data returned by model.predict(data=comparisons_data) is returned. This leaves it up to you to compute the comparisons, then the average by.

So when you selected the draws

diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0,18,2)}).mean("p(correct, count)_obs")

you were only selecting a subset of the draws that corresponded to the following data:

factor1  factor3  individual  factor2
A        1        18          Y
A        1        20          Y
A        1        22          Y
A        1        24          Y
A        1        26          Y
A        1        18          X
A        1        20          X
A        1        22          X
A        1        24          X
A        1        26          X

and without performing a group-by aggregation. So you should first compute the unit-level comparisons with all draws and then apply your group-by aggregation.
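Roughly, reusing the objects and the select_draws helper from your snippet above (and assuming the X and Y rows come in the same unit order, as they did in your dataframe):

idx_x = np.where(idata.data.factor2 == "X")[0]
idx_y = np.where(idata.data.factor2 == "Y")[0]

# 1) unit-level comparison for every unit, keeping the chain and draw dimensions
unit_diff = select_draws(idata, idx_y) - select_draws(idata, idx_x)

# 2) group-by aggregation afterwards: average over the units belonging to
#    factor1 == "A" and factor3 == 1 (positions taken relative to the "X" rows)
in_group = ((idata.data.factor1 == "A") & (idata.data.factor3 == 1)).values[idx_x]
group_diff = unit_diff.isel({"p(correct, count)_obs": np.where(in_group)[0]})
group_diff = group_diff.mean("p(correct, count)_obs")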

Good questions, thanks! By the way, I am adding some helper functions to make a lot of this easier. I hope to have a draft PR by the end of the weekend.

AylinH commented 9 months ago

@GStechschulte thank you very much for your explanations!

We have tried to calculate the comparisons as you have explained. However, our HDI boundaries have not changed from our previous post and are still different from the summary_df. Maybe our unit level comparison is not what you meant or the random effect "individual" is handled differently in the summary_df?

This is our code (with comments for clarification):

import bambi as bmb
import pandas as pd
import xarray as xr
import numpy as np
import arviz as az

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1  + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood':True})

df, idata_pred = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
        return_idata=True
)

xr_df = xr.Dataset.from_dataframe(df)
del idata_pred.observed_data
idata_pred.add_groups(data=xr_df)
idata_pred.data = idata_pred.data.rename({"index": "p(correct, count)_obs"})

mask_X = (df["factor2"]=="X")
values_X = idata_pred.posterior["p(correct, count)_mean"][:,:, mask_X.values].values

mask_Y = (df["factor2"]=="Y")
values_Y = idata_pred.posterior["p(correct, count)_mean"][:,:, mask_Y.values].values

diff = values_Y - values_X # Difference on the unit-level

diff_subset = diff[:,:,((df["factor1"]=="A") & (df["factor3"]==1)).values[mask_X]] # mask_X indices in valid order for this
mean_diff = np.mean(diff_subset, axis=2) # Mean difference over units

summary_df = bmb.interpret.comparisons(
        model=model, 
        idata=idata_pred, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
)
display(summary_df)
az.plot_posterior(mean_diff)

[screenshot]

We are happy about any suggestions. Many thanks!

AylinH commented 9 months ago

@GStechschulte that is awesome, many thanks for working on these modifications!

I have just calculated our comparison using your new approach, which worked well. However, the HDI discrepancy is still there (the same HDIs as for the above codes). Could you please explain to us which HDIs are the correct ones? Any help would be greatly appreciated.

This is the code:

import warnings
import arviz as az
import numpy as np
import pandas as pd
import xarray as xr
import bambi as bmb
from bambi.interpret.helpers import data_grid, select_draws

bmb.config["INTERPRET_VERBOSE"] = False
warnings.simplefilter(action='ignore', category=FutureWarning)

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1  + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood':True})

# data grid
conditional = {
    "factor1": np.array(["A"]), # Between-individuals factor
    "factor3": np.array([1]),
    "individual": np.arange(18, 27) # All individuals who are in factor1=A
}
variable = {"factor2": np.array(["X", "Y"])}

new_data = data_grid(model, conditional, variable)
idata_new = model.predict(idata, data=new_data, inplace=False)

# comparison
draw_1 = select_draws(idata_new, new_data, {"factor2": "X"}, "p(correct, count)_mean")
draw_2 = select_draws(idata_new, new_data, {"factor2": "Y"}, "p(correct, count)_mean")

diff = (draw_2 - draw_1).mean("p(correct, count)_obs")

summary_df = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        average_by=['factor3', 'factor1'],
        contrast={"factor2": ["X", "Y"]},
)
display(summary_df)
az.plot_posterior(diff)

[screenshot]

Data: simulated_data_comparisons.csv

Also, we were wondering whether it is possible to specify the required contrast directly in the data_grid without having to specify it again with select_draws. I don't quite understand why we then have to specify conditional and variable separately in data_grid. Maybe you could explain a little more how data_grid works?

Thanks a lot!

GStechschulte commented 9 months ago

the HDI discrepancy is still there (the same HDIs as for the above codes). Could you please explain to us which HDIs are the correct ones? Any help would be greatly appreciated.

They are both returning the correct value. It is just that you are taking the mean along different dimensions and then passing that to az.hdi to compute statistics of the posterior samples.

Usually, when analyzing the posterior draws, you want to reduce the chain and draw dimensions first via .mean(("chain", "draw")) and then perform further aggregations if desired. You are reducing the observation dimension first via .mean("p(correct, count)_obs"). In the interpret module, we reduce by chains and draws first.

ArviZ has a nice post about working with the InferenceData object. Additionally, see this PyMC forum question. 👍🏼

Adapting your example:

# same value as estimate column in 'summary_df'
diff = (draw_2 - draw_1).mean(("chain", "draw"))
np.mean(diff)
array(0.01249778)

Computing the 94% credible interval:

# same values as CI columns in 'summary_df'
lower_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"])
higher_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"])
lower_mean, higher_mean
array(-0.03360939), array(0.05940723)

and comparing these results with the summary dataframe:

summary_df = bmb.interpret.comparisons(
        model=model, 
        idata=idata, 
        contrast={"factor2": ["X", "Y"]},
        conditional=conditional,
        average_by=['factor3', 'factor1'],
        prob=0.94,
        use_hdi=True
)
summary_df
term     estimate_type  value   factor3  factor1  estimate  lower_3.0%  upper_97.0%
factor2  diff           (X, Y)  1        A        0.012498  -0.033609   0.059407

jt-lab commented 9 months ago

@GStechschulte, thanks for the further explanations and all the effort. I think I now get what the different variants represent:

The summary_df returned by interpret.comparisons contains the means of the unit-level mean differences and the means of the unit-level HDI boundaries.

What we calculated manually was the mean over all unit-level distributions, and then the mean and HDIs of the resulting distribution.
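In code, reusing draw_1 and draw_2 from the data_grid example above (as far as I understand it):

import arviz as az

unit_diff = draw_2 - draw_1  # dims: (chain, draw, p(correct, count)_obs)

# variant 1 (what summary_df reports): summarize each unit first,
# then average the per-unit means and per-unit HDI bounds
per_unit_mean = unit_diff.mean(("chain", "draw"))
per_unit_hdi = az.hdi(unit_diff)["p(correct, count)_mean"]
estimate_1 = float(per_unit_mean.mean())
lower_1 = float(per_unit_hdi.sel(hdi="lower").mean())
upper_1 = float(per_unit_hdi.sel(hdi="higher").mean())

# variant 2 (our manual calculation): average over units first,
# then summarize the single resulting distribution of draws
avg_diff = unit_diff.mean("p(correct, count)_obs")
estimate_2 = float(avg_diff.mean(("chain", "draw")))
hdi_2 = az.hdi(avg_diff)["p(correct, count)_mean"]
lower_2, upper_2 = float(hdi_2.sel(hdi="lower")), float(hdi_2.sel(hdi="higher"))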

I think the crucial point is that we were unaware of this:

Usually, when analyzing the posterior draws, you want to reduce the chain and draw dimensions first

So far we have mostly worked with models where the parameter posteriors can be interpreted more directly without going through predictions. And when working with posteriors it's always "summarize last" (e.g. take differences and means first, then get the mean/HDIs to summarize the distribution). I must say it's not really clear to me why, when working with posterior prediction draws, one should "reduce first". Our goal is to make inferences based on the comparisons (that optimally generalize to the population). I looked into some of the literature listed on the {marginaleffects} page but so far this was not super helpful. Hence, if anyone has suggestions for background on this ...

Edit: Just found the reference to the Gelman, Hill, and Vehtari regression book in one of the example notebooks... will have a look at that!

many thanks again for the support!

GStechschulte commented 9 months ago

The summary_df returned by interpret.comparisons contains the means of the unit-level mean differences and the means of the unit-level hdi boundaries.

@jt-lab yes, but this is only because we computed unit-level comparisons and passed an argument to average_by. If we had left average_by as None, then the comparison would be

# mean comparison
diff = (draw_2 - draw_1).mean(("chain", "draw"))

# lower and higher bound of that comparison
lower_mean = az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"]
higher_mean = az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"]

I must say it's not really clear to me why when working with posterior prediction draws it is to "reduce first". Our goal is to make inferences based on the comparisons (that optimally generalize to the population)

Since your model has categorical variables and interaction effects, the dimensionality of the posterior is quite large:

idata_new["posterior"].dims
Frozen({'chain': 4, 'draw': 2000, 'factor3:factor2_dim': 4, 'factor1_dim': 2, 'individual__factor_dim': 27, 'factor3:factor2__expr_dim': 4, 'p(correct, count)_obs': 18})

but lets imagine for a second that it is

'chain': 4, 'draw': 2000, 'p(correct, count)_obs': 18

The coordinate 'p(correct, count)_obs' value of 18 corresponds to the number of units (rows) in new_data. What we are usually interested in is the mean and uncertainty of the predictions for each unit $x_i$. To achieve that, we need to do

(draw_2 - draw_1).mean(("chain", "draw")) 
# and
az.hdi((draw_2 - draw_1))

which gives us the mean and uncertainty for each unit $x_i$ (after selecting a subset of the draws using select_draws), which indeed has a shape of

(draw_2 - draw_1).mean(("chain", "draw")).shape
(9,)

I think it also partly depends on what type of posterior summary you want to compute. But in the case of comparisons, slopes, and predictions in Bambi, we are interested in the posterior summary described above. @tomicapretto do you have any comments here? Perhaps you have computed other posterior summaries and/or can link to a blog / tutorial if you know of any.

jt-lab commented 9 months ago

I think it also partly depends on what type of posterior summary you want to compute.

Yes, I think that's the crucial question. But I'd say what we want is the use case that 99% of users from our field (and probably all of the behavioral sciences) would have: we run some experiment (or observe some data, in other fields) and are now interested in which factors matter and how they interact, based on all the units that were tested (and probably generalizing to the population). In a frequentist setting we would look at which factors are significant. In Bayesian model comparison frameworks we might look at inclusion Bayes factors for different components. In Bayesian parameter estimation, we can often look directly at the parameter posteriors. But in GLMMs that's difficult, and I think the prediction approach can help with this. So this is the background of why we are interested in having a 'global' statistic for certain comparisons to look at.

tomicapretto commented 9 months ago

I've been trying to follow the conversation. Initially I thought we were doing something wrong in the interpret submodule as averaging lower/upper bounds seemed strange to me. However, the results match the ones I get with marginaleffects (tested with other models). It looks like I still need to review some things.

In the meantime, I think using data_grid, .predict, and select_draws is a good alternative if you want to perform more customized comparisons and/or aggregations.

jt-lab commented 9 months ago

Hi all,

I just came across these super helpful blog posts by Andrew Heiss. You probably know them, but just in case:

[1] https://www.andrewheiss.com/blog/2022/11/29/conditional-marginal-marginaleffects/
[2] https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide

I think our current approach is going for average population-level outcomes in the sense of "average / marginalize / integrate across existing random effects", see [1].

While reading these blog posts I have been wondering again how Bambi handles the random effects in predictions. On the R side there is the re_formula argument that allows users to specify what to do with REs. But perhaps Bambi's predict implicitly decides this based on whether new groups are predicted, whether the RE factor is in the average_by, etc. Clarification would be great!

GStechschulte commented 9 months ago

@jt-lab yes, these blogs are fantastic. I am re-reading them today.

But perhaps Bambi's predict implicitly decides this based on whether new groups are predicted, whether the RE factor is in the average_by, etc. Clarification would be great.

We have recently added docs on predicting new groups when specifying group effects in a Bambi model. This notebook only describes how this is achieved in model.predict (and subsequently how it can be used in the interpret module).

GStechschulte commented 9 months ago

@jt-lab @AylinH I have reviewed the blog posts. In the blog Marginal and Conditional Effects for GLMMs there are two quantities of interest:

  1. Conditional effects - these are predictions when you "set the random offsets to 0", i.e. you do not include group specific effects. These predictions can be achieved by passing False to the include_group_specific parameter of model.predict
  2. Average population level outcomes - these are predictions that include group effects. This is the default behavior of the predict method, i.e. include_group_specific = True

In your case, I would:

  1. use data_grid to create a grid of data that would generate predictions for each desired factor within individual (this is the "calculate predictions for TX = {0, 1} within each of the existing clusters" sentence) with include_group_specific = True.
  2. use select_draws to select draws that correspond to the desired contrast values, e.g. {factor2: [X, Y]}
  3. compute the average for each contrast (a rough sketch of these steps follows below)
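A rough sketch of those three steps, reusing the model and idata fitted earlier in the thread (the grid values are just the ones from the earlier example):

import numpy as np
from bambi.interpret.helpers import data_grid, select_draws

# 1) grid of data: predictions for each factor2 value within each existing
#    individual in factor1 == "A"; group-specific effects are included because
#    include_group_specific defaults to True in model.predict
conditional = {
    "factor1": np.array(["A"]),
    "factor3": np.array([1]),
    "individual": np.arange(18, 27),
}
variable = {"factor2": np.array(["X", "Y"])}
new_data = data_grid(model, conditional, variable)
idata_new = model.predict(idata, data=new_data, inplace=False)

# 2) draws corresponding to each contrast value
draw_x = select_draws(idata_new, new_data, {"factor2": "X"}, "p(correct, count)_mean")
draw_y = select_draws(idata_new, new_data, {"factor2": "Y"}, "p(correct, count)_mean")

# 3) average for the contrast: reduce chain and draw per unit first,
#    then average across the units/clusters
diff = (draw_y - draw_x).mean(("chain", "draw"))
print(float(diff.mean()))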
GStechschulte commented 9 months ago

Hey @jt-lab and @AylinH, PR #762 has been merged. Also, does the above message make sense?

jt-lab commented 9 months ago

@GStechschulte, sorry for the slow response, and thanks for the explanations and the notebook. Yes, that makes sense and agrees with what we are currently doing. So it is definitely reassuring to see we converged on the same calculation! We are still struggling a bit with achieving the same thing for predicted slopes. The current idea is to use the average population-level outcomes, but instead of calculating a single difference within each cluster, we would calculate level-to-level differences and average them inside the clusters (and then average as you described above).
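Very roughly, the idea with made-up names and shapes, just to spell out the order of the averaging steps:

import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# made-up posterior predictions with dims (chain, draw, level, cluster)
mu = xr.DataArray(
    rng.normal(size=(2, 100, 3, 5)),
    dims=("chain", "draw", "level", "cluster"),
)

# level-to-level differences within each cluster (consecutive levels)
level_diffs = mu.diff("level")

# average the level-to-level differences inside each cluster ...
per_cluster = level_diffs.mean("level")

# ... then across the clusters, and summarize chain/draw at the end
population = per_cluster.mean("cluster")
print(float(population.mean(("chain", "draw"))))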