Hey @jt-lab this is great to hear! You are right, this is currently not possible. I once opened PR #708 to return the data used to generate predictions and the arviz inference data all in a pandas dataframe. This would allow you to perform various group by and aggregations (and if you don't like pandas method chaining, I would suggest polars or tidy polars).
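For example (just a sketch with a made-up long-format dataframe; the actual columns returned by that PR may differ):

import pandas as pd

# Made-up long-format dataframe: one row per (chain, draw, observation),
# with the covariates used for the prediction attached to each row.
df = pd.DataFrame({
    "chain":    [0, 0, 0, 0, 1, 1, 1, 1],
    "draw":     [0, 1, 0, 1, 0, 1, 0, 1],
    "factor1":  ["A", "A", "B", "B", "A", "A", "B", "B"],
    "estimate": [0.11, 0.13, 0.22, 0.19, 0.12, 0.10, 0.21, 0.24],
})

# Group-by and aggregation over the draws within each level of factor1
summary = df.groupby("factor1")["estimate"].agg(["mean", "std"])
print(summary)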
I'm not sure why I closed the PR, but I will reopen it and rebase with main.
Initially, I think returning such a data object would make the most sense without introducing a lot of new code. {marginaleffects} allows users to pass their own functions to compute differences, ratios, etc., but even that does not allow you to compute between-level and hierarchical comparisons.
This is interesting! While I think of the best solution, I would suggest you pull those changes (once I rebase that PR) and compute the desired comparisons yourself. Thanks for communicating with me about this!
@GStechschulte, that sounds great!
This would allow you to perform various group by and aggregations (and if you don't like pandas method chaining, I would suggest polars or tidy polars).
I have no idea what (tidy) polars is... will google it. But chaining pandas methods sounds good! With many PPC samples, working on a pandas df might get slow, but maybe it's not really a problem. Happy to give it a try once the PR is through.
Many thanks!
I just saw tidypolars and I'll drop this here https://tomicapretto.github.io/posts/2022-06-26_tidypolars/ haha
@GStechschulte,
We gave it a try with the return_idata=True argument. However, the kernel dies due to memory running full. Even with ~100 GB of swap space it won't run through on our data. When we thin the original idata trace (from inference) to only 10 draws per chain, it runs through (using up to 8 GB of RAM). Curiously, the returned pandas DataFrame goes up to 3600 draws in the draw column. What is the correspondence between draws in the input idata and the output idata? Do you have any suggestions on how to deal with this?
Many thanks!
@jt-lab wow! The number of draws in the draw column of the dataframe depends on the number of chains and number of draws specified when you fit the model. The correspondence between draws and observed data is such that the idata is merged with the observed data that produced that draw. E.g. row 5 of the observed data corresponds to chain 3 draw 500.
We realized the memory limitation (and your comment confirms this) of returning a DataFrame. Therefore, I have been working on returning an az.InferenceData object with a group called observed_data that can store the observed data; the other groups of the object remain the same (posterior, posterior predictive, sample_stats, etc.).
Then, since this object relies on xarray, you can use xarray for multidimensional indexing, aggregations, group by, etc. I should be able to commit some updates in the next day. I also have an example notebook on how you can use xarray to perform the cross comparisons you described above.
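For instance, something along these lines (a self-contained toy sketch; the actual variable and coordinate names in the returned InferenceData may differ):

import numpy as np
import xarray as xr

# Toy posterior of per-observation means: dims (chain, draw, obs),
# with a hypothetical "group" coordinate attached along the obs dimension.
rng = np.random.default_rng(0)
posterior = xr.DataArray(
    rng.normal(size=(2, 100, 6)),
    dims=("chain", "draw", "obs"),
    coords={"obs": np.arange(6), "group": ("obs", ["A", "A", "B", "B", "C", "C"])},
)

# Multidimensional indexing: keep only the observations belonging to group "A"
subset = posterior.where(posterior["group"] == "A", drop=True)

# Aggregation: posterior mean per observation (reduce chain and draw)
per_obs_mean = subset.mean(("chain", "draw"))

# Group by the coordinate and average within each group
by_group = posterior.groupby("group").mean()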
@GStechschulte,
An InferenceData with an additional group sounds great. It could just extend the idata object it receives as input. It's also great for having everything in one place so it can be dumped into one NetCDF file.
You write the group would be called observed_data. Isn't that misleading, and shouldn't it rather be predicted_observed_data or something like this?
Concerning
The number of draws in the draw column of the dataframe depends on the number of chains and number of draws specified when you fit the model.
After thinning the trace to 10 draws per chain, the resulting df still contained 3600 draws. So apparently it did not consider the current draws in the idata object (but perhaps used some stored value of the original number of draws). Curiously, it was much faster and only used <10% of the memory compared to using the non-thinned idata object, so the thinning did have some effect. Also, the original trace contained 4000 samples... so I'm really not sure where the 3600 comes from ...
As always, many thanks!
@jt-lab Ahhh yes. It is misleading. Thanks! Hmmm, interesting. Constructing the dataframe required a couple of joins, so perhaps a bug is being introduced there. However, since I am pretty confident we will be going ahead with the InferenceData object, I won't pursue this (and you shouldn't worry too much about it either).
@jt-lab I have pushed a change to the comparisons function in PR #758 that returns the inference data and the data used in model.predict() when return_idata=True. If you pull these changes and then use this Gist, you can experiment with how we would construct the inference data with the data, and with how you can use xarray to compute different comparisons. I included a small demo on computing cross-comparisons.
@GStechschulte thanks a lot for including these changes. That's awesome!
I just tried to do some comparisons with our data and realized that the inference data that is returned contains twice as many observations as the dataset. Also, when I tried to calculate specific comparisons, I noticed that the values in the DataArray are duplicated. In my code, I used average_by with interpret.comparisons and no conditional variables. Maybe something is going wrong when using average_by to return the inference data, or am I missing something?
Many thanks!
@AylinH thanks! Could you share the code and data where this is happening please?
Nonetheless, this behavior can happen. If you fit a model with 100 rows, this is the observed data used to fit the model. However, when calling comparisons (depending on the values you pass for contrast and conditional), a cross join (Cartesian product) of the contrast and conditional values is computed, which generates a new dataset that is then passed to model.predict().
For example, the Cartesian product of two sets A and B, written A x B, is the set of all possible ordered pairs where the first element comes from set A and the second element comes from set B, so it contains |A| * |B| elements. The length of A x B could therefore be longer than the observed data used to fit the model. Again, it depends on the values passed to conditional, the number of covariates in the model, etc.
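Just to illustrate the idea (this is not Bambi's internal code, only a sketch of how such a grid can blow up):

import itertools
import pandas as pd

# Hypothetical contrast and conditional values
contrast = {"factor2": ["X", "Y"]}
conditional = {"factor1": ["A", "B", "C"], "factor3": [1, 2]}

# Cartesian product of all values: 2 * 3 * 2 = 12 rows, which can easily be
# larger (or smaller) than the data the model was fit on.
values = {**contrast, **conditional}
grid = pd.DataFrame(list(itertools.product(*values.values())), columns=list(values.keys()))
print(len(grid))  # 12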
@GStechschulte thanks for your quick response!
I used the simulated data by @jt-lab. In the data set, we have two within-participants factors (factor2: X,Y and factor3: 1,2) and one between-participants factor (factor1: A, B, C). Within each level of factor1, there are 9 participants for each respective condition. This is the data set: simulated_data_comparisons.csv
This is the code I used (adapted from your example):

import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False
)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood': True})

data['report frequency (%)'] = data['correct'] / data['count']

df, idata = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
    return_idata=True
)

xr_df = xr.Dataset.from_dataframe(df)
del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})

def select_draws(idata, condition_idx):
    draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
    draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})
    return draws

idx_11 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
idx_41 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]

draws_1 = select_draws(idata, idx_11)
draws_4 = select_draws(idata, idx_41)

diff = (draws_4 - draws_1).mean(("chain", "draw"))
diff
The returned inference data comprises 216 observations (in the data set there are 108).
This is the output of the comparison:
The values in the blue rectangle are the same as the values in the orange rectangle. Also, just based on the data set, I would have expected to see 9 values in the data array. Is this related to your explanation regarding the Cartesian product?
Thank you very much!
@GStechschulte, for further testing, we have tried to manually recreate, from the returned idata, a comparison that interpret.comparisons also returns. The mean was as expected, but the HDI boundaries were different. Is this perhaps due to how the random effect 'individual' is handled? (Note: we have removed redundant predictions, see the post above.)
Do you have any idea what might go wrong? Many thanks!
Code:
import pandas as pd
import bambi as bmb
import seaborn as sns
import xarray as xr
import numpy as np
import arviz as az

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False
)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood': True})

data['report frequency (%)'] = data['correct'] / data['count']

df, idata = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
    return_idata=True
)

# display(idata)
xr_df = xr.Dataset.from_dataframe(df)
del idata.observed_data
idata.add_groups(data=xr_df)
idata.data = idata.data.rename({"index": "p(correct, count)_obs"})

def select_draws(idata, condition_idx):
    draws = idata.posterior.sel(**{"p(correct, count)_obs": condition_idx})["p(correct, count)_mean"]
    draws = draws.assign_coords({"p(correct, count)_obs": np.arange(len(condition_idx))})
    return draws

subset1 = np.where((idata.data.factor2 == "X") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]
subset2 = np.where((idata.data.factor2 == "Y") & (idata.data.factor1 == "A") & (idata.data.factor3 == 1))[0]

draws_1 = select_draws(idata, subset1)
draws_2 = select_draws(idata, subset2)

diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0, 18, 2)}).mean("p(correct, count)_obs")  # removing redundant predictions
az.plot_posterior(diff)

summary_df = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
)
display(summary_df)
@AylinH thank you for the messages; I haven't ignored the first one. I have been busy as I am in the middle of switching jobs. I will get to these messages by the end of the weekend. Thanks!
The values in the blue rectangle are the same as the values in the orange rectangle. Also, just based on the data set, I would have expected to see 9 values in the data array.
@AylinH This is because of a bug in the function that prepares the unit-level data. The function was not dropping duplicate rows after creating the unit-level data. If you do df.drop_duplicates(inplace=True) after calling comparisons, this will resolve the duplication of elements in the diff array. I will open a PR to resolve that.
The mean was as expected, but the HDI boundaries were different. Is this perhaps due to how the random effect 'individual' is handled? (Note: We have removed redundant predictions, see post above.)
@AylinH This "discrepancy" is happening because in the summary_df you are specifying covariates to average by. In the backend, Bambi first computes the comparisons, then applies the average by. When specifying return_idata=True, the inference data returned by model.predict(data=comparisons_data) is returned, which leaves it up to you to compute the comparisons and then the average by.
So when you selected the draws with
diff = (draws_2 - draws_1).sel(**{"p(correct, count)_obs": range(0,18,2)}).mean("p(correct, count)_obs")
you were only selecting a subset of the draws, corresponding to the following data:
factor1 | factor3 | individual | factor2 |
---|---|---|---|
A | 1 | 18 | Y |
A | 1 | 20 | Y |
A | 1 | 22 | Y |
A | 1 | 24 | Y |
A | 1 | 26 | Y |
A | 1 | 18 | X |
A | 1 | 20 | X |
A | 1 | 22 | X |
A | 1 | 24 | X |
A | 1 | 26 | X |
and without performing a group-by aggregation. So you should first compute unit-level comparisons with all draws and then apply your group-by aggregation.
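A self-contained toy sketch of that order of operations (the arrays stand in for draws_1/draws_2 from your snippet; dimension names here are made up):

import numpy as np
import xarray as xr

# Toy per-unit posterior means for the two contrast levels, dims (chain, draw, obs)
rng = np.random.default_rng(0)
dims = ("chain", "draw", "obs")
draws_x = xr.DataArray(rng.normal(0.5, 0.05, size=(4, 100, 9)), dims=dims)
draws_y = xr.DataArray(rng.normal(0.6, 0.05, size=(4, 100, 9)), dims=dims)

# 1) unit-level comparison computed on all draws
unit_diff = draws_y - draws_x            # dims: (chain, draw, obs)

# 2) group-by / aggregation applied afterwards (here: average over the 9 units)
avg_diff = unit_diff.mean("obs")         # posterior of the average comparison
print(avg_diff.mean().item())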
Good questions, thanks! By the way, I am adding some helper functions to make a lot of this easier. I hope to have a draft PR by the end of the weekend.
@GStechschulte thank you very much for your explanations!
We have tried to calculate the comparisons as you have explained. However, our HDI boundaries have not changed from our previous post and are still different from the summary_df. Maybe our unit-level comparison is not what you meant, or the random effect "individual" is handled differently in the summary_df?
This is our code (with comments for clarification):
import bambi as bmb
import pandas as pd
import xarray as xr
import numpy as np
import arviz as az

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False
)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood': True})

df, idata_pred = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
    return_idata=True
)

xr_df = xr.Dataset.from_dataframe(df)
del idata_pred.observed_data
idata_pred.add_groups(data=xr_df)
idata_pred.data = idata_pred.data.rename({"index": "p(correct, count)_obs"})

mask_X = (df["factor2"] == "X")
values_X = idata_pred.posterior["p(correct, count)_mean"][:, :, mask_X.values].values

mask_Y = (df["factor2"] == "Y")
values_Y = idata_pred.posterior["p(correct, count)_mean"][:, :, mask_Y.values].values

diff = values_Y - values_X  # Difference on the unit level

diff_subset = diff[:, :, ((df["factor1"] == "A") & (df["factor3"] == 1)).values[mask_X]]  # mask_X indices in valid order for this
mean_diff = np.mean(diff_subset, axis=2)  # Mean difference over units

summary_df = bmb.interpret.comparisons(
    model=model,
    idata=idata_pred,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
)
display(summary_df)

az.plot_posterior(mean_diff)
We are happy about any suggestions. Many thanks!
@GStechschulte that is awesome, many thanks for working on these modifications!
I have just calculated our comparison using your new approach, which worked well. However, the HDI discrepancy is still there (the same HDIs as for the above codes). Could you please explain to us which HDIs are the correct ones? Any help would be greatly appreciated.
This is the code:
import warnings

import arviz as az
import numpy as np
import pandas as pd
import xarray as xr

import bambi as bmb
from bambi.interpret.helpers import data_grid, select_draws

bmb.config["INTERPRET_VERBOSE"] = False
warnings.simplefilter(action='ignore', category=FutureWarning)

data = pd.read_csv('simulated_data_comparisons.csv')

priors = {
    "factor3:factor2": bmb.Prior("Normal", mu=0, sigma=1),
    "factor1": bmb.Prior("Normal", mu=0, sigma=1),
    "factor3:factor2|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}

model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False
)

idata = model.fit(tune=2000, draws=2000, random_seed=123, init='adapt_diag',
                  target_accept=0.9, idata_kwargs={'log_likelihood': True})

# data grid
conditional = {
    "factor1": np.array(["A"]),      # Between-individuals factor
    "factor3": np.array([1]),
    "individual": np.arange(18, 27)  # All individuals who are in factor1=A
}
variable = {"factor2": np.array(["X", "Y"])}

new_data = data_grid(model, conditional, variable)
idata_new = model.predict(idata, data=new_data, inplace=False)

# comparison
draw_1 = select_draws(idata_new, new_data, {"factor2": "X"}, "p(correct, count)_mean")
draw_2 = select_draws(idata_new, new_data, {"factor2": "Y"}, "p(correct, count)_mean")

diff = (draw_2 - draw_1).mean("p(correct, count)_obs")

summary_df = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    average_by=['factor3', 'factor1'],
    contrast={"factor2": ["X", "Y"]},
)
display(summary_df)

az.plot_posterior(diff)
Data: simulated_data_comparisons.csv
Also, we were wondering if it is possible to specify the required contrast directly in the data_grid without having to specify it again with select_draws? I don't think I understand why we have to specify conditional and variable separately in the data_grid. Maybe you could explain a little more how the data_grid works?
Thanks a lot!
the HDI discrepancy is still there (the same HDIs as for the above codes). Could you please explain to us which HDIs are the correct ones? Any help would be greatly appreciated.
They are both returning the correct value. It is just that you are taking the mean along different dimensions and then passing that to az.hdi to compute statistics of the posterior samples.
Usually, when analyzing the posterior draws, you want to reduce the chain and draw dimensions first via .mean(("chain", "draw")) and then perform further aggregations if desired. You are reducing the following dimension first: .mean("p(correct, count)_obs"). In the interpret module, we reduce by chains and draws first.
ArviZ has a nice post about working with the InferenceData object. Additionally, see this PyMC forum question.
Adapting your example:
# same value as estimate column in 'summary_df'
diff = (draw_2 - draw_1).mean(("chain", "draw"))
np.mean(diff)
array(0.01249778)
Computing the 94% credible interval
# same values as CI columns in 'summary_df'
lower_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"])
higher_mean = np.mean(az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"])
lower_mean, higher_mean
array(-0.03360939), array(0.05940723)
and comparing these results with the summary dataframe:
summary_df = bmb.interpret.comparisons(
    model=model,
    idata=idata,
    contrast={"factor2": ["X", "Y"]},
    conditional=conditional,
    average_by=['factor3', 'factor1'],
    prob=0.94,
    use_hdi=True
)
summary_df
term | estimate_type | value | factor3 | factor1 | estimate | lower_3.0% | upper_97.0% |
---|---|---|---|---|---|---|---|
factor2 | diff | (X, Y) | 1 | A | 0.012498 | -0.033609 | 0.059407 |
@GStechschulte, thanks for the further explanations and all the effort. I think I now get what the different variants represent:
The summary_df returned by interpret.comparisons contains the means of the unit-level mean differences and the means of the unit-level HDI boundaries.
What we calculated manually was the mean over all unit-level distributions, and then the mean and HDIs of that resulting distribution.
I think the crucial point is that we were unaware of this:
Usually, when analyzing the posterior draws, you want to reduce the chain and draw dimensions first
So far we have mostly worked with models where the parameter posteriors can be interpreted more directly, without going through predictions. And when working with posteriors it's always "summarize last" (e.g., take differences and means first, then get the mean/HDIs to summarize the distribution). I must say it's not really clear to me why, when working with posterior prediction draws, one should "reduce first". Our goal is to make inferences based on the comparisons (that optimally generalize to the population). I looked into some of the literature listed on the {marginaleffects} page, but so far this was not super helpful. Hence, if anyone has suggestions for background on this ...
Edit: Just found the reference to the Gelman, Hill, and Vehtari regression book in one of the example notebooks... will have a look at that!
many thanks again for the support!
The summary_df returned by interpret.comparisons contains the means of the unit-level mean differences and the means of the unit-level hdi boundaries.
@jt-lab yes, but this is only because we computed unit-level comparisons and passed an argument to average_by. If we had left average_by as None, then the comparison would be
# mean comparison
diff = (draw_2 - draw_1).mean(("chain", "draw"))
# lower and higher bound of that comparison
lower_mean = az.hdi((draw_2 - draw_1)).sel(hdi='lower')["p(correct, count)_mean"]
higher_mean = az.hdi((draw_2 - draw_1)).sel(hdi='higher')["p(correct, count)_mean"]
I must say it's not really clear to me why when working with posterior prediction draws it is to "reduce first". Our goal is to make inferences based on the comparisons (that optimally generalize to the population)
Since your model has categorical variables and interaction effects, the dimensionality of the posterior is quite large:
idata_new["posterior"].dims
Frozen({'chain': 4, 'draw': 2000, 'factor3:factor2_dim': 4, 'factor1_dim': 2, 'individual__factor_dim': 27, 'factor3:factor2__expr_dim': 4, 'p(correct, count)_obs': 18})
but let's imagine for a second that it is
'chain': 4, 'draw': 2000, 'p(correct, count)_obs': 18
The coordinate 'p(correct, count)_obs' value of 18 corresponds to the number of units (rows) in new_data. What we are usually interested in is the mean and uncertainty of the predictions for each unit $x_i$. To achieve that, we need to do
(draw_2 - draw_1).mean(("chain", "draw"))
# and
az.hdi((draw_2 - draw_1))
which gives us the mean and uncertainty for each unit $x_i$ (after selecting a subset of the draws using select_draws). This indeed has a shape of
(draw_2 - draw_1).mean(("chain", "draw")).shape, az.hdi((draw_2 - draw_1)).shape
(9,)
I think it also partly depends on what type of posterior summary you are wanting to compute. But in the case of comparisons, slopes, and predictions in Bambi, we are interested in the posterior summary described above. @tomicapretto do you have any comments here? Perhaps you have computed other posterior summaries and/or can link to a blog / tutorial if you know of any.
I think it also partly depends on what type of posterior summary you are wanting to compute.
Yes, I think that's the crucial question. But I'd say what we want is the use case that 99% of users from our field (and probably all of the behavioral sciences) would have: we run some experiment (or observe some data, in other fields) and are now interested in which factors matter and how they interact, based on all the units that were tested (and probably generalizing to the population). In a frequentist setting we would look at which factors are significant. In Bayesian model comparison frameworks we might look at inclusion Bayes factors for different components. In Bayesian parameter estimation, we can often look directly at the parameter posteriors. But in GLMMs that's difficult, and I think the prediction approach can help with this. So this is the background for why we are interested in having a 'global' statistic for certain comparisons to look at.
I've been trying to follow the conversation. Initially I thought we were doing something wrong in the interpret submodule, as averaging lower/upper bounds seemed strange to me. However, the results match the ones I get with marginaleffects (tested with other models). It looks like I still need to review some things.
In the meantime, I think using data_grid, .predict, and select_draws is a good alternative if you want to perform more customized comparisons and/or aggregations.
Hi all,
I just came across these super helpful blog posts by Andrew Heiss. You probably know them, but just in case:
[1] https://www.andrewheiss.com/blog/2022/11/29/conditional-marginal-marginaleffects/ [2] https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide
I think our current approach is going for average population-level outcomes, in the sense of "average / marginalize / integrate across existing random effects"; see [1].
While reading these blog posts I have been wondering again how Bambi handles the random effects in predictions. In R there is the re_formula argument that allows users to specify what to do with the random effects. But perhaps Bambi's predict implicitly decides this based on whether new groups are predicted, whether the RE factor is in the average_by, etc. Clarification would be great!
@jt-lab yes, these blogs are fantastic. I am re-reading them today.
But perhaps Bambi's predict implicitly decides this based on whether new groups are predicted, whether the RE factor is in the average_by, etc. But clarification would be great.
We have recently added docs on predicting new groups when specifying group effects in a Bambi model. This notebook only describes how this is achieved in model.predict (and subsequently how it can be used in the interpret module).
@jt-lab @AylinH I have reviewed the blog posts. In the blog Marginal and Conditional Effects for GLMMs there are two quantities of interest: one obtained by passing False to the include_group_specific parameter of model.predict, and one obtained with include_group_specific = True.
In your case, I would:
1) Use data_grid to create a grid of data that would generate predictions for each desired factor within individual (this is the "calculate predictions for TX = {0, 1} within each of the existing clusters" sentence), with include_group_specific = True.
2) Use select_draws to select the draws that correspond to the desired contrast values, e.g. {factor2: [X, Y]}.
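Putting those two steps together, roughly (adapting your earlier snippet, so model and idata are the fitted objects from above; include_group_specific is shown explicitly):

import numpy as np
from bambi.interpret.helpers import data_grid, select_draws

# 1) Grid of data: predictions for factor2 = {X, Y} within each existing individual
conditional = {
    "factor1": np.array(["A"]),
    "factor3": np.array([1]),
    "individual": np.arange(18, 27),
}
variable = {"factor2": np.array(["X", "Y"])}
new_data = data_grid(model, conditional, variable)

# Predictions for the existing clusters, keeping the group-specific effects
idata_new = model.predict(idata, data=new_data, inplace=False,
                          include_group_specific=True)

# 2) Select the draws for each contrast value and compute the comparison
draws_x = select_draws(idata_new, new_data, {"factor2": "X"}, "p(correct, count)_mean")
draws_y = select_draws(idata_new, new_data, {"factor2": "Y"}, "p(correct, count)_mean")
diff = draws_y - draws_x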
Hey @jt-lab and @AylinH PR #762 has been merged. Also, does the above message make sense?
@GStechschulte, sorry for the slow response, and thanks for the explanations and the notebook. Yes, that makes sense and agrees with what we are currently doing, so it is definitely reassuring to see we converged on the same calculation! We are still struggling a bit with achieving the same thing for predicted slopes. The current idea is to use the average population-level outcomes, but instead of calculating a single difference within each cluster, we would calculate level-to-level differences and average them inside the clusters (and then average as you described above).
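Roughly, what we have in mind looks like this toy sketch (made-up dimension names, not our actual model):

import numpy as np
import xarray as xr

# Toy posterior predictions for three ordered levels of a predictor within
# five clusters: dims (chain, draw, level, cluster).
rng = np.random.default_rng(0)
preds = xr.DataArray(
    rng.normal(size=(4, 100, 3, 5)),
    dims=("chain", "draw", "level", "cluster"),
)

# Level-to-level differences within each cluster (level k+1 minus level k)
step_diffs = preds.diff("level")

# Average the step differences inside each cluster, then across clusters
avg_slope = step_diffs.mean("level").mean("cluster")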
@GStechschulte,
we keep using your interpret submodule to its full extent :-D
We came across two types of comparisons we would like to make but are unsure if/how they are possible. So probably this is not really an issue but rather a feature request ...
There are two related (but slightly different) use cases:
1) We would like to conduct a kind of second-level comparison within a factor. I.e., in the example below we would now like to compare the differences between different levels of factor1 (see red markings):
2) And in a slightly different scenario, we would like to compare the difference for, e.g., factor1=A in factor3=1 with the difference for factor1=B in factor3=2. That is, the levels would be different and it would also be across the levels of one factor. This might help to clarify:
Are such comparisons possible somehow? If there were a way to get the predicted samples on which the table is based, we could perhaps calculate these differences ...
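For instance, with the predicted samples in hand, we imagine something like this (toy arrays standing in for the posteriors of the two first-level differences):

import numpy as np
import xarray as xr
import arviz as az

# Toy posteriors of two first-level comparisons, e.g. the X-vs-Y difference in
# (factor1=A, factor3=1) and in (factor1=B, factor3=2); dims (chain, draw).
rng = np.random.default_rng(0)
diff_a = xr.DataArray(rng.normal(0.02, 0.01, size=(4, 100)), dims=("chain", "draw"))
diff_b = xr.DataArray(rng.normal(0.05, 0.01, size=(4, 100)), dims=("chain", "draw"))

# Second-level comparison: the difference of the two differences
second_level = diff_b - diff_a
az.plot_posterior(second_level)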
As always we are happy about any suggestions / input
Many thanks!