bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License
1.06k stars 122 forks source link

Setting alias to random intercept of the parent parameter unexpectedly changes the name of the factor dimension #827

Open digicosmos86 opened 2 months ago

digicosmos86 commented 2 months ago

Thank you so much, the Bambi team, for the latest version! The graphs look much cleaner now. There is one small inconsistency, though. For example: I have a model built in Bambi with the following graph (v is the parent parameter here):

image

I use model.set_alias({ "v": { "1|participant_id": "v_1|participant_id" } })

After rebuilding the model, the graph looks like this:

image

Note the new plate v_1|participant_id__factor_dim. It seems that this should not be changed with aliasing like this. The corresponding dimension in the InferenceData that bambi creates also changes, creating a bit of an inconsistency here

tomicapretto commented 1 month ago

I see what you mean. I'm not sure the solution would be easy because there's some inconsistency here (probably with what we've done in Bambi)

The terms for random effects are made of two parts: the formula, and the grouping variable. The first goes on the LHS of | and the other goes on the RHS. For example, 1 | participant_id. The formula specifies just an intercept, and the grouping variable (or factor) is participant_id. Every participant has their own intercept, partially pooled.

In that case, the random effects term results in a single new dimension added to the model (a dimension with as many levels as participants). But there can be cases where the formula on the LHS contains a categorical variable, and thus requires one new dimension.

We decided to separate the dimensions into expr for the ones from the LHS and factor for the ones on the RHS. The first one is affected by the name on the LHS, the second by the name of the RHS.

For example, if we have y ~ 0 + (0 + condition | subject) and condition is a categorical variable, we will have dimensions condition__expr_dim and subject__factor_dim, and the plate for condition | subject will show a shape of condition__expr_dim x subject__factor_dim.

The problem with the aliases, is that we don't rename the LHS or the RHS independently. We rename the entire term. And since Bambi "loses" that information, the "safe" approach it takes is to just create a new dimension as if they were different things.

I'm not sure what is the best solution here. One solution would be to rename the LHS and RHS of | independently, but then the aliases force you to keep the |, and we added aliases to allow people to put whatever they want. Another solution would be to try to determine if the user renamed the LHS but not the RHS and then keep the same dimension for the grouping variable. But these things are not trivial as they require surgical precision with the modifications to not break other things.

digicosmos86 commented 1 month ago

Thank you for looking into this, @tomicapretto! I wonder if there's some native naming solution in Bambi so that we don't need to use aliasing at all?

I am seeing a bit of an inconsistencies here: the covariates on the regression of the parent are directly named, and the parameters on the regression on auxiliary parameters are named like f"{parameter}_{covariate}". If there is a way to get the latter naming convention with the parent parameter in bambi, aliasing would not be necessary. I think bambi is already moving towards the direction of not treating the parent radically differently in distributional models, so I wonder if we can go a bit further?

tomicapretto commented 1 month ago

If there is a way to get the latter naming convention with the parent parameter in bambi, aliasing would not be necessary.

I can imagine two scenarios where this can happen

  1. There's a configuration variable which allows you to toggle between what you suggest and the existing behavior.
  2. We implement what you suggest, but only when non-parent parameters are modeled with covariates. Making the suggested behavior the default for all cases will cause a lot of confusion to most users who are used to see the name of the term match the name of the variables in the inference data object (i.e. users are used to find x in idata when they do y ~ x + z).

I have a preference for 1 right now.

digicosmos86 commented 1 month ago

If there is a way to get the latter naming convention with the parent parameter in bambi, aliasing would not be necessary.

I can imagine two scenarios where this can happen

  1. There's a configuration variable which allows you to toggle between what you suggest and the existing behavior.
  2. We implement what you suggest, but only when non-parent parameters are modeled with covariates. Making the suggested behavior the default for all cases will cause a lot of confusion to most users who are used to see the name of the term match the name of the variables in the inference data object (i.e. users are used to find x in idata when they do y ~ x + z).

I have a preference for 1 right now.

I also have a strong preference for 1. It would be really helpful for us if this is possible in Bambi