bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License
1.08k stars 124 forks source link

y ~ 1 + x:z outputs estimates for Intercept, x:z AND z #747

Closed hummuscience closed 1 year ago

hummuscience commented 1 year ago

I explained this in the following issue: https://github.com/lnccbrown/HSSM/issues/308

Bambi model written with the following formula: y ~ 1 + x:z should estimate an intercept and x:z. Instead, the model also outputs an estimate for z (or whatever is placed second in the interaction).

Running similar code in Formulae outputs the correct model.

Or am I missing something here?

import bambi as bmb

# Read data
csv_url = 'https://raw.githubusercontent.com/haripen/Bambi_Analysis/main/sim_xover.csv'
df = pd.read_csv(csv_url,index_col=0,header=0)

bmb.Model('value ~ treatment:sequence', data = df)
       Formula: value ~ treatment:sequence
        Family: gaussian
          Link: mu = identity
  Observations: 240
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 2.575700044631958, sigma: 88.95659637451172)
            sequence ~ Normal(mu: 0.0, sigma: 101.8489990234375)
            treatment:sequence ~ Normal(mu: [0. 0.], sigma: [124.739  108.8811])

        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 19.958200454711914)
import formulae as fm

fm.model_description('y ~ 1 + a:b')
Model(
  Response(Term([Variable(y)])),
  Intercept(),
  Term([Variable(a), Variable(b)])
)
fm.model_description('y ~ 0 + a:b')
Model(
  Response(Term([Variable(y)])),
  Term([Variable(a), Variable(b)])
)
fm.model_description('y ~ 1 + a*b')
Model(
  Response(Term([Variable(y)])),
  Intercept(),
  Term([Variable(a)]),
  Term([Variable(b)]),
  Term([Variable(a), Variable(b)])
)
tomicapretto commented 1 year ago

Hi @Cumol! thanks for sharing the problem with such amount of detail. This is a confusing point, but I think Bambi and formulae are not doing the wrong thing.

Shortly, if you have an intercept plus AN interaction between two categorical variables, you need to add one of the main effects because the constraints that are applied to the interaction prevent it from spanning the space of the interaction. This is much better explained here in Patsy's docs https://patsy.readthedocs.io/en/latest/formulas.html#redundancy-and-categorical-factors. Formulae does the same thing. Check the block where it does dmatrices("y ~ 1 + a:b", data).

If you don't want to have the main effect, which may make interpretations more complicated, then use value ~ 0 + treatment:sequence.