gamlj / gamlj

GAMLj: GLM, Mixed, Generalized and Generalized mixed models for jamovi
39 stars 11 forks source link

Understanding differences between gamlj::gamljGlmMixed() and lme4::glmer() #181

Open simoncolumbus opened 1 year ago

simoncolumbus commented 1 year ago

I have fitted some equivalent models with gamlj::gamljGlmMixed() and with lme4::glmer(). I was under the assumption that gamljGlmMixed() was a wrapper for glmer() and would produce equivalent results, but this is not the case. I am trying to understand what produces these differences. Is it possible to see the exact glmer() input gamljGlmMixed() produces?

Minimum working example with some model configurations below.

set.seed(1234)

# Generate data
data <- tibble(grp = rep(1:100, each = 20),
               y = sample(0:1, 2000, replace = T),
               x1 = rep(0:1, times = 1000),
               x2 = rep(0:1, each = 20, times = 50),
               x3 = rnorm(2000, mean = 0, sd = 1))

# Intercept-only model; 
# Estimates are the same with both packages
gamlj::gamljGlmMixed(y ~ 1 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))

# Model with a single, dichotomous predictor varying within groups;
# Estimates for x1 are equivalent, but fixed intercept and random intercept
# variance differ
gamlj::gamljGlmMixed(y ~ 1 + x1 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + x1 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))

# Model with a single, dichotomous predictor varying between groups;
# Estimates for x2 are equivalent, but fixed intercepts differ
gamlj::gamljGlmMixed(y ~ 1 + x2 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + x2 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))

# Model with an interaction between two dichotomous predictors, one varying
# within and the other between groups; the estimate for the interaction is
# equivalent, but the fixed intercept and the two main effects of x1 and x2
# differ
gamlj::gamljGlmMixed(y ~ 1 + x1*x2 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + x1*x2 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))

# Model with a single continuous predictor varying within groups; the estimate
# for x3 is equivalent, but the fixed intercepts vary (albeit very slightly)
gamlj::gamljGlmMixed(y ~ 1 + x3 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + x3 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))
mcfanda commented 1 year ago

Hi We should consider that estimating a linear model takes a little more than running a command. GAMLj does use lme4::glmer() to estimate the coefficients of the generalized mixed model, but it prepares the data to yield a sensible model. The differences you observed are due to the fact that in R the user is responsible to prepare the data, whereas in GAMLj the data are prepared accordingly to some default (https://gamlj.github.io/)

  1. In the example, the clustering variable is not a factor. lme4 expects the clustering variable to be a factor, but it estimates the model anyway. GAMLj coerces the clustering variable into a factor by default.
  2. In GAMLj all independent variables are centered to their means by default, in R they are not. The reason is that in linear models without interactions or random coefficients, centering does not alter the results. In models with interactions, the linear effects are the main effects (like in the classical ANOVA) and the mixed models are more likely to converge.

Notice, for instance, that the ANOVA-like model you ran with y ~ 1 + x1*x2 + (1|grp) will not yield the classical main effects and interaction, but would yield two simple effects and an interaction. To obtain main effects, you should code x1 and x2 as factors, and center their codes.

To obtain the same results in GAMLj and lme4::glmer() one should simply be sure that the variables are defined in the same way before each command is executed. Here is an example with your data

data$grp<-factor(data$grp)
data$x1<-factor(data$x1)
contrasts(data$x1)<- -contr.sum(2)/2
data$x2<-factor(data$x2)
contrasts(data$x2)<- -contr.sum(2)/2

gamlj::gamljGlmMixed(y ~ 1 + x1*x2 + (1|grp),
                     data = data,
                     modelSelection = "custom",
                     custom_family = "binomial",
                     custom_link = "logit")

lme4::glmer(y ~ 1 + x1*x2 + (1|grp),
            data = data,
            family = binomial(link = 'logit'))
simoncolumbus commented 1 year ago

Thanks a lot for this explanation, Marcello. That was very helpful!