Different R2 for `lme4::glmer.nb()` models depending on offset term formulation

maxsitt commented 2 years ago

I'm posting this here, although the underlying problem might be in the insight package; I only became aware of it through the R2 output of performance. The glmer.nb() models probably make no sense for the mtcars data. I just wanted to reproduce the differing R2 output (my own data are fitted with a negative binomial distribution).

Additional question: from your point of view, which offset formulation is more "correct" or more commonly used? I have seen + offset() in the formula more often, but I am not sure whether it is really the "better" formulation.

library(lme4)
library(performance)
library(insight)

m1 <- lmer(log(mpg) ~ disp + (1|cyl) + offset(log(wt)), data = mtcars)
r2(m1)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
find_offset(m1)
#> [1] "wt"

m2 <- lmer(log(mpg) ~ disp + (1|cyl), offset = log(wt), data = mtcars)
r2(m2) # same R2 values
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
find_offset(m2) # offset term cannot be found by insight
#> NULL

m3 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl) + offset(log(wt)), data = mtcars))
r2(m3)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.815
#>      Marginal R2: 0.799
find_offset(m3)
#> [1] "wt"

m4 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl), offset = log(wt), data = mtcars))
r2(m4) # different R2 values (because of missing offset term?)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.655
#>      Marginal R2: 0.642
find_offset(m4) # offset term cannot be found by insight
#> NULL

Created on 2022-04-01 by the reprex package (v2.0.1)

strengejacke commented 2 years ago

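Apart from minor differences between performance::r2() and MuMIn::r.squaredGLMM(), both packages report a different R2 for the glmer.nb() models depending on whether the offset is specified inside the formula or via the offset argument. For the lmer() models, the R2 is the same either way.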

library(lme4)
#> Loading required package: Matrix
library(performance)
library(MuMIn)

m1 <- lmer(log(mpg) ~ disp + (1|cyl) + offset(log(wt)), data = mtcars)
r2(m1)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
r.squaredGLMM(m1)
#> Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
#>            R2m       R2c
#> [1,] 0.7573362 0.8171651

m2 <- lmer(log(mpg) ~ disp + (1|cyl), offset = log(wt), data = mtcars)
r2(m2) # same R2 values
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
r.squaredGLMM(m2)
#>            R2m       R2c
#> [1,] 0.7573362 0.8171651

m3 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl) + offset(log(wt)), data = mtcars))
r2(m3)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.815
#>      Marginal R2: 0.799
r.squaredGLMM(m3)
#> Warning: the null model is correct only if all variables used by the original
#> model remain unchanged.
#>                 R2m       R2c
#> delta     0.7960440 0.8127380
#> lognormal 0.8010033 0.8178012
#> trigamma  0.7908055 0.8073896

m4 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl), offset = log(wt), data = mtcars))
r2(m4) # different R2 values (because of missing offset term?)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.655
#>      Marginal R2: 0.642
r.squaredGLMM(m4)
#> Warning: the null model is correct only if all variables used by the original
#> model remain unchanged.
#>                 R2m       R2c
#> delta     0.6448313 0.6583541
#> lognormal 0.6607767 0.6746339
#> trigamma  0.6272789 0.6404336

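The main difference I found, which might explain the different R2 in both packages, is how the null models are calculated. For m3 (offset in the formula), the null model drops the offset; for m4 (offset argument), the null model keeps it (note the "Offset: log(wt)" line in the second output):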

library(lme4)
#> Loading required package: Matrix

m3 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl) + offset(log(wt)), data = mtcars))
m4 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl), offset = log(wt), data = mtcars))

insight::null_model(m3)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: Negative Binomial(49.3607)  ( log )
#> Formula: mpg ~ (1 | cyl)
#>    Data: mtcars
#>      AIC      BIC   logLik deviance df.resid 
#> 189.7125 194.1097 -91.8563 183.7125       29 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  cyl    (Intercept) 0.223   
#> Number of obs: 32, groups:  cyl, 3
#> Fixed Effects:
#> (Intercept)  
#>       2.993
insight::null_model(m4)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: Negative Binomial(49.3607)  ( log )
#> Formula: mpg ~ (1 | cyl)
#>    Data: mtcars
#>  Offset: log(wt)
#>       AIC       BIC    logLik  deviance  df.resid 
#>  227.6811  232.0783 -110.8405  221.6811        29 
#> Random effects:
#>  Groups Name        Std.Dev.
#>  cyl    (Intercept) 0.4643  
#> Number of obs: 32, groups:  cyl, 3
#> Fixed Effects:
#> (Intercept)  
#>       1.891
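
As a quick sanity check that the two fitted models themselves do not differ (so that any R2 discrepancy would have to come from later steps such as the null model), one could compare their fixed effects and log-likelihoods directly. This is only a minimal sketch: the names fixef_cmp and loglik_cmp are mine, and I am assuming both fits converge to the same optimum, which I would expect but have not confirmed beyond this example.

```r
library(lme4)

# Same two models as above: offset inside the formula vs. offset argument
m3 <- suppressWarnings(glmer.nb(mpg ~ disp + (1 | cyl) + offset(log(wt)), data = mtcars))
m4 <- suppressWarnings(glmer.nb(mpg ~ disp + (1 | cyl), offset = log(wt), data = mtcars))

# Fixed effects side by side (hypothetical helper name, for illustration only)
fixef_cmp <- cbind(formula_offset = fixef(m3), argument_offset = fixef(m4))
print(fixef_cmp)

# Log-likelihoods side by side
loglik_cmp <- c(
  formula_offset = as.numeric(logLik(m3)),
  argument_offset = as.numeric(logLik(m4))
)
print(loglik_cmp)
```

If both columns agree, the offset formulation does not change the fit itself, which would point to the null-model handling shown above as the source of the different R2 values.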

strengejacke commented 2 years ago

Closed by https://github.com/easystats/insight/commit/c6e75101addd220638ba73f50fe65d50dc76bb6c

library(lme4)
#> Loading required package: Matrix
library(performance)
library(insight)

m1 <- lmer(log(mpg) ~ disp + (1|cyl) + offset(log(wt)), data = mtcars)
r2(m1)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
find_offset(m1)
#> [1] "wt"

m2 <- lmer(log(mpg) ~ disp + (1|cyl), offset = log(wt), data = mtcars)
r2(m2) # same R2 values
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.817
#>      Marginal R2: 0.757
find_offset(m2)
#> [1] "wt"

m3 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl) + offset(log(wt)), data = mtcars))
r2(m3)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.655
#>      Marginal R2: 0.642
find_offset(m3)
#> [1] "wt"

m4 <- suppressWarnings(glmer.nb(mpg ~ disp + (1|cyl), offset = log(wt), data = mtcars))
r2(m4) # now the same R2 values as m3
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.655
#>      Marginal R2: 0.642
find_offset(m4)
#> [1] "wt"

Created on 2022-05-06 by the reprex package (v2.0.1)

easystats / performance

Different R2 for `lme4::glmer.nb()` models depending on offset term formulation #410