easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/

warnings/approximation of observation-level variance in .get_variance_distributional #877

Closed: bbolker closed this issue 2 weeks ago

bbolker commented 1 month ago

See https://stackoverflow.com/a/78557174/190277

There's a warning thrown when computing (ultimately) performance::r2_nakagawa() if exp(null.fixef) is less than 6; the threshold has something to do with the validity of the log-Normal approximation from Nakagawa et al. 2017. This test supposedly comes from me, but I think I stole it from somewhere else (see the linked question for details). In the meantime, MuMIn now has much-improved code for these approximations (including exact, delta, log-normal, and trigamma variants), which should possibly be used instead of what's there now ...
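
For reference, a minimal sketch of the check in question (the function name and message are illustrative, not the exact insight source):

# minimal sketch: for a log-link (G)LMM, exp() of the null-model intercept
# is the marginal mean mu; warn when mu is small, since the log-normal
# approximation degrades as mu approaches zero
check_lognormal_mu <- function(null_model) {
  null_fixef <- unname(lme4::fixef(null_model))[1] # null-model intercept
  mu <- exp(null_fixef)
  if (mu < 6) {
    warning(
      "mu (", round(mu, 2), ") is close to zero; the log-normal approximation ",
      "of the observation-level variance may be unreliable."
    )
  }
  mu
}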

strengejacke commented 3 weeks ago

Ben, I added a test (see file _test-r2_nakagawaMuMIn.R in #883) to check results from get_variance() (resp. r2_nakagawa()) for all distributions/families against MuMIn::r.squaredGLMM().

I will then revise the computation in get_variance() for models where I get spurious/incorrect results.

One difference I found is the computation of fixed effects variance.

insight uses:

var(as.vector(fixef(m) %*% t(getME(m, "X"))))

MuMIn uses:

var(fitted(m))

They are often very similar, but sometimes (e.g., for a Poisson mixed model with a random slope) they can differ slightly. Which would you say is the more accurate approach for extracting the fixed effects variance?
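
For illustration, here is a minimal sketch contrasting the two quantities; the model is illustrative, not taken from the test file:

library(lme4)
library(glmmTMB) # only for the Salamanders data

# illustrative Poisson mixed model with a random slope
m <- glmer(count ~ spp + (1 + mined | site), data = Salamanders, family = poisson)

# insight: variance of the fixed-effects-only linear predictor (link scale)
var_fe <- var(as.vector(fixef(m) %*% t(getME(m, "X"))))

# MuMIn: variance of the fitted values (response scale, includes random effects)
var_fit <- var(fitted(m))

c(insight = var_fe, MuMIn = var_fit)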

strengejacke commented 2 weeks ago

Zero-inflated and hurdle models from glmmTMB are not very well tested yet. I'm not sure, though, against which functions/values the test results can be validated?

bbolker commented 2 weeks ago

I think I need more context: what is the "fixed effects variance" supposed to mean, and how is it used? Are we talking about the variance on the link/linear-predictor scale or on the data/response scale? fitted() generally gives values on the data scale and includes all model components; the insight calculation is done on the link scale and includes only the conditional model (in cases where there are RE and/or zero-inflation).
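
To make the distinction concrete, a minimal sketch for a hypothetical glmmTMB fit m (assuming a glmmTMB version whose predict() supports re.form):

# link scale, conditional-model fixed effects only (the insight quantity)
eta_fixed <- predict(m, re.form = NA, type = "link")

# response scale, including random effects and (if present) zero-inflation
mu_full <- fitted(m)

# for a log link, these differ both by exp() and by the omitted components
c(var_link = var(eta_fixed), var_response = var(mu_full))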

strengejacke commented 2 weeks ago

Fixed effects variance: see eq. 2.4 in https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0213
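
For reference, eq. 2.4 there defines the fixed effects variance as (notation approximately as in the paper)

sigma2_f = var( sum_k beta_k * x_ki )

i.e. the variance, over observations i, of the linear predictor built from the fixed effects alone.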

bbolker commented 2 weeks ago

OK. As far as I can tell quickly, it does seem that X %*% beta is definitely better than fitted() [I'm having a hard time figuring out when fitted() would actually be appropriate ...]. It's used in this supp material. For hurdle models, I think the interesting questions would arise in calculating the distributional variance, not the fixed effect variance. For Z-I models, I think you'd have to think hard about how the component of variance due to zero-inflation fits into the overall framework ... (with a trivial, intercept-only Z-I model, you could incorporate the Z-I effect into the distributional variance; with a non-trivial Z-I model, you might have to derive some new results ...) https://stats.stackexchange.com/questions/549959/r-squared-for-a-zero-truncated-negative-binomial-model
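
For the intercept-only case, the standard zero-inflated Poisson moments show how that could work (a standard result; p is the constant zero-inflation probability, lambda the conditional Poisson mean):

E[Y]   = (1 - p) * lambda
Var(Y) = (1 - p) * lambda * (1 + p * lambda)

so the inflation contributes a multiplicative (1 + p * lambda) term that could be absorbed into the distributional variance; once the Z-I model has covariates, p varies by observation and no such closed form drops out.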

strengejacke commented 2 weeks ago

I think I need some statistical advice, or at least a look at how other modelling packages behave. I'm referring to Suppl. 2 of Nakagawa:

https://royalsocietypublishing.org/action/downloadSupplement?doi=10.1098%2Frsif.2017.0213&file=rsif20170213supp2.pdf

On page 5, "Quasi-Poisson" models are described; however, package glmmadmb is used with family = "nbinom1" (though family() says it's quasi-Poisson):

[screenshot: Suppl. 2, p. 5, quasi-Poisson example fitted with glmmadmb, family = "nbinom1"]

Negative binomial models start on page 9, where "nbinom2" is used:

[screenshot: Suppl. 2, p. 9, negative-binomial example fitted with family = "nbinom2"]

The difference is the log-approximation. For quasi-Poisson, it's log(1 + omegaN / lambda); for negative-binomial, it is log(1 + (1 / lambda) + (1 / thetaN)). If I change the code in insight::get_variance() accordingly, I get identical results for a glmmTMB(nbinom2) model, whether I code the R2 by hand using Nakagawa et al.'s code or use performance::r2_nakagawa().

I think I would now use:

log(1 + omegaN / lambda) # quasi-poisson and poisson
log(1 + (1 / lambda) + (1 / thetaN)) # neg-binomial

# where lambda is "mu" in insight
# and omegaN is "vv", while thetaN is "sig"

What do you think?

The affected code starts here: https://github.com/easystats/insight/blob/fa1ac637cdc3a078bbd1bc05681b585b67f22709/R/compute_variances.R#L656

And the Nakagawa code to compute the log-approximation (log(1 + var / mu^2), computed as log1p(var / mu^2)) is here (the result from switch() is used in log1p()): https://github.com/easystats/insight/blob/fa1ac637cdc3a078bbd1bc05681b585b67f22709/R/compute_variances.R#L764-L773
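
Both family-specific expressions above are just that general log-normal form with the family's conditional variance plugged into log(1 + var / mu^2): for quasi-Poisson, var(y) = omega * lambda, so log(1 + (omega * lambda) / lambda^2) = log(1 + omega / lambda); for nbinom2, var(y) = lambda + lambda^2 / theta, so log(1 + (lambda + lambda^2 / theta) / lambda^2) = log(1 + 1 / lambda + 1 / theta).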

strengejacke commented 2 weeks ago

Ok, here's a comparison of the Nakagawa code, performance and MuMIn. There are minimal differences between Nakagawa and performance, due to slightly different residual variances. I debugged step by step and compared values, but I'm not sure how this occurs; I'll try to figure it out.

The main take-away is: the code suggested by Nakagawa seems not to be correctly implemented in MuMIn for negative-binomial models. This is probably due to the confusion in the example I've shown above: glmmadmb uses family = "nbinom1", but fits a quasi-Poisson model (according to family()). I would change insight::get_variance() to be in line with Nakagawa, although it will then slightly differ from MuMIn. But my impression is that this is the "correct" way, or not?

library(performance)
library(insight)
library(glmmTMB)

# Models
glmmTMBr <- glmmTMB::glmmTMB(
  count ~ (1 | site),
  family = glmmTMB::nbinom1(),
  data = Salamanders, REML = TRUE
)
glmmTMBf <- glmmTMB::glmmTMB(
  count ~ mined + spp + (1 | site),
  family = glmmTMB::nbinom1(),
  data = Salamanders, REML = TRUE
)

Code by hand from Nakagawa suppl.

# Calculation of the variance in fitted values
VarF <- var(as.vector(get_modelmatrix(glmmTMBf) %*% fixef(glmmTMBf)$cond))
# this is "mu" in insight
lambda <- as.numeric(exp(fixef(glmmTMBr)$cond + 0.5 * (as.numeric(VarCorr(glmmTMBr)$cond[1]))))
# this is "sig" in insight
thetaF <- sigma(glmmTMBf) # note that theta is called alpha in glmmadmb
# this is what ".variance_distributional()" returns
VarOdF <- 1 / lambda + 1 / thetaF # the delta method
VarOlF <- log(1 + (1 / lambda) + (1 / thetaF)) # log-normal approximation
VarOtF <- trigamma((1 / lambda + 1 / thetaF)^(-1)) # trigamma function
# R2[GLMM(m)] - marginal R2[GLMM]
R2glmmM <- VarF / (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond)) + VarOlF)
# R2[GLMM(c)] - conditional R2[GLMM] for full model
R2glmmC <- (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond))) / (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond)) + VarOlF)

Comparison lognormal

# R2 based on Suppl. of Nakagawa et al. 2017, lognormal
c(R2glmmM = R2glmmM, R2glmmC = R2glmmC)
#>   R2glmmM   R2glmmC 
#> 0.5860120 0.6931856

# current implementation - not sure *where* this small deviation comes from
# maybe it's the NULL model?
performance::r2_nakagawa(glmmTMBf, null_model = glmmTMBr)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.717
#>      Marginal R2: 0.606

# What MuMIn calculates
suppressWarnings(MuMIn::r.squaredGLMM(glmmTMBf)[2, ])
#>       R2m       R2c 
#> 0.5172092 0.6117996

Comparison delta

R2glmmM <- VarF / (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond)) + VarOdF)
R2glmmC <- (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond))) / (VarF + sum(as.numeric(VarCorr(glmmTMBf)$cond)) + VarOdF)
# R2 based on Suppl. of Nakagawa et al. 2017, delta
c(R2glmmM = R2glmmM, R2glmmC = R2glmmC)
#>   R2glmmM   R2glmmC 
#> 0.5119224 0.6055460

performance::r2_nakagawa(glmmTMBf, null_model = glmmTMBr, approximation = "delta")
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.642
#>      Marginal R2: 0.543

# What MuMIn calculates
suppressWarnings(MuMIn::r.squaredGLMM(glmmTMBf)[1, ])
#>       R2m       R2c 
#> 0.3989273 0.4718856

Reason for minor deviations of performance from Nakagawa

# The small deviation between performance and Nakagawa seems to be the residual
# variance, though I'm not sure where it differs
vars <- get_variance(glmmTMBf, null_model = glmmTMBr)
all.equal(vars$var.fixed, VarF)
#> [1] TRUE

all.equal(sum(vars$var.intercept), sum(as.numeric(VarCorr(glmmTMBf)$cond)))
#> [1] TRUE

# minor difference - need to check
all.equal(vars$var.residual, VarOlF)
#> [1] "Mean relative difference: 0.1186565"

Created on 2024-06-13 with reprex v2.1.0

This is the code to replicate MuMIn:

 switch(faminfo$family, 
   nbinom2 = , 
   `negative binomial` = (1 / mu) + (1 / sig), 
   nbinom = , 
   nbinom1 = , 
   poisson = , 
   quasipoisson = vv / mu, 
   vv / mu^2 
 ) 

This is the code to replicate Nakagawa. Note that nbinom1 has moved up: glmmadmb with family = "nbinom1" is listed as "Quasi-Poisson" in the Supplement, and family(model) also returns that value.

 switch(faminfo$family, 
   nbinom1 = , 
   nbinom2 = , 
   `negative binomial` = (1 / mu) + (1 / sig), 
   nbinom = , 
   poisson = , 
   quasipoisson = vv / mu, 
   vv / mu^2 
 ) 
strengejacke commented 2 weeks ago

Ok, I found the difference. It's in the internal function .get_sigma.glmmTMB():

.get_sigma.glmmTMB <- function(x, ...) {
  if (stats::family(x)$family == "nbinom1") {
    add_value <- 1
  } else {
    add_value <- 0
  }
  stats::sigma(x) + add_value
}

I did this to be in line with, and get the same results as, MuMIn. However, when I use

.get_sigma.glmmTMB <- function(x, ...) {
  stats::sigma(x)
}

I get identical results to Nakagawa.
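
A possible explanation for the + 1 (my reading, assuming glmmTMB's nbinom1 parameterization, where sigma() returns phi and Var(y) = mu * (1 + phi)): in the quasi-Poisson form Var(y) = omega * mu, the overdispersion is omega = 1 + phi, so adding 1 converts sigma() from phi to omega, which appears to be what MuMIn expects, while using sigma() directly reproduces the supplement's numbers, as shown above.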

strengejacke commented 2 weeks ago

@easystats/core-team FYI, I have largely revised get_variance() and added lots of tests, which validate results against the Nakagawa paper, and against MuMIn where examples in the Nakagawa paper were missing. See the follow-up issue #889.