Validate `get_variance()` against remaining families

strengejacke commented 5 months ago

As a follow-up of #877

Following families need validation or don't yet work:

[ ] betabinomial
[ ] hurdle models
[ ] zero-inflated models
[ ] glmmTMB::compois
[ ] glmmTMB::genpois
[ ] glmmTMB::lognormal - these models have low fixed effects variance, leading to R2 close to 0. Calculation needs revision here?

Tagging @bbolker FYI (also tagging @bwiernik )

1) How could we possibly validate genpois and compois families? Are there any related families that would return a similar R2 so we have a reference for validating the code?

2) How to calculate the distribution-specific variance for zero-inflated models? (or at least: what's more accurate, see https://github.com/easystats/insight/pull/893#issuecomment-2175532919

3) Is it expected that the R2 is lower for zero-inflated models, when the full model (zero-inflation and conditional part) is taken into account (as opposed to the conditional component of zero-inflated models only, see following example)?

Regarding Zero-Inflation models

Formerly the dispersion parameter for Poisson and ZI Poisson was set to 1. Now, the behaviour for ZI Poisson only has changed, returning a different dispersion / variance:

# For zero-inflated poisson models, the
# distributional variance is based on Zuur et al. 2012
# ----------------------------------------------
.variance_zip <- function(model, faminfo, family_var) {
  if (inherits(model, "glmmTMB")) {
    p <- stats::predict(model, type = "zprob")
    mu <- stats::predict(model, type = "conditional")
    pvar <- (1 - p) * (mu + p * mu^2)
  } else if (inherits(model, "MixMod")) {
    p <- stats::plogis(stats::predict(model, type_pred = "link", type = "zero_part"))
    mu <- suppressWarnings(stats::predict(model, type = "mean_subject"))
    pvar <- (1 - p) * (mu + p * mu^2)
  } else {
    pvar <- family_var
  }

  mean(pvar)
}

Taking following model, for this particular example, this comes closer to a Bayesian model than setting sigma/dispersion to 1 (for zero-inflation models!)

m <- glmmTMB::glmmTMB(count ~ mined + cover + (1 + cover | site),
  ziformula = ~mined,
  family = poisson(), data = Salamanders
)
performance::r2_nakagawa(m)

Formerly with sigma/dispersion = 1

# R2 for Mixed Models

  Conditional R2: 0.650
     Marginal R2: 0.525

Now

# R2 for Mixed Models

  Conditional R2: 0.414
     Marginal R2: 0.334

brms returns (marginal R2 only)

library(brms)
m2 <- brms::brm(bf(count ~ mined + (1 | site), zi ~mined),
  family = brms::zero_inflated_poisson(), data = Salamanders, backend = "rstan"
)
brms::bayes_R2(m2)
    Estimate  Est.Error      Q2.5     Q97.5
R2 0.1686378 0.01874732 0.1334217 0.2070608

Any ideas how to validate the results? @bbolker @bwiernik ?

strengejacke commented 5 months ago

We have following deviation to MuMIn:

Computation of fixed effects variance: https://github.com/easystats/insight/issues/877#issuecomment-2163052765
Computation of distribution-specific variance for negative binomial models (negbin/nbinom1): https://github.com/easystats/insight/issues/877#issuecomment-2165551707
Computation of Sigma for glmmTMB: https://github.com/easystats/insight/issues/877#issuecomment-2165696977

However, the things we do differently are in line with the code in the Supplement 2 of Nakagawa. Thus, I would say we go with the current implementation of get_variance(). I added lots of tests to validate against external evidence of correct results.

strengejacke commented 5 months ago

@bbolker null_model() probably doesn't calculate the correct null model for zero-inflation models. What would be the correct null model in the next example?

data(fish, package = "insight")
m1 <- glmmTMB::glmmTMB(
  count ~ child + camper + (1 | persons),
  ziformula = ~ child + camper + (1 | persons),
  data = fish,
  family = poisson()
)

summary(insight::null_model(m1))
#>  Family: poisson  ( log )
#> Formula:          count ~ (1 | persons)
#> Zero inflation:         ~child + camper + (1 | persons)
#> Data: fish
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   1808.7   1829.8   -898.3   1796.7      244 
#> 
#> Random effects:
#> 
#> Conditional model:
#>  Groups  Name        Variance Std.Dev.
#>  persons (Intercept) 0.6808   0.8251  
#> Number of obs: 250, groups:  persons, 4
#> 
#> Zero-inflation model:
#>  Groups  Name        Variance Std.Dev.
#>  persons (Intercept) 1.069    1.034   
#> Number of obs: 250, groups:  persons, 4
#> 
#> Conditional model:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   1.5875     0.4173   3.804 0.000142 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero-inflation model:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -0.4988     0.5949  -0.839  0.40172    
#> child         2.0850     0.3212   6.491 8.52e-11 ***
#> camper1      -1.0863     0.3447  -3.151  0.00163 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

^{Created on 2024-06-14 with reprex v2.1.0}

bbolker commented 5 months ago

Presumably it would be

formula = count ~ 1 + (1 | persons),
 ziformula =    ~1 + (1 | persons)

?

easystats / insight

Validate `get_variance()` against remaining families #889

Regarding Zero-Inflation models