Open Istalan opened 1 year ago
Tagging @bbolker as well...
This is an interesting question. To be honest I haven't seen anyone discuss/derive this explicitly for GLMMs; as with so much of the rest of the GLMM literature, it's "this works for GLMs, I think the extension to GLMMs is correct". The simulation results are reasonably convincing, but: I would guess that the effective "residual degrees of freedom" computation is probably going to depend on the amount of shrinkage/size of the RE variances relative to the Poisson variance? (At the very least the GLMM FAQ needs a cautionary note: @Istalan, posting an issue there (replicate or point to this issue) would be great ...)
Using the new machinery from #643, the dispersion ratio looks good when we use simulated residuals (based on DHARMa):
```r
library(performance)
library(lme4)
#> Loading required package: Matrix

set.seed(101)
d <- data.frame(
  x = runif(1000),
  f = factor(sample(1:200, size = 1000, replace = TRUE))
) # modified for more random effects
suppressMessages(d$y <- simulate(~ x + (1 | f),
  family = poisson,
  newdata = d,
  newparams = list(theta = 1, beta = c(0, 2))
)[[1]])
m1 <- glmer(y ~ x + (1 | f), data = d, family = poisson)

check_overdispersion(m1)
#> # Overdispersion test
#>
#>     dispersion ratio = 0.817
#> Pearson's Chi-Squared = 814.179
#>              p-value = 1
#> No overdispersion detected.

check_overdispersion(simulate_residuals(m1))
#> # Overdispersion test
#>
#> dispersion ratio = 0.953
#>          p-value = 0.544
#> No overdispersion detected.
```
Created on 2024-03-16 with reprex v2.1.0
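The same check can also be run with DHARMa directly rather than through performance's wrapper. The sketch below refits the model from the reprex above; `simulateResiduals()` and `testDispersion()` are DHARMa's own functions, and `n = 250` is an arbitrary choice here.

```r
library(lme4)
library(DHARMa)

# same simulated data and model as in the reprex above
set.seed(101)
d <- data.frame(
  x = runif(1000),
  f = factor(sample(1:200, size = 1000, replace = TRUE))
)
d$y <- suppressMessages(simulate(~ x + (1 | f),
  family = poisson, newdata = d,
  newparams = list(theta = 1, beta = c(0, 2))
)[[1]])
m1 <- glmer(y ~ x + (1 | f), data = d, family = poisson)

# testDispersion() compares the observed dispersion against its
# distribution under the fitted model, so no residual-df count is needed
sims <- simulateResiduals(m1, n = 250)
testDispersion(sims, plot = FALSE)
```

Because the reference distribution is simulated, this sidesteps the question of what the correct residual degrees of freedom are.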
This raises the question of whether we should use that approach for mixed models in general. @bbolker
Hi,
so I was playing around with mixed Poisson models and noticed that, for large numbers of random coefficients (also called random effects, I guess), the dispersion ratio from check_overdispersion was too low. I used code from Ben Bolker's GLMM FAQ (https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion) and modified it for 200 random coefficients:
Now, this should be a perfect model, but the dispersion ratio is only about 0.8, not 1. Looking deeper, the assumed residual degrees of freedom are 997, which is wildly inconsistent with the Pearson chi-squared statistic of 814.
My assumption is that overdispersion is a phenomenon of the conditional residuals: the random coefficients come with a scaling factor, so in this analysis they act like fitted fixed coefficients and should be subtracted from the degrees of freedom. With 200 random coefficients, the intercept, and x, that makes 798 df. Checking with 200 simulations (you have to modify Ben Bolker's code to avoid the issue with lambda < 5), the results look pretty consistent.
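A minimal sketch of that adjustment, starting from the FAQ's overdisp_fun. Subtracting the number of conditional modes from the residual df is the proposal in this issue, not an established result, and `overdisp_fun_adj` is a hypothetical name.

```r
library(lme4)

# Variant of the GLMM FAQ's overdisp_fun: residual df is reduced by the
# number of random-effect levels (conditional modes). This adjustment is
# the proposal from this issue, not an established result.
overdisp_fun_adj <- function(model) {
  # total conditional modes = levels x terms for each grouping factor
  n_re <- sum(sapply(ranef(model), function(r) nrow(r) * ncol(r)))
  rdf <- nobs(model) - length(fixef(model)) - n_re # e.g. 1000 - 2 - 200 = 798
  rp <- residuals(model, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq / rdf
  pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
  c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
}

# applied to the simulated-data model from the reprex above:
set.seed(101)
d <- data.frame(
  x = runif(1000),
  f = factor(sample(1:200, size = 1000, replace = TRUE))
)
d$y <- suppressMessages(simulate(~ x + (1 | f),
  family = poisson, newdata = d,
  newparams = list(theta = 1, beta = c(0, 2))
)[[1]])
m1 <- glmer(y ~ x + (1 | f), data = d, family = poisson)
overdisp_fun_adj(m1)
```

With this data the adjusted ratio lands near 1 (chi-squared of about 814 over roughly 798 df), instead of the 0.817 reported against 997 df.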
To make sure I got it right, I ran a much larger simulation, and here the chi-squared statistics seem to have a mean closer to 799, which confuses me but is perhaps explained by the remaining imprecision of the Pearson residuals:
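The larger check can be sketched roughly as follows (my reconstruction under the same simulation settings, not the poster's exact code; the replicate count of 20 is arbitrary and much smaller than theirs):

```r
library(lme4)

# Reconstruction of the larger check: simulate, refit, and collect the
# Pearson chi-squared statistic across replicates.
set.seed(202) # arbitrary seed
chisqs <- replicate(20, {
  d <- data.frame(
    x = runif(1000),
    f = factor(sample(1:200, size = 1000, replace = TRUE))
  )
  d$y <- suppressMessages(simulate(~ x + (1 | f),
    family = poisson, newdata = d,
    newparams = list(theta = 1, beta = c(0, 2))
  )[[1]])
  m <- suppressWarnings(glmer(y ~ x + (1 | f), data = d, family = poisson))
  sum(residuals(m, type = "pearson")^2)
})
mean(chisqs) # the issue reports a mean near 799, i.e. close to n - p - n_RE
```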
The same problem exists in the overdisp_fun from the GLMM FAQ. Do you think I should post there as well?
Session info: