Open bwiernik opened 4 years ago
!! thanks. This is not surprising (I will try to write more here) but important.
Started looking. Basic cause I think is whether sigma
is considered a ran_par
or not ...
I mean, I definitely would regard it as one haha. It bugs me a bit that lm()
simga
doesn't get a confidence interval by default. What's the difference the handling between glmmTMB and lme4?
Here are some of the issues.
lm()
and lmer
, sigma
is a parameter that's profiled out - that is, it's never explicitly estimated. (Easiest to understand in the context of regular linear regression, where sigma is computed as the square root of the residual variance.) Therefore, it's rarely considered a parameter of the model. (Although this argument breaks down slightly for lmer
, where the fixed-effect parameters are also profiled out of the model - their values and confidence intervals are computed after fitted.) The residual SD is probably also frequently ignored because it has a different sampling distribution from the fixed-effect parameters (sqrt-chi-squared).glmmTMB
, sigma is estimated as part of the dispersion model, rather than part of the random effects model. That is, the other ran_pars
(sd/correlation/whatever) are estimated as part of the conditional or zero-inflated model; the dispersion model is a separate sub-model.All that said, I think this is all fairly easily fixable with a bit more investigation and careful coding.
I now hit this very wall myself. As far as I can tell both CIs for the random effect and residual variance are correctly calculated but get mixed-up when assembling the result in this part https://github.com/bbolker/broom.mixed/blob/b076753d550caa6fd159ca4d68ffb0410d0bff3b/R/glmmTMB_tidiers.R#L116-L119
library(broom.mixed)
library(glmmTMB)
data("sleepstudy", package = "lme4")
fit <- glmmTMB(Reaction ~ Days + (1 | Subject), sleepstudy)
CI for the random effect wrong
tidy(fit, conf.int = TRUE)
#> # A tibble: 4 × 10
#> effect component group term estimate std.error statistic p.value conf.low
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 fixed cond <NA> (Int… 251. 9.51 26.4 4.01e-154 233.
#> 2 fixed cond <NA> Days 10.5 0.802 13.1 5.89e- 39 8.90
#> 3 ran_pa… cond Subj… sd__… 36.0 NA NA NA 27.7
#> 4 ran_pa… cond Resi… sd__… 30.9 NA NA NA NA
#> # ℹ 1 more variable: conf.high <dbl>
Inspecting the transient content of res
before and after the offending if
:
# # Identifying relevant step
#
# b <- broom.mixed:::tidy.glmmTMB |>
# body()
# as.list(b[[2]][[3]][[3]]) # 6: computed CI before "filtering"
trace(
"tidy.glmmTMB",
bquote({
print(res)
cat("------------\n")
}),
at = list(
c(2,3,3,5), # Before if()
c(2,3,3,6) # After if()
),
where = asNamespace("broom.mixed")
)
#> Tracing function "tidy.glmmTMB" in package "namespace:broom.mixed"
#> [1] "tidy.glmmTMB"
broom.mixed:::tidy.glmmTMB(fit, conf.int = TRUE) |>
as.data.frame()
#> Tracing safe_confint(x, method = tolower(conf.method), level = conf.level, .... step 2,3,3,5
#> 2.5 % 97.5 %
#> (Intercept) 232.773317 270.03687
#> Days 8.895915 12.03866
#> ------------
#> Tracing safe_confint(x, method = tolower(conf.method), level = conf.level, .... step 2,3,3,6
#> 2.5 % 97.5 %
#> (Intercept) 232.773317 270.03687
#> Days 8.895915 12.03866
#> ------------
#> Tracing safe_confint(x, parm = thpar, method = conf.method, level = conf.level, .... step 2,3,3,5
#> 2.5 % 97.5 %
#> Std.Dev.(Intercept)|Subject 25.35711 51.14422
#> ------------
#> Tracing safe_confint(x, parm = thpar, method = conf.method, level = conf.level, .... step 2,3,3,6
#> 2.5 % 97.5 %
#> ------------
#> Tracing safe_confint(x, parm = "sigma", method = conf.method, level = conf.level, .... step 2,3,3,5
#> 2.5 % 97.5 %
#> sigma 27.70801 34.44953
#> ------------
#> Tracing safe_confint(x, parm = "sigma", method = conf.method, level = conf.level, .... step 2,3,3,6
#> 2.5 % 97.5 %
#> sigma 27.70801 34.44953
#> ------------
#> effect component group term estimate std.error statistic
#> 1 fixed cond <NA> (Intercept) 251.40509 9.5061837 26.44648
#> 2 fixed cond <NA> Days 10.46729 0.8017355 13.05579
#> 3 ran_pars cond Subject sd__(Intercept) 36.01207 NA NA
#> 4 ran_pars cond Residual sd__Observation 30.89544 NA NA
#> p.value conf.low conf.high
#> 1 4.005314e-154 232.773317 270.03687
#> 2 5.889859e-39 8.895915 12.03866
#> 3 NA 27.708012 34.44953
#> 4 NA NA NA
The tracing above prints res
for each call to safe_confint()
twice –
before and after the offending if
:
As (2) and (3) are rbind()-ed, the CI for residual SD ends up in the 3rd row (for the random effect) rather than in the 4th row for residual SD.
# Cleanup
untrace(
"tidy.glmmTMB",
where = asNamespace("broom.mixed")
)
#> Untracing function "tidy.glmmTMB" in package "namespace:broom.mixed"
Created on 2024-05-13 with reprex v2.1.0
I am finding that the tidy.glmmTMB() function doesn't report correct CIs (either Wald or profile) for random effects parameters (including sigma), whereas the tidy.lmerMod() function does. Here is an example. It shows the same bounds for both the intercept SD and sigma, and the values are clearly wrong for both (the point estimate falls way outside the interval).