easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0
234 stars 19 forks source link

estimate_grouplevel confidence intervals #134

Open bwiernik opened 3 years ago

bwiernik commented 3 years ago

Currently, glmmTMB does not report uncertainty intervals for estimated group-level coefficients (result of coef()), but the information needed is available in the model (see https://github.com/glmmTMB/glmmTMB/issues/691).

Until these can be incorporated into glmmTMB itself, it might be nice to manually implement them ourselves. Here is an example:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(glmmTMB)
set.seed(20160904)

dw <- merge(
  psych::scoreVeryFast(psychTools::bfi.keys, psychTools::bfi),
  subset(psychTools::bfi, select = "age"),
  by = "row.names",
  sort = FALSE
) |>
  na.omit() |>
  mutate(across(-Row.names, \(x) scale(x)[,1])) |>
  mutate(stability = -1 * neuroticism) |>
  select(-neuroticism) |>
  rename(id = Row.names) |>
  subset(id %in% sample(id, 300))

dl <- dw |>
    tidyr::pivot_longer(
      cols = -c("id", "age"),
      names_to = "scale",
      values_to = "score"
    )

# partial pooling
m1 <- glmmTMB(score ~ 1 + age + (age | scale) + (1 | id),
                         dispformula = ~ 0 + scale,
                         data = dl)

# Need covariance between fixed effects and random effects estimates
# lme4 and glmmTMB don't currently provide an easy way to access their covariance
# (for intervals, easiest approach is probably to fit a Bayesian model using brms)

## this is messy--need to manually match elements of condVar matrix with contrast matrix
selector <- c(seq_len(length(fixef(m1)$cond)),
              length(fixef(m1)$cond) +
                seq_len(length(unlist(ranef(m1)$cond[['scale']]))))
re <- ranef(m1)$cond[['scale']]
cc <- cbind(
  ## fixed effects matrix
  do.call(
    rbind,
    rep(list(diag(ncol(re))), nrow(re))
  ),
  ## random effects matrix
  diag(ncol(re) * nrow(re))
)

sdr <- TMB::sdreport(m1$obj, getJointPrecision=TRUE)
vv <- solve(sdr$jointPrecision)[selector, selector]

cond_vars <- diag(cc %*% vv %*% t(cc))
cond_ses <- cond_vars |>
  sqrt() |>
  matrix(ncol = 2, byrow = TRUE) |>
  as.data.frame() |>
  `dimnames<-`(list(rownames(re), colnames(re)))

coef_pooled <- tibble(
  scale = rownames(re),
  coef_pooled = coef(m1)$cond$scale[['age']],
  se_pooled = cond_ses[['age']]
) |>
  mutate(
    lower_pooled = coef_pooled - qnorm(.975) * se_pooled,
    upper_pooled = coef_pooled + qnorm(.975) * se_pooled
  )
coef_pooled
#> # A tibble: 5 x 5
#>   scale         coef_pooled se_pooled lower_pooled upper_pooled
#>   <chr>               <dbl>     <dbl>        <dbl>        <dbl>
#> 1 agree              0.175     0.0617       0.0539        0.296
#> 2 conscientious      0.0805    0.0506      -0.0188        0.180
#> 3 extraversion       0.0557    0.0504      -0.0431        0.155
#> 4 openness           0.0582    0.0563      -0.0521        0.169
#> 5 stability          0.0736    0.0518      -0.0280        0.175

Created on 2021-07-23 by the reprex package (v2.0.0)

bwiernik commented 3 years ago

The tricky things that would need to be implemented are:

  1. Matching up the fixed and random effects coefficients in the sdr$jointPrecision matrix, which isn't well-labeled.
  2. Handling of generalized models (that could be deferred)
  3. Handling of models with zero-inflation (that could be deferred)