easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0
439 stars 36 forks source link

Calculating by-group estimates and CIs for a random intercept grouping variable in `lmer()` #894

Open OmidGhasemi21 opened 1 year ago

OmidGhasemi21 commented 1 year ago

Hi there,

I am trying to extract by-group marginal means and their CIs of an lmer() model with group as a random effect. For example:

library(lme4)
library(tidyverse)
library(parameters)

data <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv") %>%
  drop_na()

model <- lmer(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
              data = data)

I can access the bill_depth_mm estimate for each species by using coef(model), but I was wondering if it is possible to get the CIs with parameters. Reading the tutorials, I realized that I can get SEs using:

standard_error(model, effects= "random")

I was wondering if these are SEs for the by-group estimates and if calculating CIs manually makes sense here. Is there any function in parameters that I am missing for calculating CIs?

bwiernik commented 1 year ago

Obtaining those standard errors is difficult because it requires knowing the correlation between the fixed effects and random effects estimates.

Because of the algebraic tricks that lme4 uses, getting those correlations isn't really possible.

In glmmTMB, getting those correlations is feasible, but hasn't yet been implemented. See https://github.com/glmmTMB/glmmTMB/issues/691

So, neither of packages currently provides a way to get these standard errors. If uncertainty for coef() is needed, I suggest using the brms package instead.

mattansb commented 1 year ago

@bwiernik can lme4::bootMer(use.u=TRUE) be used?

library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randint <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["(Intercept)"]], nm = rownames(RE))
}

boot_res <- bootMer(fm1, FUN = get_randint, nsim = 599, use.u = TRUE)

confint(boot_res)
#>        2.5 %   97.5 %
#> 308 239.9207 281.6592
#> 309 194.1790 236.5172
#> 310 196.8933 239.6700
#> 330 245.2556 283.7539
#> 331 246.0527 288.3115
#> 332 239.8875 275.8517
#> 333 247.3167 282.7469
#> 334 228.4844 262.1989
#> 335 219.6058 265.8239
#> 337 261.5971 304.9581
#> 349 212.0846 252.6489
#> 350 226.4115 269.0781
#> 351 233.6814 272.5746
#> 352 250.7109 288.1504
#> 369 233.9859 272.8441
#> 370 214.5085 255.1081
#> 371 233.9799 271.1498
#> 372 244.7220 281.5601
strengejacke commented 1 year ago

You can set group_level = TRUE:

library(lme4)
#> Loading required package: Matrix
library(tidyverse)
library(parameters)

data <- read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv") %>%
  drop_na()
#> New names:
#> • `` -> `...1`
#> Rows: 344 Columns: 9
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): species, island, sex
#> dbl (6): ...1, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

model <- lmer(flipper_length_mm ~ bill_depth_mm + (bill_depth_mm | species),
              data = data)

model_parameters(model, group_level = TRUE)
#> # Fixed Effects
#> 
#> Parameter     | Coefficient |   SE |           95% CI | t(327) |      p
#> -----------------------------------------------------------------------
#> (Intercept)   |      146.84 | 8.31 | [130.49, 163.20] |  17.66 | < .001
#> bill depth mm |        3.24 | 0.92 | [  1.43,   5.05] |   3.52 | < .001
#> 
#> # Random Effects: species
#> 
#> Parameter                 | Coefficient |   SE |          95% CI
#> ----------------------------------------------------------------
#> (Intercept) [Adelie]      |        9.58 | 6.00 | [ -2.17, 21.33]
#> (Intercept) [Chinstrap]   |       -8.81 | 7.98 | [-24.45,  6.82]
#> (Intercept) [Gentoo]      |       -0.77 | 6.52 | [-13.54, 12.01]
#> bill_depth_mm [Adelie]    |       -1.40 | 0.33 | [ -2.04, -0.76]
#> bill_depth_mm [Chinstrap] |       -0.10 | 0.43 | [ -0.95,  0.75]
#> bill_depth_mm [Gentoo]    |        1.50 | 0.43 | [  0.65,  2.36]
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald t-distribution approximation.

Created on 2023-08-24 with reprex v2.0.2

bwiernik commented 1 year ago

@strengejacke Those are the group-level random effects estimates (as returned by ranef()), not the group-level total effects estimates (fixed + random, as returned by coef()).

@mattansb yes, you could use bootMer() here:

library(lme4)
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy))
coef(fm1)
boot1 <- bootMer(fm1, \(x) coef(x)$Subject[, "Days"], use.u = FALSE, nsim = 100)
cbind(boot1$t0, t(apply(boot1$t, 2, quantile, probs = c(.025, .975))))
DominiqueMakowski commented 1 year ago

purpose of: https://easystats.github.io/modelbased/reference/estimate_grouplevel.html

mattansb commented 1 year ago

@bwiernik I thought I'd want use.u = TRUE here - can you explain? 🙏

OmidGhasemi21 commented 1 year ago

Hi all,

I really appreciate your help.

@bwiernik, my original plan is to use brms, but I thought it would be good to have a frequentist study to make sure that the results are robust and stay the same with lmer.

I did not understand @bwiernik codes, but with @mattansb codes, I calculated CIs for group-level estimates and the results are a bit weird (not sure if I did it correctly).

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randint <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["Days"]], nm = rownames(RE)) # I changed (Intercept) to Days here
}

boot_res <- bootMer(fm1, FUN = get_randint, nsim = 599, use.u = TRUE)

boot_res %>%
  confint() %>%
  data.frame() %>%
  rownames_to_column("Subject") %>% 
  rename(conf.low = X2.5.., conf.high = X97.5..) %>%
  left_join(., coef(fm1)$Subject %>%
              data.frame() %>%
              rownames_to_column("Subject") %>%
              select(estimate = Days, Subject), by="Subject") %>% 
  ggplot(aes(x = estimate, y = Subject)) +
  geom_pointrange(aes(x = estimate, y = Subject, xmin= conf.low, xmax= conf.high)) +
  geom_vline(xintercept = 0, linetype = "dashed", size = .2)+
  theme_bw()

Here is the result:

Screenshot 2023-08-25 at 9 12 56 am

Should not the estimates of coef() be the midpoint of CI?

And speaking of comparing Bayesian and Frequentist analyses, is it normal that CIs are narrower than HDIs?

Screenshot 2023-08-25 at 9 31 07 am
bwiernik commented 1 year ago

Bootstrapped CIs aren't necessarily symmetrical (one of the advantages of bootstrap is that it can accurately estiamte sampling distributions that are non-normal), but I would expect these intervals to be roughly symmetrical, so not exactly sure why that's not the case here. It is consistent when I increase the number of nsim and set a different random seed, so it's not just monte carlo error.

Personally, I would trust the HMC-based intervals given by brms over these parametric bootstrap intervals from bootMer() and, in my experience, the random effects estimates from brms are more plausible. (I find that lmer() seems to overly shrink values across groups in my work.) So I would recommend using brms in general for this type of inference—the default priors are essentially non-informative and so the default brms estimates have very good frequentist properties.

(One note, you generally want to specify type = "perc" when you use confint() bootstraps.)

library(lme4)
library(tidyverse)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

get_randslope <- function(mod) {
  RE <- coef(mod)[[1]]
  setNames(RE[["Days"]], nm = rownames(RE)) # I changed (Intercept) to Days here
}

boot_res <- bootMer(fm1, FUN = get_randslope, nsim = 5000, use.u = TRUE, seed = 33939292)

boot_res %>%
  confint(type = "perc") %>%
  as.data.frame() %>%
  rownames_to_column("Subject") %>% 
  rename(CI_low = "2.5 %", CI_high = "97.5 %") %>%
  left_join(enframe(boot_res$t0, name = "Subject", value = "Estimate"), ., by = "Subject") %>% 
  ggplot() +
  aes(x = Estimate, y = Subject, xmin = CI_low, xmax = CI_high) +
  geom_pointrange() +
  geom_vline(xintercept = 0, linetype = "dashed", linewidth = .2)+
  theme_bw()
bwiernik commented 1 year ago

@bwiernik I thought I'd want use.u = TRUE here - can you explain? 🙏

Yes you would want use.u = TRUE. My bad

bbolker commented 6 months ago

Updating. These are probably available now due to improvements in lme4 (predict(..., se.fit = TRUE))