easystats / effectsize

:dragon: Compute and work with indices of effect size and standardized parameters
https://easystats.github.io/effectsize/

Confidence intervals for standardized mean differences in repeated measures #631

Closed Daiki-Nakamura-git closed 1 week ago

Daiki-Nakamura-git commented 4 months ago

The confidence intervals for repeated-measures effect sizes calculated by the rm_d() function do not match the results of other existing functions. For example, the following two outputs have matching point estimates but different confidence intervals.

# generate data for repeated measurements
library(mvtnorm)
sigma <- matrix(c(1, 0.5, 0.5, 1), byrow = TRUE, ncol = 2)  # variance-covariance matrix
mu <- c(1, 2)  # population mean vector
n <- 100       # sample size
set.seed(123)  # fix the random seed
dat <- data.frame(mvtnorm::rmvnorm(n = n, mean = mu, sigma = sigma)) # random draws from a multivariate normal distribution
colnames(dat) <- c("t1", "t2")  # naming variables
dat$diff <- dat$t2 - dat$t1     # difference score

# effect size : difference score variance d_z
library(effectsize)
effectsize::cohens_d(dat$diff, mu = 0) |> print(digits = 5)
#> Cohen's d |             95% CI
#> ------------------------------
#> 0.95874   | [0.71998, 1.19412]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "z", adjust = FALSE) |> print(digits = 5)
#> d (z)   |             95% CI
#> ----------------------------
#> 0.95874 | [0.72195, 1.19553]

The reason for the discrepancy is that rm_d() calculates confidence intervals by approximating the sampling distribution with a normal distribution, while the other existing functions calculate confidence intervals using the non-central (t) distribution method. I consider the non-central distribution method to be more accurate.
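To make the difference concrete, here is a minimal sketch (not the package's internal code) of the two constructions for the one-sample $d_z$ above; the variance formula in step (1) is a common large-sample approximation, assumed here for illustration:

```r
# Minimal sketch (not effectsize's internals) of the two CI constructions
# for the one-sample d_z reported above
n <- 100
d_z <- 0.95874

# (1) Normal approximation: d_z +/- z * SE, using the common large-sample
#     approximation Var(d_z) ~ 1/n + d_z^2 / (2 * n)
se_dz <- sqrt(1 / n + d_z^2 / (2 * n))
d_z + c(-1, 1) * qnorm(0.975) * se_dz
#> ~ [0.722, 1.196]  (matches rm_d(..., method = "z"))

# (2) Non-central t: invert the non-central t distribution of t = d_z * sqrt(n),
#     finding the ncp values whose 97.5% / 2.5% quantiles equal the observed t
t_obs <- d_z * sqrt(n)
ncp_lo <- uniroot(\(ncp) qt(0.975, df = n - 1, ncp = ncp) - t_obs, c(0, 20))$root
ncp_hi <- uniroot(\(ncp) qt(0.025, df = n - 1, ncp = ncp) - t_obs, c(0, 20))$root
c(ncp_lo, ncp_hi) / sqrt(n)
#> ~ [0.720, 1.194]  (matches cohens_d(dat$diff, mu = 0))
```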

In addition, the results of the following functions should be consistent.

# Within-Subject Variance: d_rm
r <- cor(dat$t1, dat$t2) # Pearson's product-moment correlation coefficient
effectsize::cohens_d(dat$diff, mu = 0) * sqrt(2 * (1 - r))  # non-central distribution method
#>   Cohens_d       CI    CI_low  CI_high
#> 1 1.081488 1.071631 0.8121631 1.347004

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "rm", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> d (rm)   |               95% CI
#> -------------------------------
#> 1.081488 | [0.803159, 1.359817]

# Control Variance: d_b
effectsize::glass_delta(dat$t2, dat$t1, adjust = FALSE) |> print(digits = 6) # non-central distribution method
#> Glass' delta |               95% CI
#> -----------------------------------
#> 1.134785     | [0.801919, 1.463171]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "b", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> Becker's d |               95% CI
#> ---------------------------------
#> 1.134785   | [0.863465, 1.406106]

# Average Variance: d_av
effectsize::cohens_d(dat$t2, dat$t1, pooled_sd = FALSE) |> print(digits = 6)  # non-central distribution method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.784408, 1.617938]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "av", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> d (av)   |               95% CI
#> -------------------------------
#> 1.082732 | [0.861387, 1.304076]

# All Variance: Just d
effectsize::cohens_d(dat$t2, dat$t1, pooled_sd = TRUE) |> print(digits = 6) # non-central distribution method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.784570, 1.378496]

effectsize::rm_d(Pair(t2, t1) ~ 1, data = dat, method = "d", adjust = FALSE) |> print(digits = 6) # normal approximation method
#> Cohen's d |               95% CI
#> --------------------------------
#> 1.082732  | [0.835758, 1.329706]

I would like the confidence intervals to be calculated consistently, with the same method, throughout the package.

mattansb commented 4 months ago

Regarding $d_z$ and $d_{rm}$: those differences in the CIs are rather small, and are the result of the different methods (non-central vs. normal approximation). I don't think we will be able to have all functions across the package consistently use the same CI method; some CI methods are better defined for some effect sizes than for others.


Regarding the supposedly larger deviations between $d_b$ vs. Glass's $\Delta$, $d_{av}$ vs. unpooled-SD $d$, and "just $d$" vs. $d$: it would be wrong for the former CIs to match the latter, since the repeated-measures variants account for the fact that samples are nested within person, and so their CIs should be narrower (we are more certain about the true value of the effect size).
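(A quick sanity check, assuming the dat object simulated in the first comment: the variance of the difference score shrinks with positive correlation, which is the source of the narrower repeated-measures CIs.)

```r
# Illustrative check with the simulated data from above (r = 0.5):
# Var(t2 - t1) = Var(t1) + Var(t2) - 2 * Cov(t1, t2), so positive correlation
# shrinks the difference-score variance relative to the independent case
var(dat$t2 - dat$t1)                                # accounts for the nesting
var(dat$t1) + var(dat$t2)                           # what independence would imply
var(dat$t2 - dat$t1) / (var(dat$t1) + var(dat$t2))  # ~ (1 - r), about 0.5 here
```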

Below I demonstrate this with a bootstrap simulation:

Code:

```r
library(effectsize)

# Bootstrap within ----------------------
sigma <- matrix(c(1, 0.5, 0.5, 1), byrow = TRUE, ncol = 2) # variance-covariance matrix
mu <- c(1, 2) # population mean vector
n <- 100 # sample size
set.seed(123) # fix the random seed
dat_wide <- data.frame(mvtnorm::rmvnorm(n = n, mean = mu, sigma = sigma)) # random draws from a multivariate normal distribution
colnames(dat_wide) <- c("t1", "t2") # naming variables
dat_wide$diff <- dat_wide$t2 - dat_wide$t1 # difference score

get_rm_d <- function(rsplit, method) {
  d <- as.data.frame(rsplit)
  out <- rm_d(Pair(t1, t2) ~ 1, data = d, ci = NULL, adjust = FALSE, method = method)
  out[[1]]
}

boot_wide <- rsample::bootstraps(dat_wide, times = 200)
boot_wide$Beckers_d <- sapply(boot_wide$splits, get_rm_d, method = "b")
boot_wide$d_av <- sapply(boot_wide$splits, get_rm_d, method = "av")
boot_wide$just_d <- sapply(boot_wide$splits, get_rm_d, method = "d")

# Bootstrap between -------------------
dat_long <- datawizard::data_to_long(dat_wide[-3], cols = c("t1", "t2"), names_to = "time")

get_glass_delta <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- glass_delta(value ~ time, data = d, ci = NULL)
  out[[1]]
}

get_d_unpooled <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- cohens_d(value ~ time, data = d, pooled_sd = FALSE, ci = NULL)
  out[[1]]
}

get_just_d_between <- function(rsplit) {
  d <- as.data.frame(rsplit)
  out <- cohens_d(value ~ time, data = d, ci = NULL)
  out[[1]]
}

boot_long <- rsample::bootstraps(dat_long, times = 200)
boot_long$Delta <- sapply(boot_long$splits, get_glass_delta)
boot_long$d_unpooled <- sapply(boot_long$splits, get_d_unpooled)
boot_long$just_d <- sapply(boot_long$splits, get_just_d_between)

# Compare -------------------
library(ggplot2)
library(ggdist)
library(patchwork)

p_within <- boot_wide[3:5] |>
  datawizard::data_to_long(names_to = "d_type") |>
  datawizard::data_modify(
    d_type = factor(d_type, levels = c("Beckers_d", "d_av", "just_d"))
  ) |>
  ggplot(aes(value, d_type)) +
  stat_slab(aes(fill_ramp = after_stat(level), fill = d_type),
            .width = c(0.5, 0.95, 1),
            point_interval = "mean_qi", color = "grey") +
  stat_spike(at = c(-1.30, -0.77), data = \(data) subset(data, d_type == "Beckers_d")) +
  stat_spike(at = c(-1.30, -0.86), data = \(data) subset(data, d_type == "d_av")) +
  stat_spike(at = c(-1.33, -0.84), data = \(data) subset(data, d_type == "just_d")) +
  scale_thickness_shared() +
  theme_ggdist()

p_between <- boot_long[3:5] |>
  datawizard::data_to_long(names_to = "d_type") |>
  datawizard::data_modify(
    d_type = factor(d_type, levels = c("Delta", "d_unpooled", "just_d"))
  ) |>
  ggplot(aes(value, d_type)) +
  stat_slab(aes(fill_ramp = after_stat(level), fill = d_type),
            .width = c(0.5, 0.95, 1),
            point_interval = "mean_qi", color = "grey") +
  stat_spike(at = c(-1.33, -0.73), data = \(data) subset(data, d_type == "Delta")) +
  stat_spike(at = c(-1.38, -0.78), data = \(data) subset(data, d_type == "d_unpooled")) +
  stat_spike(at = c(-1.38, -0.78), data = \(data) subset(data, d_type == "just_d")) +
  scale_thickness_shared() +
  theme_ggdist()

wrap_plots(p_within, p_between, ncol = 1, guides = "collect") &
  coord_cartesian(xlim = c(-1.5, -0.6)) &
  scale_fill_hue("Type", labels = c("Baseline", "Average SD", "Just d")) &
  labs(y = NULL)
```

[Figure: bootstrapped sampling distributions of the effect sizes (slabs, within-subject on top and between-subject on bottom), with the CIs returned by the {effectsize} functions marked as black spikes]

The slabs are the bootstrapped sampling distribution of the effect size (top - bootstrap accounting for nesting; bottom - bootstrap not accounting for nesting). The black spikes mark the CIs as returned by the various {effectsize} functions.

We can see:

  1. The sampling distributions for the rm_d() variants are narrower than those of cohens_d() / glass_delta(),
  2. The 95% bootstrap CIs ($\mathrm{CI}_{boot}$) match the CIs estimated by the various {effectsize} functions (see the percentile-CI sketch below).
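For reference, the bootstrap percentile CIs marked in the figure can be recomputed from the columns created above (a sketch assuming the boot_wide and boot_long objects from the code block):

```r
# Percentile bootstrap CIs from the bootstrapped columns, for comparison
# with the analytic CIs returned by the {effectsize} functions
sapply(boot_wide[c("Beckers_d", "d_av", "just_d")], quantile, probs = c(0.025, 0.975))
sapply(boot_long[c("Delta", "d_unpooled", "just_d")], quantile, probs = c(0.025, 0.975))
```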
Daiki-Nakamura-git commented 4 months ago

Thanks for your reply.

My point about $d_b$, $d_{av}$, and "just $d$" was incorrect, as your simulation results show; I had not taken the correlation between measurements into account.

I would prefer to give users a choice of which CI method to use for $d_z$ and $d_{rm}$, but I respect your decision to use different methods for different functions.