easystats / effectsize

:dragon: Compute and work with indices of effect size and standardized parameters
https://easystats.github.io/effectsize/

Confidence interval for eta^2 #611

Closed kazz530 closed 9 months ago

kazz530 commented 9 months ago

Hi, I am very interested in the effectsize package, but the eta^2 result it reports in an ANOVA context does not match my manual calculation, which is based on the non-central F distribution.
Following is my example code. What is the problem?

data(obk.long, package = "afex")
# modify the data slightly for the demonstration:
obk.long <- obk.long[1:240 %% 3 == 0, ]
obk.long$id <- seq_len(nrow(obk.long))

model <- aov(value ~ treatment, data = obk.long)

N <- nrow(obk.long)
ss <- summary(model)
ss
str(ss)
ss_each <- ss[[1]]$`Sum Sq`
ss_total <- sum(ss_each)
eta2 <- ss_each/ss_total
ss[[1]]$Df
ss[[1]]$`F value`
# conf.limits.ncf() comes from the MBESS package
lambda_LU <- MBESS::conf.limits.ncf(ss[[1]]$`F value`[1], df.1 = ss[[1]]$Df[1], df.2 = ss[[1]]$Df[2], conf.level = 0.95)
lambda_LU <- c(lambda_LU$Lower.Limit, lambda_LU$Upper.Limit)
eta2_LU <- lambda_LU/(lambda_LU+N)
c(eta2[1],eta2_LU)
print(effectsize::eta_squared(model, alternative = "two.sided", partial = FALSE, ci = 0.95), digits = 8)
mattansb commented 9 months ago

Here is a reprex for your example, which compares effectsize with MBESS (both using non-central F distributions):

data(obk.long, package = "afex")
# modify the data slightly for the demonstration:
obk.long <- obk.long[1:240 %% 3 == 0, ]
obk.long$id <- seq_len(nrow(obk.long))

model <- aov(value ~ treatment, data = obk.long)

effectsize::eta_squared(model, alternative = "two.sided", ci = 0.95) |> 
  print(digits = 8)
#> For one-way between subjects designs, partial eta squared is equivalent
#>   to eta squared. Returning eta squared.
#> # Effect Size for ANOVA
#> 
#> Parameter |       Eta2 |                   95% CI
#> -------------------------------------------------
#> treatment | 0.22348040 | [0.07209814, 0.36848290]

A <- anova(model)
MBESS::ci.omega2(F.value = A[1,"F value"],
                 df.1 = A[1,"Df"],
                 df.2 = A[2,"Df"],
                 N = nrow(obk.long))
#> $lower_limit_omega2
#> [1] 0.0695825
#> 
#> $upper_limit_omega2
#> [1] 0.3596343

Created on 2023-09-29 with reprex v2.0.2

These values are very close; I would say the differences are negligible (2-3% in magnitude). They can be attributed to differences in the tolerances of the functions that find the non-central F values, as well as to MBESS using $\eta^2_p=\frac{\lambda}{\lambda + N}$ (as presented in Steiger, 2004), whereas effectsize uses $\eta^2_p=\frac{\lambda}{\lambda + df_2}$, which we believe is more appropriate since it follows the actual conversion of the F statistic to $\eta^2_p$.

The examples in Steiger (2004) are hard to generalize to complex within-subjects, mixed, or non-balanced designs.
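To get a concrete sense of how the two mappings diverge, here is a small sketch (not from the thread; the ncp value, $df_1$, and sample sizes are arbitrary illustrative choices) comparing $\lambda/(\lambda+N)$ with $\lambda/(\lambda+df_2)$ for a one-way design:

```r
# Compare the two ncp-to-eta^2_p mappings as the sample grows.
# lambda, df1, and the sample sizes below are arbitrary illustrative values.
lambda <- 10
df1    <- 3
N      <- c(20, 50, 100, 500)
df2    <- N - df1 - 1  # error df for a one-way design with an intercept

cbind(N,
      Steiger    = lambda / (lambda + N),
      effectsize = lambda / (lambda + df2))
```

The gap between the two columns shrinks quickly as N grows, consistent with the negligible differences seen above.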

kazz530 commented 9 months ago

Thank you for your quick reply. I have additional questions: where can we find complete documentation of all the estimation equations used in the effectsize package? And do you have a paper giving theoretical justification for using df2 instead of N?

mattansb commented 9 months ago

The justification is based on the fact that the (exact) conversion of F to $\eta^2_p$ is $\eta^2_p = \frac{F \cdot df_1}{F \cdot df_1 + df_2}$, so substituting the non-centrality parameter $\lambda$ for $F \cdot df_1$ yields $\eta^2_p = \frac{\lambda}{\lambda + df_2}$. Additionally, methods for obtaining CIs for partial correlations (e.g., RVAideMemoire::pcor.test()) do not use N, only df2. Since $\eta^2_p$ is a squared partial correlation, I figure the analogous treatment applies here.
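As a sanity check on that conversion, one can verify numerically that $\frac{F \cdot df_1}{F \cdot df_1 + df_2}$ reproduces $\eta^2_p$ computed directly from the sums of squares (a sketch using the built-in mtcars data; any one-way aov fit would do):

```r
# Sanity check: the exact F -> eta^2_p conversion
model <- aov(mpg ~ factor(cyl), data = mtcars)  # illustrative one-way fit
A <- anova(model)

F_val <- A[1, "F value"]
df1   <- A[1, "Df"]
df2   <- A[2, "Df"]

# From the F statistic:
eta2p_from_F  <- (F_val * df1) / (F_val * df1 + df2)
# Directly from the sums of squares, SS_effect / (SS_effect + SS_error):
eta2p_from_SS <- A[1, "Sum Sq"] / sum(A[, "Sum Sq"])

all.equal(eta2p_from_F, eta2p_from_SS)
```

The identity is exact because $F = \frac{SS_{effect}/df_1}{SS_{error}/df_2}$; multiplying through by $df_1$ and simplifying recovers $\frac{SS_{effect}}{SS_{effect} + SS_{error}}$.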

But here is a little simulation I ran that seems to suggest that even in a multi-factor model with a small sample the difference is negligible.

# function to extract CIs from both packages:
MBESS_eta2p_CI <- function(model) {
  A <- anova(model)
  ci <- MBESS::ci.omega2(F.value = A[1,"F value"],
                         df.1 = A[1,"Df"],
                         df.2 = A[nrow(A),"Df"],
                         N = insight::n_obs(model), 
                         conf.level = 0.9)
  unlist(ci)
}

effectsize_eta2p_CI <- function(model) {
  es <- effectsize::eta_squared(model, alternative = "two.sided", ci = 0.9, verbose = FALSE)
  ci <- es[1,c("CI_low", "CI_high")]
  unlist(ci)
}

# Test if a value is within a range
is_between <- function(x, range) {
  range[1] < x && x < range[2]
}

data(obk.long, package = "afex")
obk.long <- obk.long[1:240 %% 3 == 0, ]

model <- aov(value ~ treatment + phase + gender + hour, data = obk.long)

# We will treat the values obtained from this sample as the true population 
# so the TRUE effect size is this:
true_es <- effectsize::eta_squared(model, ci = NULL)[1, 2]

# We will simulate data from the model
sim_model <- function(model) {
  dat <- insight::get_data(model)
  dat[insight::find_response(model)] <- simulate(model, nsim = 1)

  update(model, data = dat)
}

# Results:
set.seed(1234)
res <- replicate(1000, {
  mod <- sim_model(model)

  c(MBESS = is_between(true_es, MBESS_eta2p_CI(mod)),
    effectsize = is_between(true_es, effectsize_eta2p_CI(mod)))
})

mean(res["MBESS",])
#> [1] 0.901
mean(res["effectsize",])
#> [1] 0.902

Both methods obtain the same coverage rates.

bwiernik commented 9 months ago

There is no theoretical justification for N. The distribution of eta^2 follows the F distribution with the appropriate df.

Steiger used a maximum-likelihood asymptotic approximation. Using the df is less biased and more accurate.

mattansb commented 9 months ago

Thanks @bwiernik - this makes a lot of sense!