datacamp / rdocumentation-2.0

📚 RDocumentation provides an easy way to search the documentation for every version of every R package on CRAN and Bioconductor.
https://rdocumentation.org
MIT License

function `cohen.d`: Hedges' g uses wrong DFs for one-sample case #155

Open spressi opened 4 months ago

spressi commented 4 months ago

There is a small mistake in the function `cohen.d`: according to Cumming (2011), p. 294, the degrees of freedom are N-1 for a one-sample design (i.e., if f=NA) and also for a within-subjects design (i.e., if paired=T). Only for two independent samples are the degrees of freedom N-2. The function, however, also uses N-2 for the one-sample case (and possibly also for paired=T; I have not checked this).

Minimal Reproducible Example:

    library(tidyverse)

    iris %>%
      group_by(Species) %>%
      summarise(
        cohen_d = effsize::cohen.d(Sepal.Length, f=NA)$estimate,
        n = Sepal.Length %>% na.omit() %>% length(), #should not use n() because it doesn't handle NAs correctly
        hedges_g = effsize::cohen.d(Sepal.Length, NA, hedges.correction=T)$estimate,
        hedges_g_df1 = cohen_d * (1 - (3 / (4 * (n-1) - 1))), #for one-sample & within: df = N - 1
        hedges_g_df2 = cohen_d * (1 - (3 / (4 * (n-2) - 1))), #for two independent samples
        check_df1 = hedges_g == hedges_g_df1,
        check_df2 = hedges_g == hedges_g_df2
      )
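To make the size of the discrepancy concrete, the correction factor from the example can be computed on its own in base R. This is only a sketch: `hedges_j` is a hypothetical helper name (it is not part of effsize), and it assumes the approximate correction 1 - 3 / (4 * df - 1) used in the example, with n = 50 observations per species as in `iris`.

```r
# Hypothetical helper (not part of effsize): approximate small-sample
# correction factor for Hedges' g, as used in the example's manual check.
hedges_j <- function(df) 1 - 3 / (4 * df - 1)

n <- 50  # observations per species in iris (assumed: no NAs)

j_one_sample  <- hedges_j(n - 1)  # one-sample / within-subjects: df = N - 1
j_independent <- hedges_j(n - 2)  # two independent samples:      df = N - 2

# Show both factors and their difference
print(c(one_sample = j_one_sample,
        independent = j_independent,
        difference = j_one_sample - j_independent))
```

With n = 50 the two factors differ only slightly, so the resulting change in g is small, but the discrepancy grows as the sample size shrinks.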
