easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0
234 stars 19 forks source link

Add effect size (Cohen's d) to `estimate_contrasts` #227

Open rempsyc opened 1 year ago

rempsyc commented 1 year ago

As discussed in easystats/report#372

library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)
estimate_contrasts(model)
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p | Cohen's d | Cohens_d 95% CI
#> ------------------------------------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001 |      1.89 |  [ 1.41,  2.36]
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001 |      1.29 |  [ 0.86,  1.72]
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003  |     -0.64 |  [-1.04, -0.24]
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris)
estimate_contrasts(model)
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> NOTE: Results may be misleading due to involvement in interactions
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |        95% CI |   SE | t(144) |      p | Cohen's d | Cohens_d 95% CI
#> -----------------------------------------------------------------------------------------------------------
#> setosa     | versicolor |       1.59 | [ 0.64, 2.54] | 0.39 |   4.04 | < .001 |      1.89 |  [ 1.41,  2.36]
#> setosa     |  virginica |       1.77 | [ 0.77, 2.78] | 0.41 |   4.29 | < .001 |      1.29 |  [ 0.86,  1.72]
#> versicolor |  virginica |       0.18 | [-0.17, 0.54] | 0.15 |   1.27 | 0.205  |     -0.64 |  [-1.04, -0.24]
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

data <- mtcars
data$cyl <- as.factor(data$cyl)
data$am <- as.factor(data$am)
model <- rstanarm::stan_glm(mpg ~ cyl * am, data = data, refresh = 0)
estimate_contrasts(model)
#> No variable was specified for contrast estimation. Selecting `contrast = "cyl"`.
#> NOTE: Results may be misleading due to involvement in interactions
#> Marginal Contrasts Analysis
#> 
#> Level1 | Level2 | Difference |        95% CI |     pd | % in ROPE | Cohen's d | Cohens_d 95% CI
#> -----------------------------------------------------------------------------------------------
#> cyl4   |   cyl6 |       5.62 | [2.31,  8.85] | 99.95% |        0% |      1.88 |    [0.72, 3.01]
#> cyl4   |   cyl8 |      10.23 | [7.06, 13.48] |   100% |        0% |      3.26 |    [2.02, 4.48]
#> cyl6   |   cyl8 |       4.67 | [1.34,  8.00] | 99.62% |        0% |      2.05 |    [0.91, 3.14]
#> 
#> Marginal contrasts estimated at cyl

Created on 2023-05-07 with reprex v2.0.2

Though it will skip it edge cases where the following arguments are not NULL: contrast, fixed, and at, because those made it hard to compute Cohen's d, which right now fetches the original data to compute it. I don't know if the contrasts modified data is available anywhere so we could calculate proper effect sizes for those cases?

For now though I guess it is still a minor improvement to no effect sizes at all.


DominiqueMakowski commented 1 year ago

thanks! and do add yourself in the authors desc

codecov[bot] commented 1 year ago

Codecov Report

Merging #227 (3658fa1) into main (052f73e) will decrease coverage by 0.84%. Report is 1 commits behind head on main. The diff coverage is 10.00%.

:exclamation: Current head 3658fa1 differs from pull request most recent head 7ebdc3c. Consider uploading reports for the commit 7ebdc3c to get more accurate results

@@            Coverage Diff             @@
##             main     #227      +/-   ##
==========================================
- Coverage   35.10%   34.27%   -0.84%     
==========================================
  Files          25       25              
  Lines        1168     1208      +40     
==========================================
+ Hits          410      414       +4     
- Misses        758      794      +36     
Files Changed Coverage Δ
R/estimate_contrasts.R 56.32% <10.00%> (-39.43%) :arrow_down:
rempsyc commented 1 year ago

Current reprexes:

library(modelbased)

model <- lm(Sepal.Width ~ Species, data = iris)

estimate_contrasts(model)
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p
#> ------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003 
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

estimate_contrasts(model, effectsize = "cohens_d")
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Unsupported effect size 'cohens_d', returning none.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p
#> ------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003 
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

estimate_contrasts(model, effectsize = "emmeans")
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p | effect_size |      es 95% CI
#> -------------------------------------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001 |        1.94 | [ 1.48,  2.39]
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001 |        1.34 | [ 0.91,  1.76]
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003  |       -0.60 | [-1.00, -0.20]
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

set.seed(100)
estimate_contrasts(model, effectsize = "bootES")
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p | cohens.d | cohens.d 95% CI
#> -----------------------------------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001 |     1.94 |  [ 1.52,  2.39]
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001 |     1.34 |  [ 0.82,  1.72]
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003  |    -0.60 |  [-0.92, -0.22]
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

set.seed(100)
estimate_contrasts(model, effectsize = "bootES", bootES_type = "akp.robust.d")
#> No variable was specified for contrast estimation. Selecting `contrast = "Species"`.
#> Marginal Contrasts Analysis
#> 
#> Level1     |     Level2 | Difference |         95% CI |   SE | t(147) |      p | akp.robust.d | akp.robust.d 95% CI
#> -------------------------------------------------------------------------------------------------------------------
#> setosa     | versicolor |       0.66 | [ 0.49,  0.82] | 0.07 |   9.69 | < .001 |         1.88 |      [ 1.42,  2.49]
#> setosa     |  virginica |       0.45 | [ 0.29,  0.62] | 0.07 |   6.68 | < .001 |         1.37 |      [ 0.92,  1.78]
#> versicolor |  virginica |      -0.20 | [-0.37, -0.04] | 0.07 |  -3.00 | 0.003  |        -0.51 |      [-0.90, -0.17]
#> 
#> Marginal contrasts estimated at Species
#> p-value adjustment method: Holm (1979)

Created on 2023-07-01 with reprex v2.0.2

For the emmeans effect size, I just called it effect_size, because I don't know if we can call it Cohen's d (or something else).

I updated the to-do list at the top post with the missing effect sizes

rempsyc commented 1 year ago

Alright, I added a new method, "marginal", based on Brenton's formula. That said,

library(modelbased)

mtcars2 <- mtcars
mtcars2$cyl <- as.factor(mtcars2$cyl)

model <- lm(mpg ~ cyl + wt * hp, mtcars2)

d values from method “marginal”,

estimate_contrasts(model, effectsize = "marginal")
#> No variable was specified for contrast estimation. Selecting `contrast = "cyl"`.
#> Marginal Contrasts Analysis
#> 
#> Level1 | Level2 | Difference |        95% CI |   SE | t(26) |      p | marginal_d
#> ---------------------------------------------------------------------------------
#> cyl4   |   cyl6 |       1.26 | [-2.55, 5.07] | 1.49 |  0.85 | > .999 |       0.19
#> cyl4   |   cyl8 |       1.45 | [-3.83, 6.74] | 2.06 |  0.70 | > .999 |       0.22
#> cyl6   |   cyl8 |       0.20 | [-3.64, 4.03] | 1.50 |  0.13 | > .999 |       0.03
#> 
#> Marginal contrasts estimated at cyl
#> p-value adjustment method: Holm (1979)

differ a lot from "emmeans",

estimate_contrasts(model, effectsize = "emmeans")
#> No variable was specified for contrast estimation. Selecting `contrast = "cyl"`.
#> Marginal Contrasts Analysis
#> 
#> Level1 | Level2 | Difference |        95% CI |   SE | t(26) |      p | effect_size |     es 95% CI
#> --------------------------------------------------------------------------------------------------
#> cyl4   |   cyl6 |       1.26 | [-2.55, 5.07] | 1.49 |  0.85 | > .999 |        0.57 | [-0.83, 1.97]
#> cyl4   |   cyl8 |       1.45 | [-3.83, 6.74] | 2.06 |  0.70 | > .999 |        0.66 | [-1.27, 2.60]
#> cyl6   |   cyl8 |       0.20 | [-3.64, 4.03] | 1.50 |  0.13 | > .999 |        0.09 | [-1.31, 1.49]
#> 
#> Marginal contrasts estimated at cyl
#> p-value adjustment method: Holm (1979)

Although they seem consistent. Did I get the formula right? I used

R2_cov <- summary(model)$r.squared
d_adj <- contrasts$t * contrasts$SE / sigma(model) * sqrt(1 - R2_cov)

Created on 2023-07-02 with reprex v2.0.2

rempsyc commented 1 year ago

Method marginal (should we use a different name, like marginal_d?) for now also does not have confidence intervals. I know that easystats/effectsize#351 has them in the large function, but I wasn't sure if I should implement them here in modelbased or wait for it to be formally implemented in effectsize (since the issue belongs to that repo). At least for now we have an easy way to get the marginal d-value itself.

rempsyc commented 1 year ago

@bwiernik if you think this looks good after reading my previous three comments above, then I think we'll be ready to think about merging this. Anything else you would like me to implement (e.g., see from list from original post)?