crsh / papaja

papaja (Preparing APA Journal Articles) is an R package that provides document formats to produce complete APA manuscripts from RMarkdown-files (PDF and Word documents) and helper functions that facilitate reporting statistics, tables, and plots.
https://frederikaust.com/papaja_man/
Other
654 stars 133 forks source link

emmeans objects for certain models with a different type than the default fail in apa_print. #592

Open TheDom42 opened 1 month ago

TheDom42 commented 1 month ago

Describe the bug emmeans objects for certain models with a different type than the default fail in apa_print. Examples that I have identified are type = unlink and type = response (https://cran.r-project.org/web/packages/emmeans/vignettes/transformations.html#tranlink) for Gamma models. There are probably more.

To Reproduce

library("lme4")
#> Loading required package: Matrix
library("broom")
library("emmeans")
#> Welcome to emmeans.
#> Caution: You lose important information if you filter this package's results.
#> See '? untidy'
library("papaja")
#> Loading required package: tinylabels
glmm <- glm(sqrt(breaks) ~ wool*tension, family = Gamma, data = warpbreaks)

#### regular link works
emm.glmm <- emmeans(glmm,  ~ tension)
#> NOTE: Results may be misleading due to involvement in interactions
tidy(emm.glmm, conf.int = TRUE)
#> # A tibble: 3 × 8
#>   tension estimate std.error    df conf.low conf.high statistic  p.value
#>   <chr>      <dbl>     <dbl> <dbl>    <dbl>     <dbl>     <dbl>    <dbl>
#> 1 L          0.172   0.00747    48    0.157     0.187      23.0 1.41e-27
#> 2 M          0.198   0.00856    48    0.181     0.215      23.1 1.13e-27
#> 3 H          0.219   0.00948    48    0.200     0.238      23.1 1.16e-27

apa_print(emm.glmm)
#> $estimate
#> $estimate$L
#> [1] "$1/M = 0.17$, 95\\% CI $[0.16, 0.19]$"
#> 
#> $estimate$M
#> [1] "$1/M = 0.20$, 95\\% CI $[0.18, 0.22]$"
#> 
#> $estimate$H
#> [1] "$1/M = 0.22$, 95\\% CI $[0.20, 0.24]$"
#> 
#> 
#> $statistic
#> $statistic$L
#> [1] "$t(48) = 23.01$, $p < .001$"
#> 
#> $statistic$M
#> [1] "$t(48) = 23.13$, $p < .001$"
#> 
#> $statistic$H
#> [1] "$t(48) = 23.11$, $p < .001$"
#> 
#> 
#> $full_result
#> $full_result$L
#> [1] "$1/M = 0.17$, 95\\% CI $[0.16, 0.19]$, $t(48) = 23.01$, $p < .001$"
#> 
#> $full_result$M
#> [1] "$1/M = 0.20$, 95\\% CI $[0.18, 0.22]$, $t(48) = 23.13$, $p < .001$"
#> 
#> $full_result$H
#> [1] "$1/M = 0.22$, 95\\% CI $[0.20, 0.24]$, $t(48) = 23.11$, $p < .001$"
#> 
#> 
#> $table
#> A data.frame with 6 labelled columns:
#> 
#>   tension estimate     conf.int statistic df p.value
#> L       L     0.17 [0.16, 0.19]     23.01 48  < .001
#> M       M     0.20 [0.18, 0.22]     23.13 48  < .001
#> H       H     0.22 [0.20, 0.24]     23.11 48  < .001
#> 
#> tension  : tension 
#> estimate : $1/M$ 
#> conf.int : 95\\% CI 
#> statistic: $t$ 
#> df       : $\\mathit{df}$ 
#> p.value  : $p$ 
#> attr(,"class")
#> [1] "apa_results" "list"

#### with unlink or other links, this fails
emm.glmm_unlk <- emmeans(glmm,  ~ tension, type = "unlink")
#> NOTE: Results may be misleading due to involvement in interactions
tidy(emm.glmm_unlk, conf.int = TRUE)
#> # A tibble: 3 × 9
#>   tension response std.error    df conf.low conf.high  null statistic  p.value
#>   <chr>      <dbl>     <dbl> <dbl>    <dbl>     <dbl> <dbl>     <dbl>    <dbl>
#> 1 L           5.82     0.253    48     5.35      6.38   Inf      23.0 1.41e-27
#> 2 M           5.05     0.218    48     4.65      5.53   Inf      23.1 1.13e-27
#> 3 H           4.56     0.197    48     4.20      5.00   Inf      23.1 1.16e-27

apa_print(emm.glmm_unlk)
#> Error in apa_num.default(tidy_x$estimate, ...): The parameter 'x' is NULL. Please provide a value for 'x'

Created on 2024-09-17 with reprex v2.1.1

Expected behavior apa_results should have been returned with the column of response as the estimate.

Additional context This links back to https://github.com/crsh/papaja/blob/c045234d41098721189c04bb7489e6d141c01c75/R/apa_print_emm_lsm.R#L131-L136 and in turn the non-standard naming in broom due to the different link type.

Because the tidy_x$estimate cannot be found, this fails.

crsh commented 1 month ago

Hi Dominik,

Thanks for reaching out and pointing me to the origin of the problem. It seems to me this is something that should ideally be addressed in broom, but we could implement a hot fix in papaja until this happens? Would you be willing to give a PR a shot? And could you open an issue on the broom repository about this to see whether a fix on their end is in the cards?

If not, we can take care of this, but it will almost certainly take longer. ;)

Cheers, Frederik

TheDom42 commented 1 month ago

The earliest I would have time to try a PR would be October.

However, before trying the fix, I was also thinking about something else:

https://github.com/crsh/papaja/blob/c045234d41098721189c04bb7489e6d141c01c75/R/apa_print_emm_lsm.R#L454-L465

With this code, the label for the estimate column is decided. It defaults to $M$. Currently, I find it quite hard to imagine a good logic to decide, which label to use for a "generic" response column. This could lead to misinterpretations if the response/DV is not actually expressed as a mean.

crsh commented 1 month ago

I agree that there are cases in which M is not ideal. We could think about a more generic default, such as \hat{y}. It is always possible to overwrite anything we output here:

apa_print(emm.glmm, est_name = "\\hat{y}")