crsh / papaja

papaja (Preparing APA Journal Articles) is an R package that provides document formats to produce complete APA manuscripts from RMarkdown-files (PDF and Word documents) and helper functions that facilitate reporting statistics, tables, and plots.
https://frederikaust.com/papaja_man/
Other
657 stars 133 forks source link

Support custom effect sizes #423

Open crsh opened 4 years ago

crsh commented 4 years ago

It has been my stated goal from the start that I would like to do no computing if possible---just formatting and typesetting. Unfortunately, many analysis objects do not include all desired effect size estimates. Because of this, many methods report only unstandardized effect sizes or end up doing some computations. Some recent internal changes provide an avenue to change this. It would be nice if users could pass a function to apa_print() that is used internally to calculate the desired effect size and confidence interval and include it in the output (or a precalculated data.frame). This way, it would be much easier to support, for example, standardized effect sizes. Here's a rough prototype:

library("dplyr")

apa_print.afex_aov <- function(x, es = effectsize::omega_squared) {
  aov_table <- apa_print(x, es = NULL)$table

  if(is.function(es)) {
    aov_es_table <- es(x)
  }

  aov_es_table <- tidy_es(aov_es_table)

  aov_es_table <- aov_es_table %>%
    mutate(estimate = printnum(estimate, digits = 3, gt1 = FALSE)) %>% 
    rowwise() %>% 
    mutate(conf.int = print_confint(conf.low, conf.high, digits = 3, gt1 = FALSE)) %>% 
    select(estimate, conf.int) %>% 
    label_variables(
      estimate = "\\hat\\eta^2_p"
      , conf.int = "90% CI"
    )

  aov_table$estimate <- aov_es_table$estimate
  aov_table$conf.int <- aov_es_table$conf.int

  # Assembly of the strings for reporting follows
  # ...
}

tidy_es <- function(x) {
  UseMethod("tidy_es", x)
}

tidy_es.default <- function(x) {
  tryCatch(
    broom::tidy(x)
    , error = function(e) {
      stop("The object returned by the function passed to 'es' could not be tidied, that is converted to a data.frame with columns 'estimate', 'conf.low', and 'conf.high'. Please supply a function that returns a data.frame of this structure or pass such a data.frame directly.")
    }
  )
}

tidy_es.data.frame <- function(x) {
  es_cols <- colnames(x)
  if(
    !"estimate" %in% es_cols |
    !"conf.low" %in% es_cols |
    !"conf.high" %in% es_cols
  ) {
    stop("Effect size data.frame must contain the colums 'estimate', 'conf.low', and 'conf.high'")
  }

  x
}

tidy_es.effectsize_table <- function(x) {
  insight::standardize_names(x, style = "broom")
}

tidy_es.parameters_model <- function(x) {
  tidy_es.effectsize_table(x)
}

tidy_es.parameters_distribution <- function(x) {
  tidy_es.effectsize_table(x)
}
crsh commented 4 years ago

Check out the lm methods for user interface considerations regarding measures of R squared and standardized coefficients.

mariusbarth commented 4 years ago

I've just pushed a draft on branch add-custom-effect-sizes. I ran into some obstacles that need further consideration. However, first the good news:

I considerably "flattened" our ANOVA processing: I could retire print_anova() and the arrange_anova.anova() method by switching to canonize(), beautify(), and glue_apa_results(). This will allow us to further simplify apa_print.anova(), but I first postponed this step.

Regarding the effectsize package:

1) I started calculating effect sizes with effectsize package for afex_aov objects, because these are best supported by effectsize. 2) effectsize::eta_squared() does not return effect sizes for intercept terms. 3) Many ANOVA outputs that we support are not supported by effectsize. Therefore, we will have to adopt defaults method-by-method and we will have to keep our effect-size calculation code. Maybe we only support effectsize for afex_aov objects?

There are still many loose ends here (with regard to function defaults, etc.), which is why I didn't create a PR. Think it's worth a thought if this is really a fruitful avenue, especially because we cannot abandon our old eta-squared code. We will get another dependency with the only benefit that there will be CIs for some effect-size measures.

crsh commented 3 years ago

To document our discussion:

  1. We want to add support for effectsize and similar functions as an option. The default will remain as it is now, returning unstandardized effect sizes as available from the models.
  2. This implies that effectsize will be a suggested package, and we will try to make it possible to use a global option to change the default to standardized effect sizes via effectsize if the package is available.
  3. There is the issue that summary-output is currently not supported in effectsize, but we will try to get around this issue by using the call to get the original object (see lm/glm code).
  4. I have initiated a discussion regarding the effect size calculation for intercept terms in ANOVA model: https://github.com/easystats/effectsize/issues/156
mariusbarth commented 3 years ago

Moving forward with integrating the effectsize package, I think it will be the best solution to delegate calculating estimates together with confidence intervals to a specialized function in all possible instances, which can then be from the effectsize package. For some statistical models, however, we will most likely want original-scale estimates as a default, and therefore we need a function that simply returns original-scale estimates together with confidence intervals. This doesn't seem to be the scope of the effectsize package (or I overlooked something), so maybe we integrate it in papaja.

Here is a draft for such a function:

original_scale <- function(x, ...) {
  UseMethod("original_scale")
}

original_scale.default <- function(x, conf.int = .95, ...) {

  if(isTRUE(conf.int)) {
    conf.int <- .95
  } else {
    conf.int <- as.numeric(conf.int[[1L]])
  }

  coefs <- stats::coefficients(x)

  y <- data.frame(
    term = names(coefs)
    , estimate = unname(coefs)
    , stringsAsFactors = FALSE
  )

  if(as.logical(conf.int)) {
    conf_out <- stats::confint(x, level = conf.int, ...)
    y$conf.low <- conf_out[, 1L, drop = TRUE]
    y$conf.high <- conf_out[, 2L, drop = TRUE]
  }
  class(y) <- c("papaja_estimates", "data.frame")
  attr(y, "estimate")   <- "original_scale"
  attr(y, "conf.level") <- conf.int
  y
}

test_lm <- lm(yield ~ N, npk)
test_glm <- glm(as.numeric(N) ~ yield, npk, family = poisson)
original_scale(test_lm)
original_scale(test_glm)
shirdekel commented 2 years ago

That's for adding custom effect sizes! I'd like to use this functionality with emmGrid objects. Has this been added yet?