ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary

Feature request: Multiple tests for the same variable on multi-line continuous summary #1882

Closed DanielPark-MGH closed 2 months ago

DanielPark-MGH commented 2 months ago

Is your feature request related to a problem? Please describe.
I'd like to perform different tests on the same variable and report each p-value in the corresponding row of a multi-line continuous summary.

Describe the solution you'd like
I report the mean and median of one variable in separate rows. The data are split into two groups. I'd like to perform a t-test and report its p-value in the mean row, and a Wilcoxon rank sum test and report its p-value in the median row.

Describe alternatives you've considered
I've used add_stat() with a custom function that returns a data frame with one column holding the t-test p-value and the Wilcoxon p-value.

ddsjoberg commented 2 months ago

Thanks for the post @DanielPark-MGH . I don't think we'll be adding this as a direct feature in add_p() since it's not generally needed.

But I can see how using add_stat() is cumbersome for this task. Below is an example of how I would approach this kind of table. Note that in a table rendered with gt (the default), you'll have footnotes indicating which test each p-value comes from.

library(gtsummary)
packageVersion("gtsummary")
#> [1] '2.0.0'
theme_gtsummary_eda()
#> Setting theme "Exploratory Data Analysis"

tbl <- tbl_summary(trial, by = trt, include = age)

# add wilcox test
tbl_wilcox <- tbl |> add_p(test = all_continuous() ~ "wilcox.test")

# add t-test (also hide the primary summary stats)
tbl_ttest <- 
  tbl |> 
  add_p(test = all_continuous() ~ "t.test") |> 
  modify_column_hide(all_stat_cols())

# merge tables together
list(tbl_wilcox, tbl_ttest) |> 
  tbl_merge(tab_spanner = FALSE) |> 
  as_kable() # convert to kable to display on GH
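
| Characteristic  | Drug A N = 98 | Drug B N = 102 | p-value | p-value |
|:----------------|:-------------:|:--------------:|:-------:|:-------:|
| Age             |               |                |   0.7   |   0.8   |
| Median (Q1, Q3) |  46 (37, 60)  |  48 (39, 56)   |         |         |
| Mean (SD)       |    47 (15)    |    47 (14)     |         |         |
| Min, Max        |     6, 78     |     9, 83      |         |         |
| Unknown         |       7       |       4        |         |         |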

Created on 2024-08-02 with reprex v2.1.0

DanielPark-MGH commented 2 months ago

Thanks for the response @ddsjoberg . Here's my workaround; my desired result is different from yours.

library(gtsummary)
library(gt) # for tab_footnote() and cells_body()

tbl <- tbl_summary(
    trial,
    by = trt,
    type = all_continuous() ~ "continuous2",
    statistic = age ~ c(
        "{mean} ({sd})",
        "{median} ({p25}, {p75})"
    )
  ) %>%
  add_stat(
    fns = age ~ function(data, variable, by, ...) {
      p.value.t <- t.test(
        formula = as.formula(paste(variable, by, sep = "~")),
        data = data,
        var.equal = TRUE
      )$p.value

      p.value.wilcox <- wilcox.test(
        formula = as.formula(paste(variable, by, sep = "~")),
        data = data
      )$p.value

      data.frame(p.value = c(p.value.t, p.value.wilcox))
    },
    location = everything() ~ "level"
  )

tbl.gt <- as_gt(tbl) %>%
  tab_footnote(
    footnote = "pooled t-test",
    locations = cells_body(
      columns = p.value,
      rows = 2
    )
  ) %>%
  tab_footnote(
    footnote = "Wilcoxon rank sum test",
    locations = cells_body(
      columns = p.value,
      rows = 3
    )
  )
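
The p-value function passed to add_stat() above can also be tried on its own, outside gtsummary, to check the two values before they go into the table. The following is a minimal sketch under stated assumptions: the helper name two_test_p is hypothetical, the built-in mtcars data stands in for trial, and exact = FALSE is added here only to avoid the ties warning from wilcox.test() (it is not in the code above and may change the Wilcoxon p-value slightly for small samples).

```r
# Hypothetical helper (the name is illustrative, not part of gtsummary).
# Returns one p-value per summary row, in the order: mean row, median row.
two_test_p <- function(data, variable, by) {
  # Build e.g. mpg ~ am from the column names passed as strings
  f <- as.formula(paste(variable, by, sep = "~"))

  # Pooled-variance (Student) t-test, matching var.equal = TRUE above
  p_t <- t.test(f, data = data, var.equal = TRUE)$p.value

  # Wilcoxon rank sum test; exact = FALSE is an assumption made here
  # so that tied values do not trigger a warning
  p_w <- wilcox.test(f, data = data, exact = FALSE)$p.value

  data.frame(p.value = c(p_t, p_w))
}

# mtcars stands in for the trial data: mpg compared across the two am groups
res <- two_test_p(mtcars, variable = "mpg", by = "am")
print(res)
```

The returned data frame has one row per statistic line, which is the shape add_stat() expects when location is set to "level" in the code above.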