ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary
Other
1.04k stars 116 forks source link

Bug Report: (with potential fix) Saving a gtsummary gt to docx fails when style_pvalue produces "<0.001" #1539

Closed leslem closed 1 year ago

leslem commented 1 year ago

I don't have a reprex, but I do have the beginnings of a solution.

When saving a gtsummary to docx, I ran into this intermittent error:

Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,  : 
  StartTag: invalid element name [68]

I was able to determine that this error only happens when the p values contained very low p's that were getting reformatted to "<0.001" by style_pvalue.

mygt <- mydata %>%
  tbl_summary(by = treatment,
                         missing_text = "Missing",
                         statistic=list(all_continuous() ~ "{median} ({p25}, {p50}) n={N_nonmiss}")) %>% 
  add_p(test = all_continuous() ~ "kruskal.test")
# This fails when p values are <0.001
gt::gtsave(as_gt(mygt), "summary_table.docx")

I was able to fix it by making my own p value styler function and replacing the "<" with the html code "&#60;".

mygt <- mydata %>%
  tbl_summary(by = treatment,
                         missing_text = "Missing",
                         statistic=list(all_continuous() ~ "{median} ({p25}, {p50}) n={N_nonmiss}")) %>% 
  add_p(test = all_continuous() ~ "kruskal.test",
              pvalue_fun = style_pvalue_custom)
# This works when p values are <0.001
gt::gtsave(as_gt(mygt), "summary_table.docx")
style_pvalue_custom <- function (x,
                                  digits = 1,
                                  prepend_p = FALSE,
                                  big.mark = NULL,
                                  decimal.mark = NULL,
                                  ...)
{
  if (digits == 1) {
    p_fmt <- case_when(
      x > 1 + 1e-15 ~ NA_character_,
      x < 0 - 1e-15 ~ NA_character_,
      x > 0.9 ~ paste0(
        ">",
        style_number(
          x = 0.9,
          digits = 1,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        )
      ),
      gtsummary:::round2(x, 1) >= 0.2 ~ style_number(
        x,
        digits = 1,
        big.mark = big.mark,
        decimal.mark = decimal.mark,
        ...
      ),
      gtsummary:::round2(x, 2) >= 0.1 ~ style_number(
        x,
        digits = 2,
        big.mark = big.mark,
        decimal.mark = decimal.mark,
        ...
      ),
      x >= 0.001 ~ style_number(
        x,
        digits = 3,
        big.mark = big.mark,
        decimal.mark = decimal.mark,
        ...
      ),
      x < 0.001 ~ paste0(
        "&#60;",
        style_number(
          x = 0.001,
          digits = 3,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        )
      )
    )
  }
  else if (digits == 2) {
    p_fmt <- case_when(
      x > 1 + 1e-15 ~ NA_character_,
      x <
        0 - 1e-15 ~ NA_character_,
      x > 0.99 ~ paste0(
        ">",
        style_number(
          x = 0.99,
          digits = 2,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        )
      ),
      gtsummary:::round2(x,
                         2) >= 0.1 ~ style_number(
                           x,
                           digits = 2,
                           big.mark = big.mark,
                           decimal.mark = decimal.mark,
                           ...
                         ),
      x >= 0.001 ~ style_number(
        x,
        digits = 3,
        big.mark = big.mark,
        decimal.mark = decimal.mark,
        ...
      ),
      x < 0.001 ~ paste0(
        "&#60;",
        style_number(
          x = 0.001,
          digits = 3,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        )
      )
    )
  }
  else if (digits == 3) {
    p_fmt <- case_when(
      x > 1 + 1e-15 ~ NA_character_,
      x <
        0 - 1e-15 ~ NA_character_,
      x > 0.999 ~ paste0(
        ">",
        style_number(
          x = 0.999,
          digits = 3,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        )
      ),
      x >= 0.001 ~
        style_number(
          x,
          digits = 3,
          big.mark = big.mark,
          decimal.mark = decimal.mark,
          ...
        ),
      x < 0.001 ~
        paste0(
          "&#60;",
          style_number(
            x = 0.001,
            digits = 3,
            big.mark = big.mark,
            decimal.mark = decimal.mark,
            ...
          )
        )
    )
  }
  else {
    stop("The `digits=` argument must be 1, 2, or 3.")
  }
  if (prepend_p == TRUE) {
    p_fmt <-
      case_when(
        is.na(p_fmt) ~ NA_character_,
        stringr::str_sub(p_fmt,
                         end = 1L) %in% c("&#60;", ">") ~ paste0("p", p_fmt),
        TRUE ~ paste0("p=", p_fmt)
      )
  }
  attributes(p_fmt) <- attributes(unclass(x))
  return(p_fmt)
}
ddsjoberg commented 1 year ago

Thank you @leslem for the post!

Word output for gt is relatively new (this was once a bug for HTML). I would kindly ask that you post a minimal reproducible example in a GitHub Issue to the gt package. I think this is something that would quickly be addressed.

Thanks for giving an interim solution! 🍁

ddsjoberg commented 1 year ago

I forgot to mention an important point!

gtsummary supports output engines gt, flextable, huxtable, kableExtra, kable, and tibbles. Each of these engines often supports multiple output types (html, latex, etc), which is why I can't implement this solution from within gtsummary