IndrajeetPatil / statsExpressions

Tidy data frames and expressions with statistical summaries 📜
https://indrajeetpatil.github.io/statsExpressions/

feature request: reporting results in `rmarkdown` #1

Open IndrajeetPatil opened 5 years ago

IndrajeetPatil commented 5 years ago

I would like to know if there's a way to use the APA-formatted text of the statistical results prepared with the subtitle helper functions in a plain RMarkdown file (probably with inline R code?)

For example:

The correlation analysis showed that `x` and `y` were correlated:
 `r corr_test(data, x, y)`.

Also see: https://easystats.github.io/report/
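
To make the request concrete, here is a minimal sketch of the kind of helper that would work inline. `report_cor()` is a hypothetical name used only for illustration (it is not a function in {statsExpressions}); it relies on base R's `cor.test()` and returns a Markdown string with Pandoc-style subscripts.

```r
# Hypothetical helper (NOT part of statsExpressions): formats a Pearson
# correlation test as a Markdown string suitable for inline R code.
report_cor <- function(data, x, y) {
  res <- stats::cor.test(data[[x]], data[[y]])

  # report very small p-values as an inequality
  p <- res$p.value
  p_text <- if (p < 0.001) "*p* < 0.001" else sprintf("*p* = %.3f", p)

  # for a Pearson test, df = n - 2, so n is recovered as df + 2
  df <- as.integer(res$parameter)
  sprintf(
    "*t*(%d) = %.2f, %s, *r* = %.2f, CI~95%%~ [%.2f, %.2f], *n* = %d",
    df, res$statistic, p_text, res$estimate,
    res$conf.int[1], res$conf.int[2], df + 2L
  )
}

out <- report_cor(iris, "Sepal.Length", "Petal.Length")
```

In an R Markdown file the call would then sit in inline code, in the same way as the `corr_test()` example above.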

IndrajeetPatil commented 5 years ago

This is a starting point for whatever the ultimate solution might look like. The final form of the function will need to cover all kinds of objects that are going to pop up in the calls containing plotmath characters.


format_markdown <- function(expr) {
  # in plotmath, paste acts like paste0
  paste <- paste0

  # italic text just has stars around it
  italic <- function(s) paste0("*", s, "*")

  # single subscripts are entered using subsetting
  `[` <- function(main, subscript) paste0(main, "~", subscript, "~")

  # evaluate the expression to produce a string
  eval(expr)
}

# works
format_markdown(
  ggstatsplot::gghistostats(
    data = iris,
    x = Sepal.Length,
    test.value = 7,
    output = "subtitle"
  )
)

#> [1] "*t*(149) = -17.11, *p* = < 0.001, *g* = -1.39, CI~95%~ [-1.63, -1.17], *n* = 150"

format_markdown(
  ggstatsplot::ggdotplotstats(
    data = iris,
    y = Species,
    x = Sepal.Length,
    type = "np",
    output = "subtitle"
  )
)

#> Warning in wilcox.test.default(x = data$x, alternative = "two.sided",
#> na.action = na.omit, : requested conf.level not achievable

#> Error in paste(NULL, "log"["e"](italic("V")), " = ", "1.79", ", ", italic("p"), : attempt to apply non-function

Created on 2019-06-17 by the reprex package (v0.3.0)
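
The second call fails because plotmath allows a subscripted term to be written directly in front of a parenthesis, as in `"log"["e"](italic("V"))`. R parses that as a function call whose function is `"log"["e"]`; with the `[` override this evaluates to a string, and a string cannot be called. The following sketch reproduces the error with base R only, using a hand-written expression as a stand-in for the real subtitle (`format_markdown_v0` and `bad` are names made up for this illustration).

```r
# Same overrides as in the starting-point function above
format_markdown_v0 <- function(expr) {
  paste <- paste0
  italic <- function(s) paste0("*", s, "*")
  `[` <- function(main, subscript) paste0(main, "~", subscript, "~")
  eval(expr)
}

# Hand-written stand-in for the failing subtitle (an assumption,
# not the exact object returned by ggdotplotstats)
bad <- quote(paste("log"["e"](italic("V")), " = ", "1.79"))

# Capture the error message instead of stopping
msg <- tryCatch(
  format_markdown_v0(bad),
  error = function(e) conditionMessage(e)
)
```

One possible way around this is to rewrite the juxtaposition into an explicit operator (for example `*`) before evaluating, so that the subscripted string is concatenated with what follows rather than called.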

friedeh commented 5 years ago

Dear Indrajeet, that looks like a very good starting point and also works perfectly for me as an inline code. It is especially valuable if you want to transfer the analyses from an EDA into an abstract (for the results-section). Best regards Hendrik

matcasti commented 3 years ago

Yes please, this would be awesome for report writing. I've checked the code inside some of your packages and it seems that you could pull this off smoothly. Best regards Matías

matcasti commented 2 years ago

Building on the previous example code, this function may accomplish the task related to #104

format_markdown <- function(expr) {

  # transform the expression into a list so it can be modified
  expr <- as.list(x = as.list(expr)[[1]])

  # replace invalid patterns to be evaluated
  expr <- gsub(")[", ") * \"\"[", expr, fixed = TRUE)
  expr <- gsub("](", "] * list2(", expr, fixed = TRUE)

  # transform back into a call so it can be evaluated
  expr <- lapply(expr, str2lang)
  expr <- as.call(expr)

  # global variables; could be changed for anything else
  p <- "p"; CI <- "CI"; chi <- "*X*";
  mu <- "*mu*"; log <- "log"; BF <- "BF";
  e <- "e"; epsilon <- "Epsilon"; R <- "R";
  HDI <- "HDI"; xi <- "xi"; omega <- "Omega";

  # list works pasting expressions
  list <- function(...) paste(..., sep = ", ")

  # `list2` works the same but only for expressions with >= 1 parameter(s)
  list2 <- function(...) paste0("(", paste(..., sep = ", "), ")")

  # italic text just has stars around it
  italic <- function(s) paste0("*", s, "*")

  # wide hat has no effect on final output
  widehat <- function(s) as.character(s)

  # single subscripts are entered using subsetting
  `[` <- function(main, subscript) paste0(main, "~", subscript, "~")

  # single superscripts are wrapped in a caret on both sides
  `^` <- function(main, superscript) paste0(main, "^", superscript, "^")

  # this symbol will concatenate
  `*` <- function(lhs, rhs) paste0(lhs, rhs, collapse = ", ")

  # replace to equal sign
  `==` <- function(lhs, rhs) paste0(lhs, " = ", rhs)

  # suppress formula behaviour
  `~` <- function(lhs, rhs) paste0(lhs, rhs)

  # evaluate the expression to produce a string
  eval(expr)
}

With this, I got the following outputs:

gghistostats

format_markdown(
  ggstatsplot::gghistostats(
  data = iris,
  x = Sepal.Length,
  test.value = 7,
  output = "subtitle"
  )
)
#> [1] "*t*~Student~(149) = -17.11, *p* = 5.56e-37, *g*~Hedges~ = -1.39, CI~95%~[-1.62, -1.17], *n*~obs~ = 150"

ggdotplotstats

format_markdown(
  ggstatsplot::ggdotplotstats(
    data = iris,
    y = Species,
    x = Sepal.Length,
    type = "np",
    output = "subtitle"
  )
)
#> [1] "*V*~Wilcoxon~ = 6.00, *p* = 0.25, *r*~biserial~^rank^ = 1.00, CI~95%~[1.00, 1.00], *n*~obs~ = 3"

ggbetweenstats

format_markdown(
  ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = "b",
    output = "subtitle"
  )
)
#> [1] "log~e~BF~01~ = -65.10, *R^2^*~Bayesian~^posterior^ = 0.61, CI~95%~^HDI^[0.54, 0.67], *r*~Cauchy~^JZS^ = 0.71"
format_markdown(
  ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = "r",
    output = "subtitle"
  )
)
#> [1] "*F*~trimmed-means~(2, 53.84) = 111.95, *p* = 0.00, xi = 0.85, CI~95%~[0.77, 0.91], *n*~obs~ = 150"
format_markdown(
  ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = "p",
    output = "subtitle"
  )
)
#> [1] "*F*~Welch~(2, 92.21) = 138.91, *p* = 1.51e-28, Omega~p~^2^ = 0.74, CI~95%~[0.67, 1.00], *n*~obs~ = 150"
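
For readers who want to see the key rewrite step in isolation, here is a stripped-down sketch that runs without {ggstatsplot}. It assumes a hand-written call as input and keeps only the overrides that this one example needs; `format_markdown_sketch` is a made-up name, and the function is a simplification of the code above rather than a replacement for it.

```r
format_markdown_sketch <- function(expr) {
  # 1. deparse the call to text and make the juxtaposition explicit:
  #    `x[i](y)` becomes `x[i] * list2(y)`
  txt <- paste0(deparse(expr), collapse = "")
  txt <- gsub("](", "] * list2(", txt, fixed = TRUE)
  expr <- str2lang(txt)

  # 2. local overrides that turn plotmath pieces into Markdown strings
  paste <- paste0
  italic <- function(s) paste0("*", s, "*")
  list2 <- function(...) paste0("(", base::paste(..., sep = ", "), ")")
  `[` <- function(main, subscript) paste0(main, "~", subscript, "~")
  `*` <- function(lhs, rhs) paste0(lhs, rhs)

  # 3. evaluate in this function's frame so the overrides are found
  eval(expr)
}

# Hand-written stand-in for a subtitle (an assumption for illustration)
res <- format_markdown_sketch(
  quote(paste("log"["e"](italic("V")), " = ", "1.79"))
)
res
#> [1] "log~e~(*V*) = 1.79"
```

The text-level `gsub()` is fragile (it would also rewrite a literal `](` inside a string), which is one reason a walker over the call tree might be a more robust design in the long run.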

matcasti commented 2 years ago

I have prepared a Pull Request with minor modifications to the original code to avoid R-CMD-check notes, as well as to work with the output from {ggstatsplot} and that of {statsExpressions} functions. I hope this helps!

roaldarbol commented 3 months ago

First of all, thanks for developing these amazing packages!

A solution for this feature would still be immensely helpful! Most results sections in journal papers require results to be written within the text, and this could provide a perfect way to have a reproducible results section, as the stats would update whenever the underlying data change. IMO this should be quite a high priority. :-) Most Markdown formats understand TeX `$` math syntax these days, so I think that's probably the best way to go. I see two potential ways to go about this:

  1. Make a plotmath-to-TeX converter which outputs an expression inside $s.
  2. Refactor the code to generate TeX by default, and then convert to plotmath for the graphs (a converter for that already exists: latex2exp).
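
To give a rough idea of what option 1 could look like, here is a minimal sketch of a recursive converter that walks the call tree instead of evaluating it. Everything in it is an assumption made for illustration: the function names `plotmath_to_tex()` and `to_inline_tex()` are hypothetical, only a small subset of plotmath is covered, and the input is a hand-written call rather than real package output.

```r
# Hypothetical sketch: convert a small subset of plotmath to a TeX string
plotmath_to_tex <- function(x) {
  # strings: escape `%`; set anything containing letters upright
  if (is.character(x)) {
    x <- gsub("%", "\\%", x, fixed = TRUE)
    return(if (grepl("[A-Za-z]", x)) paste0("\\mathrm{", x, "}") else x)
  }
  if (is.numeric(x)) return(format(x))

  # bare symbols: a few Greek letters become TeX commands
  if (is.symbol(x)) {
    name <- as.character(x)
    greek <- c("alpha", "beta", "chi", "epsilon", "mu", "omega", "xi")
    return(if (name %in% greek) paste0("\\", name) else name)
  }

  if (!is.call(x) || !is.symbol(x[[1]])) stop("unsupported plotmath input")
  fn <- as.character(x[[1]])

  # italic() is handled before its argument is converted, so that
  # italic("t") does not end up wrapped in \mathrm
  if (fn == "italic") {
    inner <- x[[2]]
    inner <- if (is.character(inner) || is.symbol(inner)) {
      as.character(inner)
    } else {
      plotmath_to_tex(inner)
    }
    return(paste0("\\mathit{", inner, "}"))
  }

  # convert all arguments recursively, then combine them
  args <- vapply(as.list(x)[-1], plotmath_to_tex, character(1))
  switch(fn,
    paste   = paste0(args, collapse = ""),
    list    = paste0(args, collapse = ", "),
    widehat = paste0("\\widehat{", args[1], "}"),
    "["     = paste0(args[1], "_{", args[2], "}"),
    "^"     = paste0(args[1], "^{", args[2], "}"),
    "=="    = paste0(args[1], " = ", args[2]),
    "*"     = paste0(args[1], args[2]),
    stop("unsupported plotmath function: ", fn)
  )
}

# Wrap the result in `$` so Markdown renders it as inline math
to_inline_tex <- function(expr) paste0("$", plotmath_to_tex(expr), "$")

# Hand-written stand-in for a subtitle
subtitle <- quote(list(
  italic("t")["Student"] * "(" * 149 * ")" == "-17.11",
  italic(p) == "0.25"
))
tex <- to_inline_tex(subtitle)
```

Walking the tree avoids overriding operators such as `[` and `*`, and unsupported constructs fail with an explicit error instead of producing a malformed string.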

As an aside, also note the new {marquee} package, which enables seamless Markdown rendering within ggplots. It does not yet support LaTeX equations, but Thomas seems to be on it (https://github.com/r-lib/marquee/issues/2). So once that works, outputting TeX expressions and rendering them in plots with {marquee} might be the least work in the long run and probably the most extensible way.

IndrajeetPatil commented 3 months ago

@roaldarbol It's uncanny that you wrote this today, since it was only yesterday that I subscribed to the {marquee} issue you mention here 👀

Thanks for the detailed heads-up, though. This is definitely something I am going to follow closely, and hopefully it becomes easier to support LaTeX equations in marquee, which would also make it easier to support them in {statsExpressions}.

roaldarbol commented 3 months ago

Haha, uncanny timing indeed! Fingers crossed they manage to enable it soon! :-D