IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0

"ratio" argument in ggpiestats/ggbarstats seems dysfunctional #818

Closed by bodhisat 3 months ago

bodhisat commented 1 year ago

When I specify the "ratio" argument in "ggbarstats()" or "ggpiestats()", the one-sample test results reported for each level of "y" seem wrong: they look as if "ratio" were ignored. However, manually applying the "contingency_table()" function from the "statsExpressions" package to a grouped tibble seems to give the right output.

I inspected the definition of "ggbarstats()" but could not work out why the results differ; see the example below.

library(ggstatsplot); library(statsExpressions); library(tidyverse); library(reprex)
#> You can cite this package as:
#>      Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
#>      Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167

data <- data.frame(
  x = factor(c('low', 'high', 'low', 'low')),
  type = factor(c(0, 0, 1, 1))
)

data %>%
  group_by(type) %>%
  group_modify(~ contingency_table(.x, x,
                                   ratio = c(.001, .999))) %>%
  ungroup() %>% suppressWarnings
#> # A tibble: 2 × 14
#>   type  statistic    df   p.value method effec…¹ estim…² conf.…³ conf.…⁴ conf.…⁵
#>   <fct>     <dbl> <dbl>     <dbl> <chr>  <chr>     <dbl>   <dbl>   <dbl>   <dbl>
#> 1 0     499.          1 2.01e-110 Chi-s… Pearso…  0.998     0.95   0.998       1
#> 2 1       0.00200     1 9.64e-  1 Chi-s… Pearso…  0.0316    0.95   0           1
#> # … with 4 more variables: conf.method <chr>, conf.distribution <chr>,
#> #   n.obs <int>, expression <list>, and abbreviated variable names ¹​effectsize,
#> #   ²​estimate, ³​conf.level, ⁴​conf.low, ⁵​conf.high

extract_stats(ggpiestats(data, x = x, y = type,
                         ratio = c(.001, .999)))$one_sample_data
#> # A tibble: 2 × 10
#>   type  counts  perc N       statistic    df p.value method       .label .p.la…¹
#>   <fct>  <int> <dbl> <chr>       <dbl> <dbl>   <dbl> <chr>        <chr>  <chr>  
#> 1 1          2    50 (n = 2)         2     1   0.157 Chi-squared… list(… list(~…
#> 2 0          2    50 (n = 2)         0     1   1     Chi-squared… list(… list(~…
#> # … with abbreviated variable name ¹​.p.label

Created on 2022-11-28 with reprex v2.0.2
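
As a cross-check on the two tables above, here is a minimal base-R sketch that recomputes the per-group goodness-of-fit statistics directly with `chisq.test()`. The helper name `gof_statistic` is made up for illustration, and the sketch assumes that "ratio" is matched to the levels of `x` in their factor order (high, low). It says nothing about how ggstatsplot computes these values internally.

```r
# Same toy data as in the reprex above
data <- data.frame(
  x = factor(c("low", "high", "low", "low")),
  type = factor(c(0, 0, 1, 1))
)

# Counts of x (rows: high, low) within each level of type (columns: 0, 1)
counts <- table(data$x, data$type)

# Hypothetical helper: chi-squared goodness-of-fit statistic for each column
# of `counts`, tested against the expected proportions `p`
gof_statistic <- function(p) {
  apply(counts, 2, function(n) {
    unname(suppressWarnings(chisq.test(n, p = p))$statistic)
  })
}

with_ratio <- gof_statistic(c(0.001, 0.999))  # proportions passed as `ratio`
equal_props <- gof_statistic(c(0.5, 0.5))     # equal proportions

round(with_ratio, 3)
round(equal_props, 3)
```

With `ratio = c(.001, .999)` the statistics come out at roughly 498.5 and 0.002, in line with the "contingency_table()" output. With equal proportions they are 0 and 2, which are the values shown in "one_sample_data". That is consistent with the per-level tests in "ggpiestats()" falling back to equal proportions, though this check alone does not show where in the package that happens.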

lucolotto commented 1 year ago

I am adding my comment to this issue because I have found the same thing. The "ratio" argument doesn't seem to have any effect when I run "ggbarstats()" or "ggpiestats()" and also specify the "y" argument. Is this expected? E.g.:

ggpiestats(
  data = Titanic_full,
  x = Survived,
  y = Sex,
  ratio = c(.73, .27)
)

ggpiestats(
  data = Titanic_full,
  x = Survived,
  y = Sex,
  ratio = c(.27, .73)
)

Both calls give identical results, even though the expected proportions are reversed. Is there a way to fix this?

Thanks
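
For reference, a rough base-R sketch of why the two calls would be expected to differ. It uses the built-in `Titanic` table rather than `Titanic_full`, on the assumption that both hold the same passenger counts, and the helper name `stat_for` is invented for illustration. It runs a goodness-of-fit test on the Survived counts within each sex, once per ratio.

```r
# Sex-by-Survived counts from the built-in 4-way Titanic table
# (assumed here to match Titanic_full; rows: Male, Female; columns: No, Yes)
sex_by_survival <- margin.table(Titanic, c(2, 4))

# Hypothetical helper: chi-squared goodness-of-fit statistic for the
# Survived counts within each sex, against expected proportions `p`
stat_for <- function(p) {
  apply(sex_by_survival, 1, function(n) {
    unname(chisq.test(n, p = p)$statistic)
  })
}

stat_73_27 <- stat_for(c(0.73, 0.27))  # expect 73% "No", 27% "Yes"
stat_27_73 <- stat_for(c(0.27, 0.73))  # expect 27% "No", 73% "Yes"

stat_73_27
stat_27_73
```

The two vectors of statistics are not the same, so per-level tests that honoured "ratio" should change when the proportions are swapped.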