IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0

Different p-values with ggbetweenstats depending on included samples #778

Open MN2211 opened 2 years ago

MN2211 commented 2 years ago

Hello everyone,

I am getting p-values > 0.05 for certain pairwise comparisons even though the differences between those groups are larger than between other pairs that have p-values < 0.05. When I exclude some of the samples, the p-values change and comparisons that were not significant before become significant. The first graph attached shows no significant p-value between 7 and 5 or between 7 and 6 (although the difference between them is much bigger than, for example, between 1 and 5), but the second graph shows p <= 0.05 for both. This is my script:

library(ggstatsplot)
library(tidyverse)
library(readxl)

data <- read_xlsx("path/file.xlsx")
ggbetweenstats(data, sample, increase, type = "nonparametric", p.adjust.method = "none", pairwise.display = "significant")

and this is the data from the Excel sheet:

sample increase
1 8
1 8
1 8
2 8
2 8
2 8
3 8
3 8
3 8
4 8
4 8
4 8
5 7
5 6
5 6
6 6
6 6
6 6
7 1
7 1
7 1

Thank you in advance! MN


insectito commented 1 year ago

I am experiencing the same issue. u.u Did you find a solution?

MN2211 commented 1 year ago

As far as I remember, this was not a problem with the ggstatsplot code but follows from how the Kruskal-Wallis test is defined: it works on ranks computed over all included samples, so the significance depends on which samples are included. You might check whether you can switch to another (maybe even parametric) test.
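
To illustrate the point about rank-based tests, here is a minimal base-R sketch using the data from the original post. It assumes the nonparametric pairwise comparisons are Dunn-type tests on ranks pooled across all included groups (my understanding of what ggbetweenstats does for type = "nonparametric", not something confirmed in this thread). dunn_p is a hypothetical helper written for this example, not part of ggstatsplot, and its unadjusted p-values need not match the plot exactly.

```r
# Data from the original post: 7 samples, 3 measurements each
sample   <- rep(1:7, each = 3)
increase <- c(rep(8, 12), 7, 6, 6, 6, 6, 6, 1, 1, 1)

# Hypothetical helper: unadjusted two-sided Dunn-type p-value for groups g1 and g2.
# Ranks are computed over ALL observations passed in, not only the two groups.
dunn_p <- function(values, groups, g1, g2) {
  n <- length(values)
  r <- rank(values)                          # midranks for tied values
  ties <- as.numeric(table(values))          # sizes of the tie blocks
  tie_term <- sum(ties^3 - ties) / (12 * (n - 1))
  n1 <- sum(groups == g1)
  n2 <- sum(groups == g2)
  diff <- abs(mean(r[groups == g1]) - mean(r[groups == g2]))
  se <- sqrt((n * (n + 1) / 12 - tie_term) * (1 / n1 + 1 / n2))
  2 * pnorm(diff / se, lower.tail = FALSE)
}

# Comparison 7 vs 5 with all seven samples included
p_all <- dunn_p(increase, sample, 7, 5)

# Same comparison after excluding samples 1 to 4
keep  <- sample %in% 5:7
p_sub <- dunn_p(increase[keep], sample[keep], 7, 5)

print(c(all_samples = p_all, samples_5_to_7 = p_sub))
```

In this data set the mean-rank difference between 7 and 5 is the same in both runs. Only the standard error changes, because it grows with the total number of observations, so the comparison drops below 0.05 once samples 1 to 4 are excluded.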