Different p-values with ggbetweenstats depending on included samples

MN2211 commented 2 years ago

Hello everyone,

I am getting p-values > 0.05 for some pairwise comparisons even though the differences between those groups are larger than between other pairs that have p-values < 0.05. When I exclude some of the samples, the p-values change and comparisons that were not significant before become significant. The first attached graph shows no significant p-value between 7 and 5 or between 7 and 6 (although the difference between them is much bigger than, for example, between 1 and 5), but the second graph shows p <= 0.05 for both. This is my script:

library(ggstatsplot)
library(tidyverse)
library(readxl)

data <- read_xlsx("path/file.xlsx")
ggbetweenstats(data, sample, increase, type = "nonparametric", p.adjust.method = "none", pairwise.display = "significant")

and this is the data from the Excel sheet:

sample	increase
1	8
1	8
1	8
2	8
2	8
2	8
3	8
3	8
3	8
4	8
4	8
4	8
5	7
5	6
5	6
6	6
6	6
6	6
7	1
7	1
7	1

Thank you in advance! MN

insectito commented 1 year ago

I am experiencing the same issue. u.u Did you get any solution?

MN2211 commented 1 year ago

As far as I remember, it was not a problem with the ggstatsplot code but a consequence of how the Kruskal-Wallis test is defined: the significance depends on which samples are included. You might check whether you can switch to another (maybe even parametric) test.
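
For anyone who wants to see the effect without the plot, here is a minimal base-R sketch of why the included samples matter. It assumes the nonparametric pairwise comparisons behave like a Dunn-type test on ranks pooled over all included observations, which is my understanding of what ggstatsplot does but is not confirmed in this thread. The helper `dunn_p` is hypothetical, written only for illustration, and the subset of samples 5 to 7 is my own choice, since the thread does not say which samples were excluded.

```r
# Data from the original post: samples 1-4 are all 8, then 5, 6, 7.
dat <- data.frame(
  sample   = rep(1:7, each = 3),
  increase = c(rep(8, 12), 7, 6, 6, 6, 6, 6, 1, 1, 1)
)

# Hypothetical helper: unadjusted two-sided p-value for one pair (a, b),
# Dunn-style, using midranks computed over ALL rows of `d`.
dunn_p <- function(d, a, b) {
  r    <- rank(d$increase)            # pooled midranks
  n    <- nrow(d)
  ties <- table(d$increase)           # sizes of tied blocks
  tie_term <- sum(ties^3 - ties) / (12 * (n - 1))
  n_a  <- sum(d$sample == a)
  n_b  <- sum(d$sample == b)
  se   <- sqrt((n * (n + 1) / 12 - tie_term) * (1 / n_a + 1 / n_b))
  z    <- (mean(r[d$sample == a]) - mean(r[d$sample == b])) / se
  2 * pnorm(-abs(z))
}

# Same pair (6 vs 7), different sets of included samples.
p_all <- dunn_p(dat, 6, 7)                              # all seven samples
p_sub <- dunn_p(dat[dat$sample %in% 5:7, ], 6, 7)       # only samples 5-7

print(c(all_samples = p_all, samples_5_to_7 = p_sub))
```

Under these assumptions the 6 vs 7 comparison gives roughly p = 0.38 with all seven samples and roughly p = 0.046 with only samples 5 to 7. The mean-rank difference between 6 and 7 is the same in both cases; what changes is the standard error, which grows with the total number of observations in the pooled ranking. With only three values per sample and many ties, the rank-based test also has very little power to begin with.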

IndrajeetPatil / ggstatsplot, issue #778