IndrajeetPatil / ggstatsplot

Enhancing {ggplot2} plots with statistical analysis 📊📣
https://indrajeetpatil.github.io/ggstatsplot/
GNU General Public License v3.0

Inconsistency in W test statistic #951

Open maximelepetit opened 2 months ago

maximelepetit commented 2 months ago

I would like to thank you for this very interesting package.

I need help interpreting and clarifying certain values.

I calculated an apoptosis score for two cell samples, BASE cells and LPS cells, and I would like to see whether there is a statistically significant difference between the two groups. Group 1 (LPS) contains 8126 cells and group 2 (BASE) contains 7942 cells.

Naively, I ran a Wilcoxon test between these two groups.

# Extract data
apoptosis_data <- FetchData(neurons_v5_cb_subset_neurons_silvia, vars = c("ApoptosisScore1", "orig.ident"))
rownames(apoptosis_data) <- NULL
head(apoptosis_data)
  ApoptosisScore1     orig.ident
           <dbl>            <chr>
1   0.04673351  BASE        
2   0.03632951  BASE        
3   0.05176500  BASE        
4   0.04276227  BASE        
5   0.03331517  BASE        
6   0.03697204  BASE

group1 <- apoptosis_data[apoptosis_data$orig.ident == "LPS", "ApoptosisScore1"]
length(group1)
8126

and

group2 <- apoptosis_data[apoptosis_data$orig.ident == "BASE", "ApoptosisScore1"]
length(group2)
7942

Perform the Wilcoxon rank-sum test:

wilcox_test <- wilcox.test(group1, group2)
print(wilcox_test)

That gives:

    Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 42939709, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

Conclusion: The p-value < 2.2e-16 indicates a statistically significant difference in the apoptosis scores between the two groups, so the null hypothesis that the scores in the two groups follow the same distribution can be rejected.

Then I discovered the ggstatsplot package.

After reading the documentation, I decided to use the ggbetweenstats function to compare the two groups. According to the documentation's table of tests (type: non-parametric; number of groups: 2; test: Mann-Whitney U test; function: [stats::wilcox.test()](https://rdrr.io/r/stats/wilcox.test.html)), I set type = "nonparametric" in order to reproduce the p-value obtained previously.

Here is the code I used:

p <- ggbetweenstats(
  data  = apoptosis_data,
  x     = orig.ident,
  y     = ApoptosisScore1,
  type  = "nonparametric",
  ylab = "Apoptosis score",
  xlab = "Condition",
  title = "Distribution of Apoptosis Score across condition"
) 

This gives the following plot: comparaison_lps_base_withoutggsignif

I am wondering why the test statistic (W) differs between the two approaches: wilcox.test() reports W = 42939709, whereas the plot reports 2.16e+07.

I need help!

Thanks.

Maxime

IndrajeetPatil commented 2 months ago

It's hard for me to look into this without a reproducible example.

maximelepetit commented 2 months ago

Here are the code and the data ;) issue_951_ggstatplot.zip
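
One possible explanation for the two W values, not confirmed anywhere in this thread, is group order. The W that wilcox.test() reports belongs to whichever sample comes first, and the two possible values always sum to n1 * n2. The sketch below uses simulated stand-in data (the `sim` data frame and its columns are hypothetical, not the attached dataset). It assumes ggbetweenstats orders the groups the way the base R formula interface does, with alphabetical factor levels so that BASE precedes LPS.

```r
# Simulated stand-in data: hypothetical values, NOT the data from the issue.
set.seed(951)
sim <- data.frame(
  score = c(rnorm(40, mean = 0.05, sd = 0.01),
            rnorm(50, mean = 0.04, sd = 0.01)),
  ident = rep(c("LPS", "BASE"), times = c(40, 50))
)

lps  <- sim$score[sim$ident == "LPS"]
base <- sim$score[sim$ident == "BASE"]

# W is computed for the FIRST sample passed to wilcox.test().
w_lps_first  <- unname(wilcox.test(lps, base)$statistic)
w_base_first <- unname(wilcox.test(base, lps)$statistic)

# The formula interface converts the grouping variable to a factor;
# levels are sorted alphabetically, so "BASE" becomes the first sample.
w_formula <- unname(wilcox.test(score ~ ident, data = sim)$statistic)

# The two statistics are complementary: they sum to n1 * n2.
n_product <- length(lps) * length(base)
print(c(w_lps_first, w_base_first, n_product))
print(w_lps_first + w_base_first == n_product)
print(w_formula == w_base_first)

# The same identity applied to the numbers reported in this issue:
w_other_order <- 8126 * 7942 - 42939709
print(w_other_order)
```

With the sizes and statistic quoted above, 8126 * 7942 - 42939709 = 21596983, which rounds to the 2.16e+07 shown on the plot. The p-value is the same in either order, so only the reported statistic changes. If that assumption about group order holds, calling wilcox.test(group2, group1) should reproduce the plot's value.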