Statistics for seurat VlnPlot of UCell scores

carmonalab / UCell

Gene set scoring for single-cell data

GNU General Public License v3.0

132 stars 16 forks source link

Statistics for seurat VlnPlot of UCell scores #21

Closed brookeabrown closed 1 year ago

brookeabrown commented 1 year ago

Hello,

Thanks for a great package. Using Seurat objects, I have calculated UCell scores and plotted as violin plots as in the vignette, https://carmonalab.github.io/UCell_demo/UCell_Seurat_vignette.html. Is there a way to calculate a p value through Wilcoxon or ANOVA for the Seurat violin plots of the UCell scores?

Any help is greatly appreciated!

mass-a commented 1 year ago

Hello Brooke,

it depends a bit on the assumptions and the kind of comparison you want to make. For example, you may want to compare the same signature across clusters, or compare different signatures on the same set of cells. You can always extract the distributions of UCell scores from metadata and perform your statistical tests, e.g.:

a <- object$sig1_UCell
b <- object$sig2_UCell
wilcox.test(a, b, paired = TRUE, alternative = "two.sided")

There are several tools implemented in R to perform statistical tests and add them to your plots. One that I like is ggpubr – here is a tutorial for adding p-values directly on violin or boxplots: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/

I hope this helps! -m

brookeabrown commented 1 year ago

Thank you for your response! That worked well.

ggpubr worked great. I understand this is a separate package, but wanted to ask just in case you have dealt with this, do you know how to have it report the actual p value and not "p < 2.2e-16" for low p-values? I have looked into it but haven't found a solution yet.

mass-a commented 1 year ago

I think it's because the printing functions are using by default the smallest floating point number in R:

.Machine$double.eps

2.220446e-16

I am not sure if this behavior can be altered within ggpubr, but perhaps you can bypass it by not using the default printing function, e.g.:

a <- 1:100
b <- 200:300
test <- wilcox.test(a, b)

test

    Wilcoxon rank sum test with continuity correction
data:  a and b
W = 0, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

But:

test$p.value

[1] 1.759753e-34