carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

What is a "significant" U Score (score cutoff indicating high expression of the signature)? #7

Closed: ksaunders73 closed this issue 3 years ago

ksaunders73 commented 3 years ago

Hello!

Just a clarifying question: what is considered a "significant" U score, that is, a score that indicates high expression of the signature? The vignette at https://carmonalab.github.io/UCell/UCell_vignette_TILstates.html suggests, whether for that particular dataset or in general, that a score greater than 0.2 means high expression (see below):

"some clusters have a clearly high mean distribution of T cell signature scores (T.cell_UCell score > 0.2)."

Thank you for reading!

mass-a commented 3 years ago

Hello! Qualitatively, if a cluster of cells shows scores that are consistently higher than zero, then you probably have a signal that will allow you to distinguish between cell types. The actual UCell score value will depend on the quality of the signature and, of course, on the quality of the data. This is similar to asking how many UMI counts you need to observe to say that a gene is highly expressed: it depends. If you want a more quantitative measure of "significant" UCell scores, you can plot the distribution of these scores within a given cluster of cells. By comparing score distributions between clusters, you can quantify whether and by how much they differ, and calculate measures of significance.
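As a rough illustration of comparing score distributions between clusters, here is a minimal Python sketch using only the standard library. The cluster names and score values are made up for the example, and `summarize_scores` is a hypothetical helper, not part of UCell (which is an R package).

```python
import statistics

def summarize_scores(scores_by_cluster):
    """Return the quartiles of the signature scores for each cluster."""
    summary = {}
    for cluster, scores in scores_by_cluster.items():
        # statistics.quantiles with n=4 returns the three quartile cut points
        q1, median, q3 = statistics.quantiles(scores, n=4)
        summary[cluster] = {"q1": q1, "median": median, "q3": q3}
    return summary

# Hypothetical toy scores; real UCell scores would come from the R package
scores_by_cluster = {
    "cluster_A": [0.00, 0.02, 0.05, 0.01, 0.03, 0.00, 0.04, 0.02],
    "cluster_B": [0.25, 0.31, 0.22, 0.28, 0.35, 0.27, 0.30, 0.24],
}

summary = summarize_scores(scores_by_cluster)
for cluster, stats in summary.items():
    print(cluster, stats)
```

If the interquartile ranges of two clusters do not overlap, as in this toy data, the signature separates them well. How much overlap is acceptable remains a judgment call that depends on the signature and the data.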

ksaunders73 commented 3 years ago

That makes sense, thank you very much @mass-a !

ATpoint commented 1 year ago

@mass-a Hi, following up on this: do you have a take on whether the scores are suitable for a pairwise comparison between clusters? Say you score a number of pathways and then aim to compare "pathway activity" using the scores, even with something as simple as a Wilcoxon test. Does that make sense in your opinion, and if so, do you have a view on how large the "fold change" between scores should be, at minimum, to be meaningful? I can open a new issue if this does not fit here. Thanks for your time!

mass-a commented 1 year ago

Hello Alexander, I think it's fine to perform a statistical test for UCell score distributions between clusters. You could convince yourself of this by performing e.g. a Wilcoxon test between UCell scores on a random partition of the data (as opposed to a meaningful clustering) - in this case the test should not be significant.
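The random-partition check described above could be sketched as follows, in Python with only the standard library. The scores are simulated from beta distributions (an assumption made purely for illustration), and `rank_sum_p_value` is a hypothetical helper that implements the Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation and no tie correction of the variance.

```python
import math
import random

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)
# Simulated scores in (0, 1) for two clusters that truly differ
cluster_a = [rng.betavariate(2, 30) for _ in range(500)]
cluster_b = [rng.betavariate(8, 24) for _ in range(500)]
p_real = rank_sum_p_value(cluster_a, cluster_b)

# Random partitions of the same cells: the test should rarely be significant
pooled_scores = cluster_a + cluster_b
n_trials = 200
n_significant = 0
for _ in range(n_trials):
    rng.shuffle(pooled_scores)
    half = len(pooled_scores) // 2
    if rank_sum_p_value(pooled_scores[:half], pooled_scores[half:]) < 0.05:
        n_significant += 1
false_positive_rate = n_significant / n_trials

print("p-value for the real clustering:", p_real)
print("fraction of random partitions with p < 0.05:", false_positive_rate)
```

Repeating the random partition many times, rather than once, shows the expected behaviour more clearly: roughly 5% of random splits fall below 0.05 by chance alone.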

That said, with single cell data one often has a large N of data points, so even small differences can result in highly significant p-values. As you suggested, I think it's a good idea to also report the effect size, for example in terms of fold change of the mean UCell score, together with the p-value.
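To make the large-N point concrete, here is a small Python sketch (standard library only) that reports a p-value alongside two effect sizes: the fold change of the mean score and Cohen's d. The data are simulated, the group sizes and beta parameters are arbitrary assumptions, and the p-value uses a normal approximation to Welch's test, which is reasonable only because N is large here.

```python
import math
import random
import statistics

def fold_change_of_means(x, y):
    """Ratio of mean scores, x over y."""
    return statistics.fmean(x) / statistics.fmean(y)

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(pooled_var)

def large_sample_p_value(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)
# Two simulated groups of 20,000 cells whose mean scores differ by about 4%
control = [rng.betavariate(5.0, 20.0) for _ in range(20000)]
treated = [rng.betavariate(5.25, 20.0) for _ in range(20000)]

p_value = large_sample_p_value(treated, control)
fc = fold_change_of_means(treated, control)
d = cohens_d(treated, control)
print(f"p = {p_value:.3g}, fold change of mean = {fc:.3f}, Cohen's d = {d:.3f}")
```

With tens of thousands of cells per group, the p-value is very small even though the distributions overlap almost entirely, which is why the effect size belongs next to it in any report.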

I hope this helps -m

edridgedsouza commented 7 months ago

"That said, with single cell data one often has a large N of data points, so even small differences can result in highly significant p-values."

I am currently running into this issue. Comparing a certain cell cycle signature between groups, I can see visually that the IQRs of my experimental conditions largely overlap, but because n for each group is several tens of thousands, any comparison I perform yields extremely small p-values. This holds for the t-test, Wilcoxon, K-S, and ANOVA. Cohen's d generally falls in the 0.2-0.4 range despite the considerable visual overlap of both the IQR and mean +/- sd intervals.

What is the suggested method when one is trying to demonstrate, for QC purposes, that scores are not meaningfully different between groups? Are approaches that use downsampling valid? Or, for transparency, should the results of all tests be reported as-is, with interpretation in the text about why significant differences with low effect sizes are tolerable? For the time being, I have simply been reporting the estimated intervals for mu1-mu2 (for the t-test) and for the location parameter (for Wilcoxon), and concluding that these intervals are tiny compared to the range of the sample distributions.
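The interval-based reporting described here could look roughly like the following Python sketch (standard library only). It computes an approximate 95% confidence interval for mu1 - mu2 and expresses its width as a fraction of the observed score range. The data are simulated, and the 1.96 multiplier assumes a normal approximation, which is reasonable for groups of this size; `mean_difference_ci` is a hypothetical helper.

```python
import math
import random
import statistics

def mean_difference_ci(x, y, z=1.96):
    """Approximate confidence interval for mean(x) - mean(y)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return diff - z * se, diff + z * se

rng = random.Random(2)
# Simulated scores for two large groups with slightly different means
group_1 = [rng.betavariate(5.25, 20.0) for _ in range(20000)]
group_2 = [rng.betavariate(5.0, 20.0) for _ in range(20000)]

ci_low, ci_high = mean_difference_ci(group_1, group_2)
all_scores = group_1 + group_2
score_range = max(all_scores) - min(all_scores)
width_fraction = (ci_high - ci_low) / score_range

print(f"95% CI for mu1 - mu2: [{ci_low:.4f}, {ci_high:.4f}]")
print(f"CI width as a fraction of the score range: {width_fraction:.4f}")
```

Note that a narrow interval reflects precision, not a small difference. Whether the interval sits inside a range one is willing to call negligible is a separate decision, and that margin has to be chosen and justified by the analyst.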

j-andrews7 commented 5 months ago

I tend to think of this the same way I would for typical DE, which is also subject to deflated p-values due to large N. Pseudobulking the scores by cluster or cell type per group, then testing on those values, takes care of that. Of course, this requires some number of replicate samples, but it tends to return significant p-values only for comparisons with reasonable effect sizes.
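A minimal sketch of the pseudobulk idea, in Python with only the standard library: per-cell scores are averaged per replicate sample, and the test is then run on those few sample-level values. The sample names, group labels, and simulated scores are all assumptions for illustration. An exact permutation test on the sample means stands in here for whatever replicate-level test one would normally use, since the standard library has no t distribution.

```python
import itertools
import random
import statistics
from collections import defaultdict

def pseudobulk_means(cells):
    """Average per-cell scores within each sample; return {group: [sample means]}."""
    per_sample = defaultdict(list)
    sample_group = {}
    for sample_id, group, score in cells:
        per_sample[sample_id].append(score)
        sample_group[sample_id] = group
    by_group = defaultdict(list)
    for sample_id, scores in per_sample.items():
        by_group[sample_group[sample_id]].append(statistics.fmean(scores))
    return dict(by_group)

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = a + b
    total = sum(pooled)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    n_extreme = 0
    n_perm = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        n_perm += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            n_extreme += 1
    return n_extreme / n_perm

rng = random.Random(3)
cells = []
for group, group_shift in (("control", 0.0), ("treated", 0.005)):
    for rep in range(4):  # four hypothetical replicate samples per group
        sample_base = 0.3 + group_shift + rng.gauss(0, 0.02)  # sample-level variation
        for _ in range(500):
            score = min(1.0, max(0.0, sample_base + rng.gauss(0, 0.05)))
            cells.append((f"{group}_{rep}", group, score))

means = pseudobulk_means(cells)
p_pseudobulk = permutation_p_value(means["control"], means["treated"])
print("sample means per group:", means)
print("pseudobulk permutation p-value:", p_pseudobulk)
```

Because the test now sees 4 values per group instead of thousands of cells, between-sample variability sets the bar for significance. With 4 versus 4 samples, the smallest p-value this exact test can return is 2/70, which also shows why replicates are needed.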