carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

How to interpret negative and zero UCell scores #4

Closed. ksaunders73 closed this issue 3 years ago

ksaunders73 commented 3 years ago

Hello!

I'm very new to statistics, so this may already have been answered, but I wanted to ask a clarifying question on how to interpret negative and zero UCell scores. Do they mean the genes are expressed below the mean and at the mean, respectively? Or do they mean negative expression and zero expression? I ran the equivalent of the following (though I also Z-score transformed the result):

library(UCell)  # provides AddModuleScore_UCell
gene.sets <- list(c("CD2","CD3E","CD3D"))
SeuratObject <- AddModuleScore_UCell(SeuratObject, features = gene.sets, name = "gene")
# Z-score transform the UCell scores across cells
SeuratObject$signature_1gene <- (SeuratObject$signature_1gene - mean(SeuratObject$signature_1gene)) / sd(SeuratObject$signature_1gene)

Thanks for reading!

mass-a commented 3 years ago

Hello and thanks for using the tool.

By definition (check out the formula in the paper), UCell scores are bounded between 0 and 1, so you won't have negative scores.

UCell scores are based on relative ranks, so they measure how the genes in your signature rank when compared to all other genes. The extreme cases would be that the three genes in your signature are the top three expressed genes (UCell score=1) or that all three have zero expression, in which case UCell score=0.
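
As a rough illustration of the two extreme cases, here is a toy rank-based score for a single cell. It is only a sketch modeled on the U-statistic form described in the paper, not UCell's actual code: `toy_rank_score`, the `max_rank` cutoff handling, and the expression values are all made up for this example.

```r
# Toy rank-based signature score for ONE cell.
# Hypothetical helper for illustration only; this is not the UCell API.
toy_rank_score <- function(expr, signature, max_rank) {
  # Rank 1 = most highly expressed gene; tied genes share their average rank.
  ranks <- rank(-expr, ties.method = "average")
  sig_ranks <- ranks[signature]

  # Assumption: genes ranked beyond max_rank are treated as uninformative.
  beyond <- sig_ranks > max_rank
  if (all(beyond)) return(0)
  sig_ranks[beyond] <- max_rank + 1

  # Mann-Whitney U statistic, rescaled so the best possible ranks give 1.
  n <- length(sig_ranks)
  u <- sum(sig_ranks) - n * (n + 1) / 2
  1 - u / (n * max_rank)
}

# Made-up expression values for a single cell (10 genes, 5 of them at zero).
expr <- c(CD2 = 9, CD3E = 8, CD3D = 7, GENE4 = 5, GENE5 = 3,
          GENE6 = 0, GENE7 = 0, GENE8 = 0, GENE9 = 0, GENE10 = 0)

top_score   <- toy_rank_score(expr, c("CD2", "CD3E", "CD3D"), max_rank = 5)
zero_score  <- toy_rank_score(expr, c("GENE6", "GENE7", "GENE8"), max_rank = 5)
mixed_score <- toy_rank_score(expr, c("CD2", "GENE5", "GENE6"), max_rank = 5)

print(top_score)    # 1: the signature genes are the top three ranked genes
print(zero_score)   # 0: none of the signature genes is expressed
print(mixed_score)  # 0.6: in between, and never below 0
```

The score depends only on where the signature genes rank within the cell, which is why it cannot go negative.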

I hope this helps.

PS: you can more easily name signatures by assigning names to your list e.g.:

gene.sets <- list(mysignature1=c("CD2","CD3E","CD3D"),
                  mysignature2=c("CD19","BANK1","MS4A1"))

ksaunders73 commented 3 years ago

Hello @mass-a !

Thank you very much for replying and thank you as well for the great tool! I don't know if I am doing something wrong in terms of providing the genes or setting the assay wrong (I keep it at the RNA assay), or if the Z score transformation is affecting it somehow, but I do get negative scores in some of my UCell signatures following the structure of the code I provided in my question:

[attached image not shown]

mass-a commented 3 years ago

Ah ok, I think I understand your question now.

If you transform to Z-scores you will get positive and negative values for any distribution of values (unless they are all equal).

In your example, negative Z-scores simply correspond to cells with UCell scores below the mean for the given signature, and the Z-score value tells you by how many standard deviations a cell deviates from that mean. But keep in mind that you are comparing cells, not genes, as you hinted in the original question.
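
A small sketch of that effect, using made-up UCell scores for five cells (the values and variable names are hypothetical, chosen only to show the sign pattern):

```r
# Hypothetical UCell scores for five cells; all of them lie in [0, 1].
ucell <- c(0.00, 0.05, 0.20, 0.45, 0.80)

# Same Z-score transform as in the original question, applied across cells.
z <- (ucell - mean(ucell)) / sd(ucell)

# Cells whose raw score is below the mean are exactly the ones with negative Z.
below_mean <- ucell < mean(ucell)

print(round(z, 2))
print(all((z < 0) == below_mean))  # TRUE
```

The raw scores never leave the 0 to 1 range; the negative values come entirely from centering on the mean across cells.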

ksaunders73 commented 3 years ago

Hello @mass-a ! That makes sense, thank you very much!