carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

UCell gene signature to predict celltype identity #20

Closed deevdevil88 closed 1 year ago

deevdevil88 commented 2 years ago

Hi, this is a great method for gene set scoring for single-cell data and I have been using it a lot in my work. As the UCell score for any gene set size always ranges from 0 to 1, can I use multiple UCell-based gene signature scores to assign each cell a gene set identity? That is, can I compare different UCell-based gene set scores for the same cell, and label that cell as being enriched in one gene set compared to the others?

What would be the best way to do this? I don't necessarily need a p-value, just a rough gene set assignment for each cell. Could I use, say, the raw UCell score or the Z-score of the UCell score to assign each cell a gene set identity?

Best, Devika

mass-a commented 2 years ago

Hello Devika, thanks for your interest in our tool.

Perhaps you can try out scGate, a tool based on UCell scores for filtering specific cell types from heterogeneous datasets. It allows you to construct simple gating models to assign cell type identities based on their UCell scores (smoothed by their neighboring cells).

We designed the method mostly to focus on one cell type of interest defined by a gating model; however, in the development version of scGate you will be able to specify a list of gating models for multiple cell types, and the program will assign multi-class labels for cells that can be unequivocally assigned to only one cell type. This is still somewhat experimental, but we'd be happy if you give it a try :)
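To make the idea of "unequivocal" assignment concrete, here is a minimal Python sketch. It is not scGate's actual algorithm: the function name `assign_labels`, the threshold value of 0.2, and the `"multi"` / `"unassigned"` labels are all assumptions chosen for illustration. It takes a cells x signatures matrix of scores in [0, 1] and labels a cell only when exactly one signature passes the threshold.

```python
import numpy as np

def assign_labels(scores, names, threshold=0.2):
    """Label each cell by the single signature whose score exceeds `threshold`.

    scores: array of shape (n_cells, n_signatures), values in [0, 1].
    names:  one label per signature column.
    The threshold and the fallback labels are illustrative assumptions.
    """
    scores = np.asarray(scores, dtype=float)
    names = np.asarray(names, dtype=object)
    passed = scores > threshold          # which signatures pass, per cell
    n_pass = passed.sum(axis=1)          # number of passing signatures per cell
    best = scores.argmax(axis=1)         # top-scoring signature per cell

    labels = np.full(scores.shape[0], "unassigned", dtype=object)
    single = n_pass == 1
    # With exactly one passing signature, the argmax is that signature.
    labels[single] = names[best[single]]
    labels[n_pass > 1] = "multi"         # ambiguous: more than one signature passes
    return labels.tolist()

# Toy score matrix for four cells and three hypothetical signatures.
toy_scores = [
    [0.60, 0.05, 0.10],
    [0.30, 0.40, 0.00],
    [0.10, 0.05, 0.02],
    [0.00, 0.10, 0.50],
]
toy_labels = assign_labels(toy_scores, ["Tcell", "Bcell", "Myeloid"])
```

Thresholding each signature separately, rather than simply taking the highest score, avoids forcing a label onto cells where no signature is clearly expressed or where several are.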

Best -massimo

deevdevil88 commented 2 years ago

Hi @mass-a, that's awesome, many thanks for suggesting this. I will for sure give the development branch of scGate a try very soon. I had another question. Suppose I have a batch-corrected / integrated Seurat object of multiple donors, normalised with SCT. Because the different donors/samples will have different median sequencing depths, the Seurat developers recommend re-correcting the UMI counts in the SCT assay before doing DGE, as a way to integrate the SCT models of the different samples and obtain a UMI count matrix corrected for depth across all samples. The reference page for the function that does this is here: https://satijalab.org/seurat/reference/prepsctfindmarkers

In my opinion, if I want to run UCell on an integrated object, it makes sense to do this re-correction of the UMI counts first and then run UCell. Do you agree or disagree?

Devika

mass-a commented 2 years ago

Hi Devika,

UCell scores are based only on per-cell gene rankings, so they should not be affected by the type of normalization you apply to the data (that is, if the normalization only alters the magnitude but not the ranking of genes in any given cell). This assumption holds with a simple log-normalization, but I am not sure if that is the case for the method you mentioned. You can always try to calculate UCell scores with or without this recorrection of counts and see if the score distribution stays the same?
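The ranking argument can be checked with a small Python sketch. This is a simplified rank-based score in the spirit of UCell, not the package's implementation: the function name `rank_score`, the tie handling (ties broken by gene order), and the toy data are assumptions for illustration. It scores one cell from raw counts and again after a log-normalization, which is a strictly increasing transformation within the cell and so leaves the gene ranking unchanged.

```python
import numpy as np

def rank_score(expr, gene_idx, max_rank=1500):
    """Simplified Mann-Whitney-U-style score for one cell (illustrative only).

    expr:     1-D array of expression values for all genes in the cell.
    gene_idx: indices of the signature genes.
    Ranks run from 1 (highest expression); ranks beyond `max_rank`
    are capped at max_rank + 1.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(-expr, kind="stable")       # descending, ties by gene order
    ranks = np.empty(expr.size, dtype=float)
    ranks[order] = np.arange(1, expr.size + 1)
    ranks = np.minimum(ranks, max_rank + 1)        # cap low-ranked genes
    n = len(gene_idx)
    u = ranks[gene_idx].sum() - n * (n + 1) / 2    # U statistic from rank sum
    return 1.0 - u / (n * max_rank)

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=2000).astype(float)  # toy counts for one cell
signature = np.array([3, 10, 42, 99])               # hypothetical signature genes

score_raw = rank_score(counts, signature)
# Library-size scaling followed by log1p is monotone within a cell.
score_log = rank_score(np.log1p(counts / counts.sum() * 1e4), signature)
```

Here `score_raw` and `score_log` are identical because the per-cell ranks are identical. A correction that adjusts each gene differently (as a model-based re-correction of counts may do) can reorder genes within a cell, in which case the two scores would no longer be guaranteed to match, so comparing the score distributions empirically is the safest check.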