carmonalab / scGate

marker-based purification of cell types from single-cell RNA-seq datasets

Use of Custom Markers for scGate #32

Closed cwarden45 closed 2 months ago

cwarden45 commented 2 months ago

Hi,

Thank you very much for putting together this package.

I have applied scGate to 3 datasets: 1 human dataset, 1 Arabidopsis dataset, and 1 corn dataset.

For the human dataset, I think I need to modify the markers being used. However, I receive some type of result if I use some of the pre-defined signatures as follows:

models.db = get_scGateDB()
Seurat_object = scGate(data = Seurat_object, model = models.db$human$generic)
Seurat_object@meta.data$scGate_multi[is.na(Seurat_object@meta.data$scGate_multi)]="No Classification"

However, in contrast, for the plant datasets, I don't get any similar type of assignments. Either everything lacks an assignment (first example) or there are "Target" assignments without having a "is.pure_[cell type]" column for the 2 broad cell types (second example):

my_model = gating_model(name = "Columella", signature = c("AT1G26680","AT3G29810","AT3G55180","AT4G18350","AT5G22550","AT5G58580"))  
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Quiescent_Center", signature = c("AT1G68640","AT2G03830","AT3G26120","AT5G17430","AT5G23780","AT5G62165"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Endodermis", signature = c("AT1G05260","AT1G44970","AT1G61590","AT2G40160","AT2G48130","AT3G22620"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Cortex", signature = c("AT1G09750","AT1G62510","AT3G12700","AT3G21670","AT3G26300","AT5G55250"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Atrichoblast", signature = c("AT1G31950","AT1G76620","AT1G79840","AT4G00730","AT1G56100","AT5G66800"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Trichoblast", signature = c("AT1G12560","AT1G48930","AT1G69240","AT4G00680","AT5G49270","AT5G65160"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Xylem", signature = c("AT1G29950","AT3G08500","AT4G37650","AT5G08260","AT5G12870","AT5G15630"))

Seurat.combined = scGate(data = Seurat.combined, model = my_model)
Seurat.combined@meta.data$scGate_multi[is.na(Seurat.combined@meta.data$scGate_multi)]="No Classification"

my_model = gating_model(name = "Mesophyll", signature = c("Zm00001d046170","Zm00001d031899","Zm00001d044099"))
my_model = gating_model(model = my_model, level = 1, positive = FALSE,
                            name = "Bundle_Sheath", signature = c("Zm00001d000316","Zm00001d052595"))

Seurat.combined = scGate(data = Seurat.combined, model = my_model)
Seurat.combined@meta.data$scGate_multi[is.na(Seurat.combined@meta.data$scGate_multi)]="No Classification"

The human dataset is truly one scRNA-Seq sample. The plant datasets represent 2-3 individual samples that have been merged/integrated.

Can you please help me troubleshoot? For example, is there a different way that I should be specifying new custom signatures?

Thank you again!

Sincerely, Charles

mass-a commented 2 months ago

Hello Charles, thanks for your message and interest in the tool.

I think there is some confusion on how to apply scGate to purify one target population vs. using it as a multi-cell type classifier. Apologies if we haven't documented this well enough.

Individual scGate models can only purify one target population at a time. For example, models.db$human$generic$Bcell will select B cells from the query dataset. The "generic" group of models is meant to be used only in this way. To use scGate as a multi-class classifier, you can apply the tool to a collection of several models (as, e.g., in this tutorial), making sure that the models in the collection are mutually exclusive. That is, you cannot have both "T cell" and "CD8 T cell" in the same list of models, because CD8 T cells are also T cells and will be labeled as positive for two classes by the method. In the first example, the "generic" collection contains many such overlapping models and will return a large number of "multi" or NA labels, because it cannot confidently assign most cells to only one cell type. Please use one of the other collections (e.g. the PBMC or TME collections), which contain models designed to be used together. We will add a note to clarify this.
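
To make the mutual-exclusivity point concrete, here is a toy sketch in base R. It is not scGate's actual implementation: the `pure` matrix, the `assign_label()` helper, and the exact "Multi" spelling are assumptions made up for illustration. It only shows why a cell that passes two overlapping gates ("Tcell" and "CD8T") cannot receive a single label.

```r
# Hypothetical per-model gating results: TRUE means the cell passed that gate.
pure <- cbind(
  Tcell = c(TRUE,  TRUE,  FALSE, FALSE),
  CD8T  = c(TRUE,  FALSE, FALSE, FALSE),
  Bcell = c(FALSE, FALSE, TRUE,  FALSE)
)
rownames(pure) <- paste0("cell", 1:4)

# Hypothetical helper: one label per cell, "Multi" when several gates are
# passed, NA when none is passed.
assign_label <- function(pure) {
  n_hits <- rowSums(pure)
  # Column index of the first TRUE in each row (only meaningful when n_hits == 1).
  first_hit <- max.col(pure + 0, ties.method = "first")
  ifelse(n_hits == 1,
         colnames(pure)[first_hit],
         ifelse(n_hits > 1, "Multi", NA_character_))
}

labels <- assign_label(pure)
print(labels)
```

In this toy input, the CD8 T cell passes both the "Tcell" and the "CD8T" gate, so it ends up without a unique class, while cells that pass exactly one gate get that gate's name.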

In the plant datasets, you are creating a single model instead of a collection of models. This model will only purify its own target population, that is, cells that are positive for the Columella signature and negative for all the other signatures. From your question, I guess your goal is instead to have a multi-class classifier that assigns groups of cells to each of these cell types? In that case, you would need to make a list of models, each containing one target population. Each model can be composed of a single signature or of more complex combinations of signatures. You can look at the existing model collections to understand the structure.

An example:

plant.models <- list()
plant.models$Columella <- gating_model(name = "Columella",
   signature = c("AT1G26680","AT3G29810","AT3G55180","AT4G18350","AT5G22550","AT5G58580"))
plant.models$Xylem <- gating_model(name = "Xylem",
   signature = c("AT1G29950","AT3G08500","AT4G37650","AT5G08260","AT5G12870","AT5G15630"))
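
The same pattern can be extended to all seven root populations from the question and then passed to scGate() as a list. The following is a minimal sketch rather than a tested recipe: it assumes the Seurat object is called Seurat.combined as in the original post, that these marker sets are close enough to mutually exclusive to work as a classifier, and it guards the scGate calls so the snippet does nothing if the package or the object is missing.

```r
# Marker sets copied from the original post, one entry per target population.
signatures <- list(
  Columella        = c("AT1G26680","AT3G29810","AT3G55180","AT4G18350","AT5G22550","AT5G58580"),
  Quiescent_Center = c("AT1G68640","AT2G03830","AT3G26120","AT5G17430","AT5G23780","AT5G62165"),
  Endodermis       = c("AT1G05260","AT1G44970","AT1G61590","AT2G40160","AT2G48130","AT3G22620"),
  Cortex           = c("AT1G09750","AT1G62510","AT3G12700","AT3G21670","AT3G26300","AT5G55250"),
  Atrichoblast     = c("AT1G31950","AT1G76620","AT1G79840","AT4G00730","AT1G56100","AT5G66800"),
  Trichoblast      = c("AT1G12560","AT1G48930","AT1G69240","AT4G00680","AT5G49270","AT5G65160"),
  Xylem            = c("AT1G29950","AT3G08500","AT4G37650","AT5G08260","AT5G12870","AT5G15630")
)

# Guard (an assumption for portability): only run the scGate steps when the
# package is installed and the Seurat object from the original post exists.
if (requireNamespace("scGate", quietly = TRUE) && exists("Seurat.combined")) {
  # One single-signature model per population, using the default positive = TRUE.
  plant.models <- lapply(names(signatures), function(nm) {
    scGate::gating_model(name = nm, signature = signatures[[nm]])
  })
  names(plant.models) <- names(signatures)

  # Passing the named list (not a single model) requests multi-class labels.
  Seurat.combined <- scGate::scGate(data = Seurat.combined, model = plant.models)
  print(table(Seurat.combined@meta.data$scGate_multi, useNA = "ifany"))
}
```

Tabulating scGate_multi with useNA = "ifany" is a quick way to see how many cells were left unassigned before relabeling the NA values.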

Does it make sense?

cwarden45 commented 2 months ago

Thank you very much for your response.

1) Yes, I believe that I understand. For example, I noticed that there were "Male" and "Female" markers along with the cell type markers. I think my goal is to find cross-tissue assignments that can be made with one list of signatures. However, if starting from models.db$human$generic, then I have to find classifications that are mutually exclusive.

I might either trim the signatures provided by get_scGateDB() or define custom human signatures (similar to the plant signatures).

2) For the plant signatures, I have tested running the commands as you had described.

I originally tried to follow the instructions from the gating_model() documentation, adding models instead of removing a model. However, I had also noticed an option to provide a list of models, in other documentation. That said, you might have specified the list in a different way than I was considering, and I should have used the default of positive = TRUE instead of positive = FALSE. So, thank you very much for providing that information.

Thank you again!

Sincerely, Charles