dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

Single cells' CellType prediction (Signature genes for the cell types) #11

Open KeshavB02 opened 5 years ago

KeshavB02 commented 5 years ago

Hi, I am working on single-cell analysis using Seurat. I am new to SingleR, and it is really useful for cell type prediction at the single-cell level. I have read the SingleR documentation, but I am still a little confused about how a cell is assigned a cell type. The documentation says that it uses the correlation of the cells with the reference samples. Are signature genes available for all of the cell types (at the single-cell level)? I want to understand more about how the scores for the cell types are calculated. It would be really helpful for the analysis. Thanks

KeshavB02 commented 5 years ago

According to the SingleR documentation, I understand that it checks the correlation of the sample cells with the reference pure cell types. After fine-tuning, the most highly correlated reference cell type is assigned as the cell type of the sample cell. Am I making sense? Can I get some marker genes or top DE genes from HPCA or Blueprint_Encode for the cell types?

dviraran commented 5 years ago

Hi,

Yes, you understand it correctly. All details about the SingleR annotation pipeline are available here. If something is still unclear, I would be happy to clarify it.
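As a rough illustration of the idea confirmed above, the sketch below scores one cell against each reference profile with a Spearman (rank) correlation and takes the highest-scoring label. It is not SingleR's code: the function names, the two reference profiles and the expression values are all made up for this example, and it leaves out gene selection and fine-tuning.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def annotate_cell(cell, reference):
    """Score one cell against every reference profile; return (best_label, scores)."""
    scores = {label: spearman(cell, profile) for label, profile in reference.items()}
    return max(scores, key=scores.get), scores

# Toy data (hypothetical values): five genes, two reference profiles.
reference = {
    "CD4+ T-cells": [9.0, 8.0, 1.0, 2.0, 5.0],
    "CD8+ T-cells": [2.0, 1.0, 9.0, 8.0, 5.0],
}
cell = [1.0, 0.0, 7.0, 6.0, 3.0]

label, scores = annotate_cell(cell, reference)
print(label, scores)
```

Because the score is rank-based, the cell does not need to be on the same scale as the reference, which is one reason a correlation of this kind suits comparing single cells with bulk profiles.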

Regarding marker genes - well, you can always use the reference datasets to find genes that are specific to your cell type of interest. The reference datasets are loaded with SingleR (look at the blueprint_encode, hpca, immgen and mouse.rnaseq objects). A more direct approach might be to run a single-cell DE analysis using the SingleR annotations.
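One simple way to read "genes that are specific to your cell type of interest" is to rank genes by how far the target type's expression exceeds that of every other type in the reference. The sketch below does this on a toy table. The function name, the layout of the reference as a plain dictionary, and all expression values are assumptions for illustration, not the structure of the SingleR reference objects.

```python
def specific_genes(reference, genes, target, top_n=3):
    """Return up to top_n genes whose expression in `target` exceeds all other types."""
    others = [label for label in reference if label != target]
    margins = []
    for i, gene in enumerate(genes):
        highest_other = max(reference[label][i] for label in others)
        # Margin > 0 means the target type expresses this gene more than any other type.
        margins.append((reference[target][i] - highest_other, gene))
    margins.sort(reverse=True)
    return [gene for margin, gene in margins[:top_n] if margin > 0]

# Toy reference (hypothetical values): one averaged profile per cell type.
genes = ["CD8A", "CD8B", "CD4", "CD3E", "MS4A1"]
reference = {
    "CD8+ T-cells": [8.0, 7.5, 1.0, 9.0, 0.5],
    "CD4+ T-cells": [1.0, 0.8, 7.0, 9.1, 0.4],
    "B-cells":      [0.5, 0.4, 0.6, 0.7, 9.5],
}

markers = specific_genes(reference, genes, "CD8+ T-cells")
print(markers)
```

A gene such as CD3E, which is high in both T-cell types here, is not returned, since it does not separate the target from its closest neighbour.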

Hope this helps.

Best, Dvir

KeshavB02 commented 5 years ago

Thanks for the clarification. I have noticed that, for pure CD8+ T-cells, SingleR (HPCA) predicts a lot of cells as CD4+ T-cells. I have checked other data sets as well, and the number of CD4+ cells is higher than expected. Have you encountered such gaps in cell type annotation with SingleR? If so, how should we proceed with the analysis? Thanks

dviraran commented 5 years ago

Hi,

  1. I agree that HPCA is usually not a good reference. For human data, I usually use the Blueprint+ENCODE reference.
  2. Are you looking at the fine-tuned SingleR annotations? Annotating non-CD4 T-cells as CD4+ T-cells is a known problem when using correlations - see the original 10X paper (Zheng et al. Nat. Comm. 2017, and also my comparisons to the RCA method), where the top correlation for almost all cells was CD4+ T-cells. However, in my analyses, the fine-tuning step really helped overcome this issue. Looking back at my analyses, after fine-tuning 93.4% of the 10X sorted CD8+ T-cells were annotated as CD8+ T-cells when looking at main types (83.4% when looking at all types). Without fine-tuning, this number drops to 39%.
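To make the effect of fine-tuning concrete, here is a deliberately simplified sketch, not SingleR's implementation. Each round it scores the remaining candidate labels, drops the lowest-scoring one, and re-selects the genes that differ most between the labels still in play. The real procedure differs in its details (for example, in how candidates are pruned and how genes are chosen). All names and values below are hypothetical; GENE1 to GENE6 stand for genes that barely differ between the two T-cell profiles.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def fine_tune(cell, reference, n_genes=2):
    """Return (label before fine-tuning, label after fine-tuning)."""
    candidates = list(reference)
    gene_idx = list(range(len(cell)))  # first round uses every gene
    initial = None
    while len(candidates) > 1:
        scores = {
            label: spearman([cell[i] for i in gene_idx],
                            [reference[label][i] for i in gene_idx])
            for label in candidates
        }
        if initial is None:
            initial = max(scores, key=scores.get)
        # Simplification: drop only the single lowest-scoring candidate per round.
        candidates.remove(min(scores, key=scores.get))

        def spread(i):
            vals = [reference[label][i] for label in candidates]
            return max(vals) - min(vals)

        # Keep the genes that vary most among the remaining candidates.
        gene_idx = sorted(range(len(cell)), key=spread, reverse=True)[:n_genes]
    return initial, candidates[0]

# Gene order: GENE1..GENE6, CD8A, CD4 (all values are made up).
reference = {
    "CD4+ T-cells": [5.0, 5.2, 6.0, 6.2, 7.0, 7.2, 0.5, 5.5],
    "CD8+ T-cells": [5.2, 5.0, 6.2, 6.0, 7.2, 7.0, 8.0, 0.5],
    "B-cells":      [9.0, 0.1, 8.0, 1.0, 7.0, 2.0, 3.0, 4.0],
}
# A CD8+ cell with weak CD8A detection and no CD4 signal.
cell = [5.1, 5.3, 6.1, 6.3, 7.1, 7.3, 2.0, 0.0]

before, after = fine_tune(cell, reference)
print(before, "->", after)
```

In this toy case the all-gene correlation favours CD4+ T-cells because many barely-varying genes happen to rank the same way as in that profile; once only the genes that separate the two T-cell types are used, the CD8+ label wins.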

Best, Dvir

kupadhya2 commented 5 years ago

Hi,

How do you use Blueprint+ENCODE for human data instead of HPCA? I tried ref.list = list(blueprint_encode), but it still shows "Annotating data with HPCA . . .".

Thanks

dviraran commented 5 years ago

When you run the regular pipeline, it runs both. The results are in singler$singler[[2]] (instead of [[1]]).

However - this should have worked. Which function are you using?

kupadhya2 commented 5 years ago

So, you mean to say singler$singler[[1]] has HPCA annotation and singler$singler[[2]] has blueprint_encode annotation?

I am using CreateBigSingleRObject

dviraran commented 5 years ago

Yes.

OK, that makes sense. I see where ref.list is not passed along in this function. I'll fix this issue the next time I deploy a version.

kupadhya2 commented 5 years ago

Thanks

anyone1985 commented 5 years ago

Hi dviraran, I used the function mentioned in the supplementary information:

singler = CreateSinglerObject(counts_matrix, annot = NULL, project.name, min.genes = 0, technology = "10X", species = "Human", citation = "", ref.list = list(), normalize.gene.length = F, variable.genes = "de", fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T, reduce.file.size = T, numCores = 12)

I noticed that many cells were classified as 'CD4+ Tem' while their main type was annotated as 'CD8+ T-cells' (988/1505) in singler$singler[[2]]. How should I deal with this inconsistency between cell type and main type? Thank you for your help.

dviraran commented 5 years ago

In my experience, the non-'main types' mode is more reliable. However, I would search for more evidence regarding the annotation. You can examine the scores heatmap (SingleR.DrawHeatmap function), and also examine markers (as you would do regularly).
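Before looking at heatmaps and markers, it can help to quantify how often the two annotation modes disagree. The sketch below cross-tabulates per-cell fine labels against main-type labels and reports the pairs that conflict. It assumes the two label vectors have already been exported from the SingleR object as plain lists; the PARENT mapping from fine label to main type is hypothetical and would need to match the reference actually used.

```python
from collections import Counter

# Hypothetical mapping from a fine label to the main type it should fall under.
PARENT = {
    "CD4+ Tem": "CD4+ T-cells",
    "CD4+ Tcm": "CD4+ T-cells",
    "CD8+ Tem": "CD8+ T-cells",
    "CD8+ Tcm": "CD8+ T-cells",
}

def label_conflicts(fine_labels, main_labels, parent):
    """Count (fine, main) label pairs and return those where the two modes disagree."""
    table = Counter(zip(fine_labels, main_labels))
    conflicts = {pair: n for pair, n in table.items() if parent.get(pair[0]) != pair[1]}
    return table, conflicts

# Toy per-cell labels (made up), one entry per cell in the same order.
fine = ["CD4+ Tem", "CD4+ Tem", "CD8+ Tem", "CD4+ Tem"]
main = ["CD8+ T-cells", "CD4+ T-cells", "CD8+ T-cells", "CD8+ T-cells"]

table, conflicts = label_conflicts(fine, main, PARENT)
for (fine_label, main_label), n in conflicts.items():
    print(f"{n} cells: fine={fine_label!r} but main={main_label!r}")
```

The conflicting cells are the ones worth inspecting first in the score heatmap and with marker genes.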

Hope this helps.

Best, Dvir