dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

Retrieve reference genome and return no match #31

Open biansr opened 5 years ago

biansr commented 5 years ago

Hello, I have two questions that I hope you can help answer.

First, I want to replace "DC: monocyte-derived" in the reference genome with my own bulk RNA-Seq data from tissue-resident DCs, while keeping all other cell categories. Is there a function in your package that retrieves the reference matrix SingleR uses by default (for human), so that I can append my own gene expression data?

Secondly, I noticed that for cell types that do not exist in the reference genome (for example, mast cells in one of my samples), SingleR still runs the iterations and returns cell types with lower correlation values. Is it possible for SingleR to return "no cell match" when the correlation is below a certain value, so that novel cell types are not forced into an existing type?

dviraran commented 5 years ago

Hi,

  1. The references are all loaded when the library is loaded. "DC: monocyte-derived" comes from the HPCA reference, which is microarray-based (see the hpca object). You should try the Blueprint+ENCODE reference, which is RNA-seq based (the object is called blueprint_encode). You can modify it as needed, and you will then need to use the function CreateVariableGeneSet to recreate the pairwise differential genes. See the section "Create a new reference data set" in this tutorial.

  2. There are two built-in methods you can use. The first is a threshold on the scores; see the documentation for SingleR.PlotTsne (score.thres field). The problem is that the scores are strongly associated with the number of non-zero genes (see 'SingleR score is association with the number of non-zero genes' in the supp info). Another option is to use the outlier p-values, which are also discussed in the supp info file.
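
To make point 1 concrete, here is a minimal, language-agnostic sketch in Python of swapping one cell type's columns in a reference matrix for your own profiles. It is not SingleR code: the dictionary layout, the `replace_cell_type` helper, the "DC: tissue-resident" label, and the toy values are all hypothetical, and the sketch assumes the new profiles are already normalized comparably to the reference.

```python
def replace_cell_type(ref, old_label, new_label, new_profiles):
    """Return a new reference with every `old_label` column dropped and
    `new_profiles` (a list of gene -> expression dicts) appended as `new_label`.

    Hypothetical layout: ref["data"] is a genes x samples matrix stored as a
    list of rows, ref["genes"] names the rows, ref["types"] labels the columns.
    """
    # Keep only genes measured in the reference and in every new profile.
    shared = [g for g in ref["genes"] if all(g in p for p in new_profiles)]
    # Indices of the reference columns that survive the swap.
    keep = [j for j, t in enumerate(ref["types"]) if t != old_label]
    row_of = {g: i for i, g in enumerate(ref["genes"])}

    data = []
    for g in shared:
        row = ref["data"][row_of[g]]
        data.append([row[j] for j in keep] + [p[g] for p in new_profiles])

    types = [ref["types"][j] for j in keep] + [new_label] * len(new_profiles)
    return {"genes": shared, "types": types, "data": data}


# Toy reference: 3 genes x 3 samples (values are made up).
toy_ref = {
    "genes": ["CD14", "ITGAX", "KIT"],
    "types": ["Monocyte", "DC: monocyte-derived", "B cell"],
    "data": [
        [9.0, 6.0, 1.0],
        [5.0, 8.0, 0.5],
        [0.2, 0.3, 0.1],
    ],
}
# Hypothetical bulk profile; KIT was not measured, so it is dropped.
my_dc = {"CD14": 2.0, "ITGAX": 9.5}

new_ref = replace_cell_type(
    toy_ref, "DC: monocyte-derived", "DC: tissue-resident", [my_dc]
)
print(new_ref["types"])
```

After a swap like this the per-type differential genes no longer match the matrix, which is why CreateVariableGeneSet has to be rerun on the edited reference.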

The truth is that both methods are far from perfect. It's not straightforward to identify that a cell type is not in the reference, so the annotations require some manual corroboration.
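
As an illustration of the score-threshold idea only, here is a minimal Python sketch that assigns the best-scoring label unless its score falls below a cutoff. It does not reproduce SingleR's internals: the `label_with_threshold` helper, the `score_thres` argument name, the 0.3 cutoff, and the example scores are all assumptions made for the sketch.

```python
def label_with_threshold(scores, score_thres):
    """Return the best-scoring label, or "no match" if even the best
    score is below `score_thres`.

    `scores` maps a reference cell type to the score for one cell.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= score_thres else "no match"


# Made-up per-cell scores against three reference types.
cells = {
    "cell_1": {"Monocyte": 0.62, "B cell": 0.20, "DC": 0.41},
    "cell_2": {"Monocyte": 0.18, "B cell": 0.22, "DC": 0.25},
}

labels = {
    name: label_with_threshold(s, score_thres=0.3) for name, s in cells.items()
}
print(labels)
```

Because the scores track the number of non-zero genes, a single fixed cutoff like this can mark low-coverage cells of a known type as "no match", so the result still needs manual checking.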

Best, Dvir