dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

Possibility of using a single cell expression matrix as reference? #20

qianhuiSenn closed this issue 5 years ago

qianhuiSenn commented 5 years ago

Hello, thanks so much for this extremely helpful method for annotating cell types. I am wondering whether it is possible to build a reference list from a single-cell expression matrix (with annotated cell types). If so, does constructing it in code differ from constructing a reference from bulk RNA-seq?

Thanks so much for your time!

Best regards

dviraran commented 5 years ago

Thanks!

Definitely. If you want to use single-cell data as the reference:

  1. First, create pseudo-bulk gene expression profiles from the clusters (i.e., the average expression of each cluster).
  2. If the data are raw counts, normalize them for gene length (e.g., TPM or FPKM). You can use the TPM function in SingleR.
  3. Then follow the instructions for creating a new reference data set.
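The first two steps above can be sketched in base R as follows. This is a minimal illustration under assumptions, not SingleR's own code: the object names (counts, clusters, gene_length_kb) and the helpers pseudo_bulk() and to_tpm() are hypothetical, and to_tpm() is a hand-rolled TPM used in place of SingleR's TPM function, whose exact signature is not shown in this thread.

```r
# Minimal base-R sketch; counts, clusters and gene_length_kb are hypothetical
# stand-ins for your own data (genes x cells counts, one label per cell,
# gene lengths in kilobases).

# Step 1: pseudo-bulk profile per cluster = mean expression over its cells.
pseudo_bulk <- function(counts, clusters) {
  clusters <- as.character(clusters)
  # rowsum() sums the rows of t(counts) (i.e. the cells) within each cluster
  sums <- rowsum(t(counts), group = clusters)            # clusters x genes
  n_cells <- as.vector(table(clusters)[rownames(sums)])  # cells per cluster
  t(sums / n_cells)                                      # genes x clusters means
}

# Step 2: gene-length normalization to TPM (hand-rolled, not SingleR's TPM()).
to_tpm <- function(expr, gene_length_kb) {
  rate <- expr / gene_length_kb[rownames(expr)]  # expression per kilobase
  sweep(rate, 2, colSums(rate), "/") * 1e6       # rescale each column to 1e6
}

# Toy example: 5 genes x 6 cells in three clusters.
set.seed(1)
counts <- matrix(rpois(30, lambda = 10), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))
clusters <- c("A", "A", "B", "B", "B", "C")
gene_length_kb <- setNames(c(1, 2, 0.5, 3, 1.5), rownames(counts))

ref_data <- to_tpm(pseudo_bulk(counts, clusters), gene_length_kb)
colSums(ref_data)  # every cluster column sums to one million
```

The result has one column per cluster, which is the shape the reference's data matrix needs; the cluster labels then map to cell-type names in the same column order.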

Let me know if something isn't clear or if there is any other problem.

Best, Dvir

qianhuiSenn commented 5 years ago

Thanks so much for the detailed clarification! I will give it a try!

GHAStVHenry commented 5 years ago

Perfect, I was just about to ask the same question. Can the normalized data in a Seurat object be used?

dviraran commented 5 years ago

They are not normalized for gene length, so you still need to apply gene-length normalization before using them.

GHAStVHenry commented 5 years ago

...but if the data come from 10x, that shouldn't be necessary because they're UMI-based, no?

dviraran commented 5 years ago

Right...
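For UMI-based data, the exchange above suggests skipping the gene-length step. A sketch of that path scales each cell by its library size and then averages per cluster. Counts per million (CPM) is my choice for the library-size scaling here, not something the thread prescribes; umi, clusters, cpm() and cluster_means() are hypothetical names.

```r
# Sketch for UMI-based data (e.g. 10x): per-cell library-size scaling only,
# no gene-length correction. umi and clusters are hypothetical inputs.
cpm <- function(umi) {
  sweep(umi, 2, colSums(umi), "/") * 1e6  # counts per million, per cell
}

cluster_means <- function(expr, clusters) {
  # One column per cluster: mean normalized expression of that cluster's cells
  sapply(split(seq_len(ncol(expr)), clusters),
         function(idx) rowMeans(expr[, idx, drop = FALSE]))
}

# Toy data: 2 genes x 4 cells in two clusters.
umi <- matrix(c(1, 3, 2, 2, 0, 4, 5, 5), nrow = 2,
              dimnames = list(c("g1", "g2"), paste0("c", 1:4)))
clusters <- c("x", "x", "y", "y")
ref_data <- cluster_means(cpm(umi), clusters)
ref_data
```

Cells with zero total UMIs would produce division by zero in cpm(), so filter them out beforehand.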

qianhuiSenn commented 5 years ago

Hi, sorry to bother you again. Could you elaborate a bit more on how to construct the reference list? I am a bit confused about the input data type for each of the variables "name", "exp", "types", and "main_types". For instance, in a single-cell reference setting:

  1. Does the variable "name" just carry a user-defined name for the created reference?
  2. Is "exp" a matrix containing normalized expression data, with genes as row names and cluster IDs as column names?
  3. Is "types" another matrix or data frame with cluster IDs in column 1 and cell types in column 2? (And similarly for "main_types"?)

Is my understanding correct?

dviraran commented 5 years ago
  1. Yes.
  2. I don't think there is an 'exp' field. Do you mean 'data'? If so, then yes. The column names are not important.
  3. types is a character vector: types[i] is the cell-type name of data[,i]. main_types is the same as types, just with broader definitions. You don't have to use main_types; in that case, use the flag do.main.types=F in the SingleR creation methods.

For examples, take a look at the reference objects that are loaded with SingleR: hpca, blueprint_encode, immgen, and mouse.rnaseq.
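The fields described above can be assembled into a list like the one below. It is a minimal sketch using only the four fields discussed in this thread (name, data, types, main_types). The gene names, values, and labels are made up. The packaged references may carry additional fields, so inspect one (for example with str(hpca)) before relying on this layout.

```r
# Hypothetical pseudo-bulk matrix: genes x profiles (column names don't matter).
ref_data <- matrix(c(5, 0, 2,    # profile 1
                     1, 8, 3),   # profile 2
                   nrow = 3,
                   dimnames = list(c("CD3E", "MS4A1", "LYZ"), NULL))

my_ref <- list(
  name = "my_sc_reference",                   # free-text name for the reference
  data = ref_data,                            # normalized expression, one column per profile
  types = c("T cell", "B cell"),              # types[i] labels data[, i]
  main_types = c("Lymphocyte", "Lymphocyte")  # optional broader labels, same order
)

# Sanity check: exactly one label per column of data
stopifnot(length(my_ref$types) == ncol(my_ref$data))
str(my_ref)
```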

qianhuiSenn commented 5 years ago

Thanks! Just a minor question about a possible typo in the tutorial on the website:

```r
# if using the sd method, we need to define an sd threshold
sd = rowsSd(expr)
```

Do you mean rowSds() from the package genefilter?

dviraran commented 5 years ago

I use matrixStats, but I guess it's the same. You can also use apply(expr, 1, sd). Anyhow, there is no reason to use the 'sd' method at all; it's just a leftover from the evolution of SingleR.
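A small base-R check shows that apply(expr, 1, sd) and a vectorized row SD agree, since both use the n - 1 denominator. matrixStats::rowSds(expr) should return the same values but is left out so the block has no dependencies. The cutoff of 1 is purely hypothetical and not a SingleR default; as Dvir notes, the sd method isn't needed at all.

```r
# Toy genes x profiles matrix (hypothetical values).
expr <- matrix(c(1,  2, 3,
                 4, 10, 0,
                 5,  5, 5), nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), NULL))

sd_apply <- apply(expr, 1, sd)  # per-gene SD, as suggested in the thread

# Vectorized equivalent: sample SD with the n - 1 denominator, like sd().
row_sds <- function(x) {
  sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))
}
sd_vec <- row_sds(expr)
stopifnot(isTRUE(all.equal(sd_apply, sd_vec)))

# Hypothetical cutoff for illustration only; not a SingleR default.
variable_genes <- rownames(expr)[sd_vec > 1]
variable_genes
```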

qianhuiSenn commented 5 years ago

Works like a charm. Thanks so much, Dvir. I will close the issue.