dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

Regress out covariates #49

Open ishanparanjpe opened 5 years ago

ishanparanjpe commented 5 years ago

Hi Dvir, I was wondering how to regress out covariates other than nUMI. My code is below. When I try to regress out orig.ident, I get an error.

A = sample(1:ncol(combined$sc.data), 2000)
annot = data.frame(orig.ident = combined$orig.ident[A], cancer_type = cancer_type[A])

names(annot) = rownames(data.combined@meta.data)[A]

singler = CreateSinglerSeuratObject(combined$sc.data[,rownames(annot)], annot = annot[,1],
  project.name = "test", min.genes = 200, technology = "10X", species = "Human",
  citation = "", ref.list = list(), normalize.gene.length = F, variable.genes = "de",
  fine.tune = T, reduce.file.size = T, do.signatures = T, min.cells = 2, npca = 10,
  regress.out = "orig.ident", do.main.types = T, reduce.seurat.object = T,
  numCores = SingleR.numCores)

Regressing out: orig.ident
  |                                                                      |   0%
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
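For readers who hit the same message: it is the error base R raises whenever a model formula contains a factor with only one level. One plausible cause here, stated as an assumption and not confirmed in this thread, is that the Seurat object built internally carries a single orig.ident value for every cell (for example, the project name), so there is nothing to regress against. The sketch below reproduces the message with plain `lm()`; the variable names are made up for illustration.

```r
# Toy expression values for six cells (illustrative data only).
set.seed(1)
expr <- rnorm(6)

# A covariate with a single level, e.g. every cell labelled "test".
batch_one <- factor(rep("test", 6))

# A covariate with two levels, as a real batch variable would have.
batch_two <- factor(rep(c("a", "b"), each = 3))

# Fitting against the one-level factor fails; capture the message instead of stopping.
res_one <- tryCatch(
  lm(expr ~ batch_one),
  error = function(e) conditionMessage(e)
)

# Fitting against the two-level factor succeeds.
res_two <- lm(expr ~ batch_two)

print(res_one)
```

Checking `table(seurat.object@meta.data$orig.ident)` before regressing is a quick way to see whether the covariate really has more than one level.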

dviraran commented 5 years ago

Hi,

The regress.out parameter is just passed down to Seurat. I am guessing you are trying to regress out batch effects. I would suggest not using the CreateSinglerSeuratObject function. Instead, first create a Seurat object, where you can regress out the batches (you will need to use the AddMetaData function to attach the batch labels). Afterwards, you can use the CreateSinglerObject function to get the SingleR annotations. See case 2 here.

Best, Dvir
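A minimal sketch of the workflow described above, assuming Seurat v3 and the legacy SingleR package are installed. The function name `regress_then_annotate`, the metadata column name `batch`, and the filtering thresholds are illustrative choices, not something specified in this thread; `batch` is assumed to be a vector of batch labels named by cell barcode.

```r
# Hypothetical helper: build a Seurat object, regress out a batch covariate
# there, then annotate the normalized (not scaled) data with legacy SingleR.
regress_then_annotate <- function(raw_counts, batch, project = "test") {
  stopifnot(requireNamespace("Seurat", quietly = TRUE),
            requireNamespace("SingleR", quietly = TRUE))

  # Create and filter the Seurat object (thresholds are assumptions).
  obj <- Seurat::CreateSeuratObject(counts = raw_counts, project = project,
                                    min.cells = 2, min.features = 200)

  # Attach batch labels for the cells that survived filtering.
  obj <- Seurat::AddMetaData(obj, metadata = batch[colnames(obj)],
                             col.name = "batch")

  # Normalize, then regress out the batch covariate during scaling.
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj)
  obj <- Seurat::ScaleData(obj, vars.to.regress = "batch")

  # Hand SingleR the normalized data slot, not scale.data.
  norm <- obj@assays$RNA@data
  singler <- SingleR::CreateSinglerObject(norm, annot = NULL,
                                          project.name = project,
                                          min.genes = 0, technology = "10X",
                                          species = "Human")
  singler$seurat <- obj  # optional, as in case 2
  singler
}
```

The regression only affects what Seurat uses for clustering and dimensionality reduction; the SingleR annotation itself is computed per cell from the normalized values.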

ishanparanjpe commented 5 years ago

If I have multiple samples coming from two groups which I eventually want to compare, is it best to first merge the groups and then run SingleR, or to run SingleR individually on each group?

dviraran commented 5 years ago

For SingleR it doesn't matter; you can merge before or after, since the annotation is per cell. But it is definitely much simpler to first create your single-cell dataset with the clusters and dimensions and use that as input for SingleR. See case 2 for more information.

NTNguyen13 commented 5 years ago

Hi, I have the same question.

Case 2:

singler = CreateSinglerObject(counts, annot = NULL, project.name, min.genes = 0,
  technology = "10X", species = "Human", citation = "",
  ref.list = list(), normalize.gene.length = F, variable.genes = "de",
  fine.tune = T, do.signatures = T, clusters = NULL, do.main.types = T, 
  reduce.file.size = T, numCores = SingleR.numCores)

singler$seurat = seurat.object # (optional)
singler$meta.data$orig.ident = seurat.object@meta.data$orig.ident # the original identities, if not supplied in 'annot'

## if using Seurat v3.0 and over use:
singler$meta.data$xy = seurat.object@reductions$tsne@cell.embeddings # the tSNE coordinates
singler$meta.data$clusters = seurat.object@active.ident # the Seurat clusters (if 'clusters' not provided)

As I understand it, in case 2 I'm passing the count values from the Seurat object to SingleR. But as you said, it's better to create the Seurat object and regress out covariates first, then use SingleR. So how can I pass the scale.data from the Seurat object to SingleR? Thank you!

dviraran commented 5 years ago

No, don't use the scale.data. That is a really bad idea. Use the normalized and filtered data, but not the scaled data (see the red warning in the html for creating objects).

Just do something like:

counts = seurat.object@data # for Seurat v2
counts = seurat.object@assays$RNA@data # for Seurat v3
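A rough illustration of why scaled data is a poor input, assuming (as I understand it) that SingleR annotates a cell by correlating its expression profile across genes with reference profiles. Scaling each gene across cells, as Seurat's scale.data does, discards the relative expression levels between genes inside a cell, which is what that correlation relies on. The toy matrix and reference profile below are invented for the example and use only base R.

```r
# Hypothetical reference profile over five genes.
ref <- c(g1 = 10, g2 = 8, g3 = 6, g4 = 4, g5 = 2)

# Toy normalized expression: genes in rows, cells in columns.
norm <- cbind(
  c1 = c(10, 8, 6, 4, 2),
  c2 = c(11, 8, 7, 4, 3),
  c3 = c(9, 9, 5, 5, 1)
)
rownames(norm) <- names(ref)

# Per-gene z-scoring across cells, analogous to what scale.data holds.
scaled <- t(scale(t(norm)))

# Spearman correlation of every cell with the reference profile.
cor_with_ref <- function(m) {
  apply(m, 2, function(x) cor(x, ref, method = "spearman"))
}
cor_norm <- cor_with_ref(norm)
cor_scaled <- cor_with_ref(scaled)

print(round(cor_norm, 2))
print(round(cor_scaled, 2))
```

In this toy case cell c1 matches the reference perfectly before scaling and shows almost no correlation afterwards, even though nothing about the cell itself changed.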

NTNguyen13 commented 5 years ago

I see, thank you very much