scMRMA: single cell Multi-Resolution Marker-based Annotation Algorithm

Installation

The scMRMA R package can be installed from GitHub using devtools:

devtools::install_github("JiaLiVUMC/scMRMA")

Some users may have issues installing the scMRMA package because of their C++ version. Please check the possible solution through this website.

Example

After installing scMRMA, use the following code to run the example:

# Note: the example takes about two minutes to run.

library(scMRMA)
load(system.file("data", "MouseBrain.Rdata", package = "scMRMA"))
result <- scMRMA(input = Brain_GSM3580745,
                 species = "Mm",
                 db = "panglaodb",
                 p = 0.05,
                 normalizedData = FALSE,
                 selfDB = NULL,
                 selfClusters = NULL,
                 k = 20)

input Count matrix with genes in rows and cells in columns. Accepted formats are matrix, dgCMatrix, data.frame, Seurat object, and SingleCellExperiment object.

species Species of the cells. Select "Hs" (default) or "Mm".

db Hierarchical cell type reference database. Select "panglaodb" (default) or "TcellAI".

p P-value cutoff from the Fisher test for significant cell type enrichment. Default is 0.05.

normalizedData Whether the input is user-provided normalized data. Default is FALSE, in which case the default normalization method is applied.

selfDB User-provided or modified hierarchical cell type database.

selfClusters Fixed clusters to use at each level. If cluster information is provided, re-clustering is not performed for intermediate nodes.

k Number of nearest neighbors used to build the graph for clustering. Default is 20. A smaller value can be used for very rare and small clusters.
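
The orientation requirement for input (genes in rows, cells in columns) can be checked before calling scMRMA. The following is a minimal base-R sketch on a made-up toy matrix. The gene names, cell names, and counts are illustrative assumptions and do not come from the package.

```r
# Toy count matrix (made-up values): 4 genes (rows) x 3 cells (columns)
counts <- matrix(
  c(5, 0, 2,
    0, 3, 1,
    7, 0, 0,
    1, 4, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Cd4", "Foxp3", "Il2ra", "Il7r"),  # gene names as row names
    c("cell_1", "cell_2", "cell_3")      # cell names as column names
  )
)

# If a table arrives with cells in rows, transpose it so genes are in rows
cells_by_genes <- t(counts)
genes_by_cells <- t(cells_by_genes)

dim(genes_by_cells)       # number of genes, then number of cells
rownames(genes_by_cells)  # should list gene names, not cell names
```

A matrix prepared this way would then be passed as input, with species and db chosen to match the data.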

Output

result A list that includes both multi-resolution and uniform-resolution annotation results.

result$multiR$annotationResult A data frame that stores the scMRMA annotation results for each cell at all reference levels. For example, the panglaodb database has four levels in total.

result$multiR$meta A data frame that contains the scMRMA cluster, cell type activity score, and p-value information for each cell at each level.

result$uniformR$annotationResult A data frame that stores the uniform-resolution annotation results for each cell.

result$uniformR$meta A data frame that contains the uniform-resolution cluster, cell type activity score, and p-value information for each cell at each level.
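
As a rough illustration of how the annotation data frame might be summarized, the sketch below builds a mock object with the documented nesting (multiR, then annotationResult) and counts cells per label in the last column. The column names (Level1, Level2), cell names, and labels are hypothetical placeholders, not actual scMRMA output, and the sketch assumes the last column holds the finest annotation level.

```r
# Mock object mimicking the documented structure of `result`.
# Column names, cell names, and labels are hypothetical placeholders.
annotation <- data.frame(
  Level1 = c("Immune", "Immune", "Neural"),
  Level2 = c("T cells", "B cells", "Neurons"),
  row.names = c("cell_1", "cell_2", "cell_3"),
  stringsAsFactors = FALSE
)
result_mock <- list(multiR = list(annotationResult = annotation))

# Take the last column (assumed here to be the finest level)
ann <- result_mock$multiR$annotationResult
finest <- ann[[ncol(ann)]]
names(finest) <- rownames(ann)

# Number of cells per annotated cell type
cell_type_counts <- table(finest)
print(cell_type_counts)
```

With a real result, replacing result_mock by result should give the same kind of summary.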

Other functions

Self-defined database

# Note: please provide the hierarchical database in the correct format.
# By default, cell types and genes are separated by commas without spaces.
# >CD4 T cells,T cells #leaf celltype,root celltype
# CD4,FOXP3,IL2RA,IL7R #GeneA,GeneB,GeneC,GeneD

CellType <- selfDefinedDatabase(file = system.file("data", "markerExample.txt", package = "scMRMA"))
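
A marker file in the format described above can also be written from R. The sketch below is a minimal example under stated assumptions: the first entry repeats the example from the comments, the second entry (CD8 T cells and its genes) is an illustrative addition, and the alternating-line check applies only to this simple two-line-per-cell-type layout. The call to selfDefinedDatabase runs only if scMRMA is installed.

```r
# Marker entries in the documented format:
# a ">" header line with "leaf cell type,root cell type",
# followed by a line of comma-separated marker genes.
marker_lines <- c(
  ">CD4 T cells,T cells",
  "CD4,FOXP3,IL2RA,IL7R",
  ">CD8 T cells,T cells",   # illustrative extra entry, not from the package
  "CD8A,CD8B,GZMK"
)

marker_file <- tempfile(fileext = ".txt")
writeLines(marker_lines, marker_file)

# Sanity check: header lines and gene lines should alternate
lines_read <- readLines(marker_file)
is_header <- startsWith(lines_read, ">")
expected <- rep(c(TRUE, FALSE), length.out = length(lines_read))
stopifnot(all(is_header == expected))

# Build the database only if scMRMA is available
if (requireNamespace("scMRMA", quietly = TRUE)) {
  CellType_custom <- scMRMA::selfDefinedDatabase(file = marker_file)
}
```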

Use pre-trained classifier from Garnett

# Note: pre-trained human PBMC Garnett classifier

hsPBMC <- selfDefinedDatabase(file = system.file("data", "Garnett_hsPBMC.txt", package = "scMRMA"))

Add genes to existing database

# Note: provide the gene and cell type list in the correct format. The first column contains genes and the second column contains cell types at the last level.

genelist <- matrix(c("Genea", "Geneb", "Tr1", "Microglia"), nrow = 2, byrow = FALSE)
colnames(genelist) <- c("Gene", "cellType")
CellType_new <- addGene(geneCellTypeList = genelist, celltype = CellType)

Hierarchical database visualization

# Note: this generates a Database.html file in your current working directory.

CellType <- get_celltype(species="Hs",db="TcellAI")
databaseVisual(celltype = CellType)

Incorporate with Seurat

library(scMRMA)
load(system.file("data", "Brain_GSM3580745.Rdata", package = "scMRMA"))

# Create Seurat object
library(Seurat)
brain <- CreateSeuratObject(Brain_GSM3580745)
brain <- NormalizeData(brain, verbose = FALSE)
brain <- FindVariableFeatures(brain, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
brain <- ScaleData(brain, features = VariableFeatures(brain), verbose = FALSE)
brain <- RunPCA(brain, features = VariableFeatures(brain), npcs = 50, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:50, verbose = FALSE)

# scMRMA annotation
result <- scMRMA(input = brain, species = "Mm")

# UMAP plot, labeled with the last column of the multi-resolution annotation
annotation <- result$multiR$annotationResult
brain[["scMRMA"]] <- annotation[colnames(brain), ncol(annotation)]
DimPlot(brain, reduction = "umap", group.by = "scMRMA", label = TRUE, repel = TRUE)

# Use user-provided cluster information
# Note: cluster information should be provided as a factor

brain <- FindNeighbors(brain, verbose = FALSE)
brain <- FindClusters(brain, resolution = 0.5, verbose = FALSE)
result <- scMRMA(input = brain, species = "Mm", selfClusters = Idents(brain))
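
When cluster labels come from somewhere other than Seurat, they can be converted to a factor in base R before being passed as selfClusters. The sketch below uses made-up labels and cell names. Naming the factor by cell mirrors what Idents() returns in Seurat, but whether scMRMA matches clusters by name or by order is an assumption not confirmed here, so keeping the labels in the same order as the input columns is the cautious choice.

```r
# Hypothetical cluster labels for four cells (made-up values)
cluster_labels <- c(cell_1 = "0", cell_2 = "1", cell_3 = "0", cell_4 = "2")

# Convert to a factor; the cell names are carried over
clusters <- factor(cluster_labels)

is.factor(clusters)  # check the type expected for selfClusters
levels(clusters)     # distinct cluster identifiers
table(clusters)      # number of cells per cluster

# Hypothetical call, assuming `counts` is a genes-by-cells matrix
# with columns in the same order as `clusters`:
# result <- scMRMA(input = counts, species = "Mm", selfClusters = clusters)
```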

Citation

Li, Jia, et al. "scMRMA: single cell multiresolution marker-based annotation." Nucleic Acids Research 50.2 (2022): e7.