Install: You can install the scLearn package from GitHub using the devtools package (requires R >= 3.6.1).
library(devtools)
library(SingleCellExperiment)
library(M3Drop)
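install_github("bm2-lab/scLearn")
# load the installed package so that Cell_qc() and the other functions below are available
library(scLearn)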
For illustration, we use the datasets baron-human.rds (reference) and xin-human.rds (query) as examples.
Data preprocessing:
# loading the reference dataset
data<-readRDS('baron-human.rds')
rawcounts<-assays(data)[[1]]
refe_ann<-as.character(data$cell_type1)
names(refe_ann)<-colnames(data)
# cell quality control, rare cell type filtering, and feature selection
data_qc<-Cell_qc(rawcounts,refe_ann,species="Hs")
data_type_filtered<-Cell_type_filter(data_qc$expression_profile,data_qc$sample_information_cellType,min_cell_number = 10)
high_varGene_names <- Feature_selection_M3Drop(data_type_filtered$expression_profile)
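To make the role of `min_cell_number` concrete, the idea behind rare cell type filtering can be sketched in base R on toy data. This is only an illustration under stated assumptions: `filter_rare_types` is a hypothetical helper, not the implementation of `Cell_type_filter`, and the toy matrix stands in for a real expression profile.

```r
# Toy expression matrix: 6 genes x 25 cells (hypothetical data, not baron-human.rds)
set.seed(1)
toy_counts <- matrix(rpois(6 * 25, lambda = 5), nrow = 6,
                     dimnames = list(paste0("gene", 1:6), paste0("cell", 1:25)))

# Named annotation vector, built the same way as refe_ann above
toy_ann <- c(rep("alpha", 12), rep("beta", 10), rep("rare", 3))
names(toy_ann) <- colnames(toy_counts)

# Hypothetical helper: keep only cell types with at least min_cell_number cells
filter_rare_types <- function(counts, ann, min_cell_number = 10) {
  type_sizes <- table(ann)
  kept_types <- names(type_sizes)[type_sizes >= min_cell_number]
  keep <- ann %in% kept_types
  list(expression_profile = counts[, keep, drop = FALSE],
       sample_information_cellType = ann[keep])
}

toy_filtered <- filter_rare_types(toy_counts, toy_ann, min_cell_number = 10)

# Cell types remaining after filtering and their sizes
table(toy_filtered$sample_information_cellType)
```

With `min_cell_number = 10`, the three "rare" cells are dropped and the 22 cells of the two larger types are kept.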
Model learning:
# training the model. Increasing "bootstrap_times" can improve the accuracy for "unassigned" cells, but training will take longer.
# The default value of "bootstrap_times" is 10; this example sets it to 1.
scLearn_model_learning_result<-scLearn_model_learning(high_varGene_names,data_type_filtered$expression_profile,data_type_filtered$sample_information_cellType,bootstrap_times=1)
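Because the pre-trained models in this README are distributed as `.rds` files and loaded with `readRDS`, a model you train yourself can presumably be stored the same way for later reuse. The sketch below shows the save and reload round trip with base R. The `trained_model` list is a placeholder standing in for the object returned by `scLearn_model_learning` (its real structure is not assumed here), and the file name is hypothetical.

```r
# Placeholder for a trained model object; replace with scLearn_model_learning_result
trained_model <- list(note = "placeholder for a trained scLearn model")

# Hypothetical file name, written to a temporary directory for this example
model_file <- file.path(tempdir(), "my_scLearn_model.rds")

# Save the model once after training ...
saveRDS(trained_model, model_file)

# ... and reload it in a later session instead of retraining
reloaded_model <- readRDS(model_file)

# The reloaded object should be identical to the one that was saved
identical(reloaded_model, trained_model)
```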
Cell assignment:
# loading the query cells and performing cell quality control
data2<-readRDS('xin-human.rds')
rawcounts2<-assays(data2)[[1]]
### the true labels of this test dataset
#query_ann<-as.character(data2$cell_type1)
#names(query_ann)<-colnames(data2)
#query_ann<-query_ann[query_ann %in% c("alpha","beta","delta","gamma")]
#rawcounts2<-rawcounts2[,names(query_ann)]
#data_qc_query<-Cell_qc(rawcounts2,query_ann,species="Hs")
###
data_qc_query<-Cell_qc(rawcounts2,species="Hs",gene_low=50,umi_low=50)
# Assignment with the model trained above.
# To get a less strict result for "unassigned" cells, decrease "diff" and "vote_rate".
# If you are sure that the cell types of the query cells are all present in the reference dataset,
# set "threshold_use" to FALSE so that the thresholds learned by scLearn are not used.
scLearn_predict_result<-scLearn_cell_assignment(scLearn_model_learning_result,data_qc_query$expression_profile,diff=0.05,threshold_use=TRUE,vote_rate=0.6)
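When the true labels of the query cells are known (as in the commented-out lines above), the predictions can be compared against them. The sketch below uses toy data and assumes that the prediction result is a data frame with one row per query cell; the column names `Query_cell_id` and `Predict_cell_type` and the exact spelling of the "unassigned" label are assumptions, so check the structure of `scLearn_predict_result` (for example with `str()`) before adapting it.

```r
# Hypothetical prediction table; column names are assumptions, not documented output
toy_pred <- data.frame(
  Query_cell_id = paste0("cell", 1:6),
  Predict_cell_type = c("alpha", "beta", "beta", "unassigned", "delta", "alpha"),
  stringsAsFactors = FALSE
)

# True labels as a named character vector, built like query_ann above
toy_truth <- c(cell1 = "alpha", cell2 = "beta", cell3 = "alpha",
               cell4 = "gamma", cell5 = "delta", cell6 = "alpha")

# Align the true labels to the order of the predictions by cell name
truth_aligned <- toy_truth[toy_pred$Query_cell_id]
assigned <- toy_pred$Predict_cell_type != "unassigned"

# Accuracy over all cells, accuracy over assigned cells only, and unassigned rate
accuracy_all <- mean(toy_pred$Predict_cell_type == truth_aligned)
accuracy_assigned <- mean(toy_pred$Predict_cell_type[assigned] == truth_aligned[assigned])
unassigned_rate <- mean(!assigned)

# Confusion table of true versus predicted labels
confusion <- table(truth = truth_aligned, predicted = toy_pred$Predict_cell_type)
confusion
```

Reporting the unassigned rate alongside accuracy helps when tuning `diff` and `vote_rate`, since loosening them trades unassigned cells for potentially wrong assignments.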
### **Multi-label single cell assignment**
For illustration, we use the dataset ESC.rds as an example.
Data preprocessing:
# loading the reference dataset
data<-readRDS('ESC.rds')
rawcounts<-assays(data)[[1]]
refe_ann1<-as.character(data$cell_type1)
names(refe_ann1)<-colnames(data)
refe_ann2<-as.character(data$cell_type2)
names(refe_ann2)<-colnames(data)
# cell quality control, rare cell type filtering, and feature selection
data_qc<-Cell_qc(rawcounts,refe_ann1,refe_ann2,species="Hs")
data_type_filtered<-Cell_type_filter(data_qc$expression_profile,data_qc$sample_information_cellType,data_qc$sample_information_timePoint,min_cell_number = 10)
high_varGene_names <- Feature_selection_M3Drop(data_type_filtered$expression_profile)
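With two labels per cell, it can help to inspect how many cells carry each label combination before filtering. The sketch below cross-tabulates two toy annotation vectors in base R and lists combinations with fewer than 10 cells. The data are hypothetical, and whether `Cell_type_filter` applies `min_cell_number` to each label separately or to their combinations is not stated here, so treat this purely as a diagnostic.

```r
# Toy annotations (hypothetical), named by cell like refe_ann1 and refe_ann2
cells <- paste0("cell", 1:30)
toy_ann1 <- setNames(rep(c("typeA", "typeB"), times = c(18, 12)), cells)
toy_ann2 <- setNames(rep(c("day0", "day2", "day0", "day2"), times = c(12, 6, 10, 2)), cells)

# Both vectors must describe the same cells in the same order
stopifnot(identical(names(toy_ann1), names(toy_ann2)))

# Number of cells for every (cell type, time point) combination
label_counts <- table(cell_type = toy_ann1, time_point = toy_ann2)
label_counts

# Combinations with fewer than 10 cells, in long format
combo_counts <- as.data.frame(label_counts, stringsAsFactors = FALSE)
rare_combos <- combo_counts[combo_counts$Freq < 10, ]
rare_combos
```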
Model learning:
# training the model
scLearn_model_learning_result<-scLearn_model_learning(high_varGene_names,data_type_filtered$expression_profile,data_type_filtered$sample_information_cellType,data_type_filtered$sample_information_timePoint,dim_para=0.999)
Cell assignment: Here we use 'ESC.rds' itself as the query to test multi-label single cell assignment.
# loading the query cells and performing cell quality control
data2<-readRDS('ESC.rds')
rawcounts2<-assays(data2)[[1]]
### the true labels of this test dataset
#query_ann1<-as.character(data2$cell_type1)
#names(query_ann1)<-colnames(data2)
#query_ann2<-as.character(data2$cell_type2)
#names(query_ann2)<-colnames(data2)
#rawcounts2<-rawcounts2[,names(query_ann1)]
#data_qc_query<-Cell_qc(rawcounts2,query_ann1,query_ann2,species="Hs")
data_qc_query<-Cell_qc(rawcounts2,species="Hs",gene_low=50,umi_low=50)
# Assignment with trained model above
scLearn_predict_result<-scLearn_cell_assignment(scLearn_model_learning_result,data_qc_query$expression_profile)
### **Pre-trained scLearn models**
For the convenience of users, besides the R package of scLearn, we also provide pre-trained scLearn models for the 30 datasets used in our study and for the 20 mouse organ datasets. These reference datasets cover commonly used brain cells, immune cells, pancreas cells, embryonic stem cells, retina cells and lung cancer cell lines, with coarse-grained and fine-grained annotation, as well as 20 mouse organs. The models can be used directly by experimental researchers to categorize related single cells. Details of each pre-trained scLearn model are shown below:
The information of pre-trained scLearn models of the 30 datasets

Pre-trained model names | Description | No. of cell types | Corresponding dataset (journal, date) |
---|---|---|---|
pancreas_mouse_baron.rds | Mouse pancreas | 9 | Baron_mouse (Cell Systems, 2016) |
pancreas_human_baron.rds | Human pancreas | 13 | Baron_human (Cell Systems, 2016) |
pancreas_human_muraro.rds | Human pancreas | 8 | Muraro (Cell Systems, 2016) |
pancreas_human_segerstolpe.rds | Human pancreas | 8 | Segerstolpe (Cell Metabolism, 2016) |
pancreas_human_xin.rds | Human pancreas | 4 | Xin (Cell Metabolism, 2016) |
embryo_development_mouse_deng.rds | Mouse embryo development | 4 | Deng (Science, 2014) |
cerebral_cortex_human_pollen.rds | Human cerebral cortex | 9 | Pollen (Nature Biotechnology, 2014) |
colorectal_tumor_human_li.rds | Human colorectal tumors | 5 | Li (Nature Genetics, 2017) |
brain_mouse_usoskin.rds | Mouse brain | 4 | Usoskin (Nature Neuroscience, 2015) |
cortex_mouse_tasic.rds | Mouse cortex | 17 | Tasic (Nature Neuroscience, 2016) |
embryo_stem_cells_mouse_klein.rds | Mouse embryo stem cells | 4 | Klein (Cell, 2015) |
brain_mouse_zeisel.rds | Mouse brain | 9 | Zeisel (Science, 2015) |
retina_mouse_shekhar_coarse-grained_annotation.rds | Mouse retina | 4 | Shekhar (Cell, 2016) |
retina_mouse_shekhar_fine-grained_annotation.rds | Mouse retina | 17 | Shekhar (Cell, 2016) |
retina_mouse_macosko.rds | Mouse retina | 12 | Macosko (Cell, 2015) |
lung_cancer_cell_lines_human_cellbench10X.rds | Mixture of five human lung cancer cell lines | 5 | CellBench_10X (Nature Methods, 2019) |
lung_cancer_cell_lines_human_cellbenchCelSeq.rds | Mixture of five human lung cancer cell lines | 5 | CellBench_CelSeq2 (Nature Methods, 2019) |
whole_mus_musculus_mouse_TM.rds | Whole Mus musculus | 55 | TM (Nature, 2018) |
primary_visual_cortex_mouse_AMB_coarse-grained_annotation_3.rds | Primary mouse visual cortex | 3 | AMB (Nature, 2018) |
primary_visual_cortex_mouse_AMB_fine-grained_annotation_14.rds | Primary mouse visual cortex | 14 | AMB (Nature, 2018) |
primary_visual_cortex_mouse_AMB_fine-grained_annotation_68.rds | Primary mouse visual cortex | 68 | AMB (Nature, 2018) |
PBMC_human_zheng_sorted.rds | FACS-sorted PBMC | 10 | Zheng sorted (Nature Communications, 2017) |
PBMC_human_zheng_68K.rds | PBMC | 11 | Zheng 68k (Nature Communications, 2017) |
primary_visual_cortex_mouse_VISP_coarse-grained_annotation.rds | Mouse primary visual cortex | 3 | VISp (Nature, 2018) |
primary_visual_cortex_mouse_VISP_fine-grained_annotation.rds | Mouse primary visual cortex | 33 | VISp (Nature, 2018) |
anterior_lateral_motor_area_mouse_ALM_coarse-grained_annotation.rds | Mouse anterior lateral motor area | 3 | ALM (Nature, 2018) |
anterior_lateral_motor_area_mouse_ALM_fine-grained_annotation.rds | Mouse anterior lateral motor area | 32 | ALM (Nature, 2018) |
middle_temporal_gyrus_human_MTG_coarse-grained_annotation.rds | Human middle temporal gyrus | 3 | MTG (Nature, 2019) |
middle_temporal_gyrus_human_MTG_fine-grained_annotation.rds | Human middle temporal gyrus | 34 | MTG (Nature, 2019) |
PBMC_human_a10Xv2.rds | Human PBMC | 9 | PbmcBench_a10Xv2 (bioRxiv, 2019) |
PBMC_human_a10Xv3.rds | Human PBMC | 8 | PbmcBench_a10Xv3 (bioRxiv, 2019) |
PBMC_human_CL.rds | Human PBMC | 7 | PbmcBench_CL (bioRxiv, 2019) |
PBMC_human_DR.rds | Human PBMC | 9 | PbmcBench_DR (bioRxiv, 2019) |
PBMC_human_iD.rds | Human PBMC | 7 | PbmcBench_iD (bioRxiv, 2019) |
PBMC_human_SM2.rds | Human PBMC | 6 | PbmcBench_SM2 (bioRxiv, 2019) |
PBMC_human_SW.rds | Human PBMC | 7 | PbmcBench_SW (bioRxiv, 2019) |

The information of pre-trained scLearn models for the 20 mouse organs datasets
Pre-trained model names | Description | No. of cell types |
---|---|---|
Aorta_mouse_FACS.rds | Mouse aorta | 4 |
Bladder_mouse_FACS.rds | Mouse bladder | 2 |
Brain_Myeloid_mouse_FACS.rds | Mouse brain myeloid | 2 |
Brain_Non-Myeloid_mouse_FACS.rds | Mouse brain non-myeloid | 7 |
Diaphragm_mouse_FACS.rds | Mouse diaphragm | 5 |
Fat_mouse_FACS.rds | Mouse fat | 6 |
Heart_mouse_FACS.rds | Mouse heart | 10 |
Kidney_mouse_FACS.rds | Mouse kidney | 5 |
Large_Intestine_mouse_FACS.rds | Mouse large intestine | 5 |
Limb_Muscle_mouse_FACS.rds | Mouse limb muscle | 8 |
Liver_mouse_FACS.rds | Mouse liver | 5 |
Lung_mouse_FACS.rds | Mouse lung | 11 |
Mammary_Gland_mouse_FACS.rds | Mouse mammary gland | 4 |
Marrow_mouse_FACS.rds | Mouse marrow | 21 |
Pancreas_mouse_FACS.rds | Mouse pancreas | 9 |
Skin_mouse_FACS.rds | Mouse skin | 5 |
Spleen_mouse_FACS.rds | Mouse spleen | 3 |
Thymus_mouse_FACS.rds | Mouse thymus | 3 |
Tongue_mouse_FACS.rds | Mouse tongue | 2 |
Trachea_mouse_FACS.rds | Mouse trachea | 4 |

Cell assignment with pre-trained models:
# loading the query cells and performing cell quality control
data2<-readRDS('xin-human.rds')
rawcounts2<-assays(data2)[[1]]
#query_ann<-as.character(data2$cell_type1)
#names(query_ann)<-colnames(data2)
#query_ann<-query_ann[query_ann %in% c("alpha","beta","delta","gamma")]
#rawcounts2<-rawcounts2[,names(query_ann)]
#data_qc_query<-Cell_qc(rawcounts2,query_ann,species="Hs")
data_qc_query<-Cell_qc(rawcounts2,species="Hs",gene_low=50,umi_low=50)
# Assignment with pre-trained models
# Take pancreas_human_baron.rds as an example
scLearn_model_learning_result<-readRDS("pancreas_human_baron.rds")
# Predict the cell types
scLearn_predict_result<-scLearn_cell_assignment(scLearn_model_learning_result,data_qc_query$expression_profile)
B. Duan, C. Zhu, G. Chuai, C. Tang, X. Chen, S. Chen, S. Fu, G. Li, Q. Liu, Learning for single-cell assignment. Sci. Adv. 6, eabd0855 (2020)
bioinfo_db@163.com or qiliu@tongji.edu.cn