ZJUFanLab / scCATCH

Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data
https://www.sciencedirect.com/science/article/pii/S2589004220300663
GNU General Public License v3.0

not matched with the tissue types in CellMatch database! #4

Closed william-swl closed 4 years ago

william-swl commented 4 years ago

Hello, I'm trying to use this great tool for cell cluster annotation. But when I test it with the 10x pbmc3k dataset from the Seurat guide, the findmarkergenes function always throws an error. If I set tissue = NULL, it takes a very long time.

This is the code. What can I do to prevent this error?

Seurat - Guided Clustering Tutorial dataset - pbmc3k

> Mtexpr <- Read10X(data.dir = 'filtered_gene_bc_matrices/hg19')
> Utexpr <- CreateSeuratObject(counts = Mtexpr, min.cells = 0)
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
> Utexpr <- NormalizeData(object = Utexpr, normalization.method = 'LogNormalize', scale.factor = 10000)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> Utexpr <- FindVariableFeatures(Utexpr, selection.method = 'vst')
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> Utexpr <- ScaleData(Utexpr, features = rownames(Utexpr))
Centering and scaling data matrix
  |======================================================================| 100%
> Utexpr <- RunPCA(Utexpr, features = VariableFeatures(Utexpr), npcs = 50, ndims.print = 1:5, nfeatures.print = 5)
PC_ 1 
Positive:  LTB, IL32, CD2, ACAP1, STK17A 
Negative:  CST3, LST1, AIF1, FTL, FCN1 
PC_ 2 
Positive:  NKG7, PRF1, GZMA, GZMB, CTSW 
Negative:  MS4A1, HLA-DRA, HLA-DQA1, LINC00926, HLA-DQB1 
PC_ 3 
Positive:  TMSB4X, S100A4, S100A6, IL32, RBP7 
Negative:  HLA-DQA1, CD79B, HLA-DQB1, MS4A1, CD74 
PC_ 4 
Positive:  SDPR, HIST1H2AC, GNG11, SPARC, TUBB1 
Negative:  TMSB10, VIM, LTB, S100A10, MAL 
PC_ 5 
Positive:  LTB, VIM, AQP3, MAL, PPA1 
Negative:  NKG7, GZMB, PRF1, GNLY, GZMA 
Warning message:
In PrepDR(object = object, features = features, verbose = verbose) :
  The following 263 features requested have not been scaled (running reduction without them): PPBP, S100A9, IGLL5, PF4, FCER1A, S100A8, C1QA, CCL4, C1QB, AL928768.3, C10orf32, GP9, IGJ, LYPD2, HBA1, KIAA0101, APOBEC3B, LILRA4, GIMAP5, CD79A, C16orf13, TNFRSF17, CLDN5, TREML1, PTGDS, IL8, FCGR3A, GZMH, FGFBP2, TYROBP, CORO1B, PTCRA, TCL1A, CLEC1B, OSCAR, FERMT3, MTERFD2, STX11, OXLD1, CMTM5, C19orf52, EIF1AY, CST7, NRGN, HRASLS2, C1QC, IL1B, LCN2, LRRC26, CDA, TIGIT, FOLR3, GZMK, AC147651.3, VMO1, RP11-879F14.2, LY6G6F, HLA-DRB5, SCT, RGS18, ISOC2, CTD-2006K23.1, SEPT5, RP5-887A10.1, C19orf33, CTD-2302E22.4, FCGR3B, JAKMIP1, RP11-428G5.5, XCL2, ACY3, RP11-1070N10.3, NFE2, FCGR2A, TUBA8, S1PR4, PDZK1IP1, MRPL12, STRA13, AP001189.4, CTA-217C2.1, MLTK, FCGR1A, PRR12, AC113189.5, MDS2, AL928742.12, P2RX5, KRT7, HGD, PEX16, ANKEF1, PCP2, AP003733.1, SIGLEC14, RP5-1028K7.2, CLIC3, CTC-378H22.1, BIK, TMEM194A, EGFL7, C15orf48, TSPAN15, SPTSSB, UGT2B17, LINC00936, RP11-70P17.1, RP11-367G6.3, MINA [... truncated]
> Utexpr <- FindNeighbors(Utexpr, dim = 1:15)
Computing nearest neighbor graph
Computing SNN
> Utexpr <- FindClusters(Utexpr, resolution = 0.4)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 2700
Number of edges: 112798

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8803
Number of communities: 8
Elapsed time: 0 seconds
> Utmarker <- findmarkergenes(Utexpr,species = 'Human', cluster = 'All', match_CellMatch = TRUE, tissue = 'Blood-related', cell_min_pct = 0.25,logfc = 0.25,pvalue = 0.05)
Note: the raw data matrix includes 2700 cells and 32738 genes. 

---Revising gene symbols according to NCBI Gene symbols (updated in Jan. 10, 2020, https://www.ncbi.nlm.nih.gov/gene) and no matched genes and duplicated genes will be removed. 

Note: the new data matrix includes 2700 cells and 20805 genes. 

Error in findmarkergenes(Utexpr, species = "Human", cluster = "All", match_CellMatch = TRUE,  : 
  Blood-related, not matched with the tissue types in CellMatch database! Please select one or more related tissue types.

> Utmarker <- findmarkergenes(Utexpr,species = 'Human', cluster = 'All', match_CellMatch = TRUE, tissue = 'Lymph-related', cell_min_pct = 0.25,logfc = 0.25,pvalue = 0.05)
Note: the raw data matrix includes 2700 cells and 32738 genes. 

---Revising gene symbols according to NCBI Gene symbols (updated in Jan. 10, 2020, https://www.ncbi.nlm.nih.gov/gene) and no matched genes and duplicated genes will be removed. 

Note: the new data matrix includes 2700 cells and 20805 genes. 

Error in findmarkergenes(Utexpr, species = "Human", cluster = "All", match_CellMatch = TRUE,  : 
  Lymph-related, not matched with the tissue types in CellMatch database! Please select one or more related tissue types.

ZJUFanLab commented 4 years ago

Hello, the tissue must be one of the specific tissue types in the CellMatch database, such as 'Blood', 'Peripheral blood', 'Plasma', or 'Serum'. Group names like 'Blood-related' are not accepted.

You can call scCATCH like this:

findmarkergenes(Utexpr, species = 'Human', cluster = 'All', match_CellMatch = TRUE, tissue = 'Blood', cell_min_pct = 0.25, logfc = 0.25, pvalue = 0.05)

or

findmarkergenes(Utexpr, species = 'Human', cluster = 'All', match_CellMatch = TRUE, tissue = c('Blood', 'Peripheral blood'), cell_min_pct = 0.25, logfc = 0.25, pvalue = 0.05)

or

findmarkergenes(Utexpr, species = 'Human', cluster = 'All', match_CellMatch = TRUE, tissue = c('Blood', 'Peripheral blood', 'Plasma'), cell_min_pct = 0.25, logfc = 0.25, pvalue = 0.05)

or any combination of the blood-related tissue types, as detailed in 3.1.1.
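
As a rough illustration of the point above, the sketch below checks tissue names before the (slow) marker-gene step runs and suggests close matches for names that are not valid. It is plain base R and does not call scCATCH. The helper `suggest_tissues` and the vector `known_tissues` are hypothetical names introduced here, and `known_tissues` holds only the four tissue types mentioned in this thread, not the full CellMatch list.

```r
# Assumption: only the four tissue names quoted in this thread are listed.
# Replace this vector with the full set of CellMatch tissue types in practice.
known_tissues <- c("Blood", "Peripheral blood", "Plasma", "Serum")

# Hypothetical helper (not part of scCATCH): for every requested tissue name
# that is not an exact match, return the valid names that contain its leading
# run of letters. Exact matches produce no entry in the result.
suggest_tissues <- function(tissue, valid = known_tissues) {
  unmatched <- setdiff(tissue, valid)
  lapply(setNames(unmatched, unmatched), function(name) {
    # Keep the letters before the first non-letter, e.g. "Blood-related" -> "Blood"
    stem <- sub("[^A-Za-z].*$", "", name)
    if (!nzchar(stem)) {
      return(character(0))
    }
    grep(stem, valid, ignore.case = TRUE, value = TRUE)
  })
}

# Check the two names that failed earlier in this thread
print(suggest_tissues(c("Blood-related", "Lymph-related")))
```

With this short list, 'Blood-related' is mapped to 'Blood' and 'Peripheral blood', while 'Lymph-related' gets no suggestion because none of the four names contains 'Lymph'. An empty result means every requested name matched exactly.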

william-swl commented 4 years ago

@ZJUFanLab Thank you very much! It works well!