digitalcytometry / ecotyper

EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.

scRNA-seq cell type annotation in the input requirement #11

Closed cjhong closed 2 years ago

cjhong commented 2 years ago

I am interested in EcoTyper, and thank you for the great analysis method!

I have my own scRNA-seq data and ran EcoTyper following Tutorial 5. There are two input requirements: 1) an expression matrix and 2) an annotation matrix (e.g. scRNA_CRC_annotation.txt).

Could you explain the requirements for the annotation matrix? In the tutorial section, it seems three columns are required: ID, CellType, and Sample.

What are the conditions on CellType? Should it be annotated with CIBERSORTx LM22? (I concluded this after a quick read of the paper, since the DLBCL bulk RNA is deconvolved with CIBERSORTx LM22.) If so, what is the best practice for running scRNA-seq data through CIBERSORTx? In scRNA_CRC_annotation.txt, I found that the CellType column holds only one cell type per cell. Does this mean it picks the cell type with the highest frequency?

Let me know if I am totally on the wrong track.

Thank you again!

BALuca commented 2 years ago

Hello, thanks for your interest in EcoTyper! The annotation matrix should have at least three columns: ID, CellType, and Sample. If you intend to run Tutorial 5, there is no restriction on what kind of cell types you can include in the CellType column - use whatever makes most sense for your biological system. In the DLBCL paper we used LM22 (assigning each cell the population with the highest fraction), but this is by no means required. You can provide cell type labels obtained by any approach (e.g. by Seurat clustering followed by annotation based on marker genes, or by automated annotation methods).
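
For anyone assembling this file by hand, here is a minimal Python sketch of one way to build the three-column annotation table from per-cell cluster assignments. Only the column names ID, CellType, and Sample come from the thread above. The cell IDs, sample names, cluster numbers, cell type labels, and helper function names are all hypothetical, and the tab-delimited layout is an assumption that should be checked against scRNA_CRC_annotation.txt from the tutorial.

```python
import csv
import io

# Column names taken from the thread; everything else here is illustrative.
REQUIRED_COLUMNS = ["ID", "CellType", "Sample"]


def top_fraction_label(fractions):
    """Return the label with the highest fraction for a single cell.

    `fractions` maps a cell type label to its estimated fraction,
    e.g. the per-cell output of a deconvolution or scoring method.
    """
    return max(fractions, key=fractions.get)


def build_annotation(cells, cluster_labels):
    """Turn (cell_id, sample, cluster) tuples into annotation rows.

    `cluster_labels` maps a cluster number to a cell type name chosen
    by the analyst, for example from marker genes.
    """
    return [
        {"ID": cell_id, "CellType": cluster_labels[cluster], "Sample": sample}
        for cell_id, sample, cluster in cells
    ]


def write_annotation(rows, handle):
    """Write rows as a tab-delimited table with a header line (assumed layout)."""
    writer = csv.DictWriter(
        handle, fieldnames=REQUIRED_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical input: three cells from two samples, already clustered.
cells = [
    ("cell_1", "patient_A", 0),
    ("cell_2", "patient_A", 1),
    ("cell_3", "patient_B", 0),
]
cluster_labels = {0: "T.cells", 1: "B.cells"}

rows = build_annotation(cells, cluster_labels)

# Written to an in-memory buffer here; swap in open("annotation.txt", "w", newline="")
# to produce a file on disk.
buffer = io.StringIO()
write_annotation(rows, buffer)
annotation_text = buffer.getvalue()
print(annotation_text)
```

If the labels come from a method that reports per-cell fractions rather than hard cluster assignments, `top_fraction_label` shows the "highest fraction wins" rule mentioned above; its result can be used directly as the CellType value.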