digitalcytometry / ecotyper

EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.

Expression matrix of scRNA-seq #53

Closed by HengqiLiu 1 year ago

HengqiLiu commented 1 year ago

Dear Ecotyper team,

I'm a little confused about the expected format of the input data for discovery in scRNA-seq data, and I noticed that the example data consists of counts.

What form of data should I input to run discovery in scRNA-seq data?

The expression matrix field should contain the path to a tab-delimited file containing the expression data, with genes as rows and cells as columns. The expression matrix should be in TPM, CPM or another suitable normalized space. Users should perform their own quality control of the expression matrix before applying EcoTyper (e.g. to filter low-quality cells, doublets, etc.). However, we do not recommend pre-filtering the matrix for variable genes, as EcoTyper performs an internal selection for genes that show cell-type specificity. The matrix should have gene symbols in the first column and gene counts for each cell in the subsequent columns. Column (cell) names should be unique. Also, we recommend that the column names do not contain special characters that are modified by the R function make.names, e.g. having digits at the beginning of the name or containing characters such as space, tab or -.

Thank you, Hengqi Liu

BALuca commented 1 year ago

Hi,

We do not have a strong recommendation about the exact processing; it is up to the users.

Best, The EcoTyper team
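
For readers landing on this thread: since the exact processing is left to the user, here is a minimal sketch of one option among those the quoted text mentions, scaling raw counts to CPM and writing a tab-delimited matrix with genes as rows and cells as columns. It is not part of EcoTyper. The function names, the "Gene" header label, and the toy data are all hypothetical, and the name check is a simplified ASCII-only approximation of what R's make.names accepts (it ignores reserved words, for instance).

```python
import csv
import io
import re


def counts_to_cpm(counts):
    """Scale each cell (column) to counts per million.

    `counts` maps gene symbol -> list of raw counts, one entry per cell.
    """
    n_cells = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
    return {
        gene: [1e6 * v / totals[j] if totals[j] else 0.0 for j, v in enumerate(row)]
        for gene, row in counts.items()
    }


def risky_cell_names(names):
    """Return names that are duplicated or that make.names would likely alter.

    Simplified rule (an assumption): only ASCII letters, digits, '.' and '_',
    starting with a letter or with a dot that is not followed by a digit.
    """
    ok = re.compile(r"^(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*$")
    seen, flagged = set(), []
    for name in names:
        if name in seen or not ok.match(name):
            flagged.append(name)
        seen.add(name)
    return flagged


def write_matrix(handle, cells, matrix):
    """Write gene symbols in the first column, then one column per cell."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene"] + cells)  # "Gene" is a placeholder header label
    for gene, row in matrix.items():
        writer.writerow([gene] + [f"{v:.4f}" for v in row])


# Toy example: two genes, two cells (hypothetical data).
cells = ["cell_1", "cell_2"]
raw = {"GeneA": [10, 0], "GeneB": [30, 5]}

assert not risky_cell_names(cells), "rename cells before writing the matrix"
buffer = io.StringIO()  # swap for open("expression.tsv", "w", newline="") to write a file
write_matrix(buffer, cells, counts_to_cpm(raw))
print(buffer.getvalue())
```

Quality control (filtering low-quality cells and doublets) would still need to happen before this step, and the matrix should not be pre-filtered for variable genes.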