Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

`celltypist.annotate` to specify the required scale for count normalization (currently 10'000) #3

Closed. watiss closed this issue 2 years ago.

watiss commented 2 years ago

It appears that counts need to be log1p-normalized with a scale of 10'000 counts per cell when calling `celltypist.annotate`. This is not clear from the documentation of that function; one can only figure it out by trial and error or by inspecting the code, for instance the following check in `classifier.py`: `if np.abs(np.expm1(self.indata[0]).sum()-10000) > 1: raise ValueError("🛑 Invalid expression matrix, expect log1p normalized expression to 10000 counts per cell")`.
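For illustration, here is a minimal NumPy sketch of the preprocessing that check expects: scale each cell (row) to 10,000 total counts, apply `log1p`, then re-run the same test on the first cell. The helper names `log1p_normalize` and `passes_scale_check` are hypothetical and are not part of CellTypist. With Scanpy, the equivalent steps are `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import numpy as np

TARGET_SUM = 10_000  # scale expected by the check quoted from classifier.py

def log1p_normalize(counts: np.ndarray, target_sum: float = TARGET_SUM) -> np.ndarray:
    """Hypothetical helper: scale each cell (row) to `target_sum` counts, then log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for cells with no counts at all.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

def passes_scale_check(expr: np.ndarray, target_sum: float = TARGET_SUM) -> bool:
    """Hypothetical helper mirroring the quoted check: undo log1p on the
    first cell only and confirm its counts sum to `target_sum` within 1."""
    return bool(np.abs(np.expm1(expr[0]).sum() - target_sum) <= 1)

raw = np.array([[5, 0, 3, 2], [1, 1, 0, 8]])  # toy cell-by-gene raw counts
expr = log1p_normalize(raw)
print(passes_scale_check(expr))           # True
print(passes_scale_check(np.log1p(raw)))  # False: log1p alone, not scaled to 10,000
```

Note that the quoted check inspects only the first cell, so it catches the wrong scale but does not validate every row.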

This issue suggests explicitly calling that out in the documentation of the `annotate` function.

Thanks!

ChuanXu1 commented 2 years ago

Hi watiss,

Table inputs to CellTypist (such as files uploaded by users to celltypist.org, or passed directly to the `celltypist.annotate` function) are usually large files (.csv/.tab/.txt) compared with the sparse matrix stored in an AnnData object. To ease the data burden, we require a table input to be a raw count matrix (ideally integer counts, without decimal digits) and normalize it under the hood.
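As a rough way to confirm that a table holds raw counts before uploading it, the sketch below parses a small cell-by-gene CSV and checks that every value is non-negative and integer-valued. The table layout (header row of gene names, first column of cell IDs) and the helper `is_raw_count_table` are assumptions for illustration only; this is not CellTypist's own input validation.

```python
import csv
import io

import numpy as np

def is_raw_count_table(values: np.ndarray) -> bool:
    """Hypothetical heuristic: raw counts are non-negative and integer-valued."""
    values = np.asarray(values, dtype=float)
    return bool((values >= 0).all() and np.all(values == np.round(values)))

# Assumed layout: cells as rows, genes as columns, with a header row.
csv_text = """cell,GeneA,GeneB,GeneC
cell1,5,0,3
cell2,1,8,0
"""

rows = list(csv.reader(io.StringIO(csv_text)))
genes = rows[0][1:]
cells = [r[0] for r in rows[1:]]
matrix = np.array([[float(x) for x in r[1:]] for r in rows[1:]])

print(is_raw_count_table(matrix))            # True: integer counts
print(is_raw_count_table(np.log1p(matrix)))  # False: already transformed values
```

A table that fails this check has most likely been normalized already and should be replaced by the raw counts.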

Hope this is clear.