Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Guidance for using Nanostring CosMx RNA input #40

Closed by markdane 1 year ago

markdane commented 1 year ago

Hello, I appreciate your work in making CellTypist available. I have been able to use the Python API to assign `predicted_labels` and `majority_voting` cell types to our data, but I get the warning below when running `celltypist.annotate`:

⚠️ Warning: the input file seems not a raw count matrix. The prediction result may not be accurate

The CosMx data contains counts for 960 genes and 20 negative probes. It is sparse, with an average of 250 unique genes per cell. I processed the data by normalizing each cell to a target of 10,000 total counts and then computing log1p values. The input file is attached. I have also tried using an AnnData object as input, but this throws an error instead of just a warning.
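For readers following along, the preprocessing described above (scale each cell to 10,000 total counts, then take log1p) can be sketched with plain NumPy. This is only an illustration under assumptions: the function name `normalize_log1p` and the toy matrix are hypothetical, and the actual CosMx pipeline used here is not shown in the thread. In Scanpy the same two steps are usually done with `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import numpy as np

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p.

    Hypothetical helper for illustration; not part of CellTypist.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Cells with zero total counts would divide by zero; leave them all-zero.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

# Toy matrix with made-up values: 2 cells (rows) x 3 genes (columns).
raw = np.array([[1, 0, 3],
                [0, 2, 2]])
norm = normalize_log1p(raw)

# Undoing the log1p should recover 10,000 total counts per cell.
per_cell_totals = np.expm1(norm).sum(axis=1)
```

A quick sanity check on data normalized this way is that `np.expm1(norm).sum(axis=1)` is close to 10,000 for every non-empty cell.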

Can you comment on whether this data is a good match for CellTypist and if I have it in the best or correct format?

-Mark Dane

CT_sample_file_values.csv

ChuanXu1 commented 1 year ago

@markdane, if you want to use a CSV file as the input, a raw count matrix (containing only integers) is required. If you use an AnnData object as input, you need to normalize it, as you have already done. So in your case, just input a CSV file with raw counts, without normalization. Also, because your data is sparse, many informative genes in the model may not be utilized by your data.
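Two quick checks follow from the advice above: whether a matrix contains only integers (as expected for a CSV of raw counts), and how many of a model's genes a small panel actually covers. The sketch below is an assumption-laden illustration, not CellTypist's own validation logic; the helper names `looks_like_raw_counts` and `gene_overlap`, the toy matrix, and the gene lists are all hypothetical.

```python
import numpy as np

def looks_like_raw_counts(matrix):
    """Return True if every value is a non-negative whole number.

    Hypothetical check for illustration; CellTypist's internal test may differ.
    """
    m = np.asarray(matrix, dtype=float)
    return bool(np.all(m >= 0) and np.all(m == np.round(m)))

def gene_overlap(panel_genes, model_genes):
    """Fraction of the model's genes that are present in the panel."""
    model = set(model_genes)
    if not model:
        return 0.0
    return len(model & set(panel_genes)) / len(model)

# Toy data with made-up values: 2 cells x 3 genes.
raw = np.array([[1, 0, 3],
                [0, 2, 2]])
is_raw = looks_like_raw_counts(raw)                     # integer counts
is_raw_after_log = looks_like_raw_counts(np.log1p(raw)) # log-transformed values

# Hypothetical gene lists; a real comparison would use the panel's gene names
# and the gene list of whichever model is being applied.
panel_genes = ["CD3E", "CD4", "NegPrb1"]
model_genes = ["CD3E", "CD4", "CD8A", "MS4A1"]
coverage = gene_overlap(panel_genes, model_genes)
```

A low coverage value would be consistent with the caveat that a 960-gene panel may miss many of the genes a model relies on.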

ChuanXu1 commented 1 year ago

This issue should be solved. Please reopen it if you still have problems :).