Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Can celltypist handle doublets and low quality cells? #43

Closed by rgranit 1 year ago

rgranit commented 1 year ago

Should cells with low counts or high mitochondrial content, as well as doublets, be discarded before passing the data to CellTypist? Or can they be removed afterwards?

Thanks!

ChuanXu1 commented 1 year ago

@rgranit, you can either remove these cells beforehand or keep them during prediction. After prediction, there will be a confidence/probability matrix (a cell-by-cell-type matrix in predictions.probability_matrix). A low-quality cell will probably have low scores across all cell types, and a multiplet will likely have high scores in >=2 cell types. You can also use celltypist.dotplot to get a quick overview of the score distribution:

import celltypist

# adata is your query AnnData object
predictions = celltypist.annotate(adata, model = 'some_model.pkl')
# If you have a cluster or cell type column to compare against the CellTypist annotation:
celltypist.dotplot(predictions, 'cluster_or_celltype_column_of_adata', 'predicted_labels', filter_prediction = 0.03)
# If you do not have any cluster or cell type column to inspect:
celltypist.dotplot(predictions, 'predicted_labels', 'predicted_labels', filter_prediction = 0.03)

By checking the dot color, you can get an idea of how confident the predictions are (e.g., blue dots are usually low-quality or novel query cells which, although assigned to one of the cell types in the model, have scores too low to trust the predicted identity).
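
If you prefer a per-cell table over a plot, the two patterns described above (low scores everywhere, or high scores in two or more cell types) can be checked directly on the score matrix. The following is only a rough sketch: flag_suspect_cells is a hypothetical helper, not part of CellTypist, and the low and high thresholds are arbitrary assumptions that you would need to tune for your data. The toy table stands in for predictions.probability_matrix and contains made-up values.

```python
import pandas as pd


def flag_suspect_cells(prob: pd.DataFrame, low: float = 0.1, high: float = 0.5) -> pd.DataFrame:
    """Flag cells whose score pattern hints at low quality or a multiplet.

    prob : cells x cell types table of scores (e.g. predictions.probability_matrix).
    low, high : illustrative thresholds (assumptions, not CellTypist defaults).
    """
    top_score = prob.max(axis=1)          # best score for each cell
    n_high = (prob >= high).sum(axis=1)   # number of cell types scoring at least `high`
    return pd.DataFrame({
        "top_score": top_score,
        "n_high": n_high,
        "possible_low_quality": top_score < low,  # low scores across all cell types
        "possible_multiplet": n_high >= 2,        # high scores in >=2 cell types
    })


# Toy matrix standing in for predictions.probability_matrix (made-up values).
toy = pd.DataFrame(
    {
        "T cell": [0.95, 0.02, 0.80],
        "B cell": [0.01, 0.03, 0.75],
        "Monocyte": [0.02, 0.01, 0.05],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

flags = flag_suspect_cells(toy)
print(flags)
```

With real data you would pass predictions.probability_matrix instead of toy. Flagged cells are candidates for a closer look rather than confirmed doublets or low-quality cells, since novel cell types can also score low across the board.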

rgranit commented 1 year ago

Thanks @ChuanXu1 for your reply! Will give it a try.