Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

undetermined cells #41

Status: Closed (pranithavangala closed this issue 1 year ago)

pranithavangala commented 1 year ago

Hi

I'm trying to run CellTypist on several different datasets (PBMCs, splenocytes, etc.), and in almost all cases more than 30% of the cells are called undetermined. I tried some high-quality public datasets as well but ended up in the same situation. I'm using the low-resolution immune cell model. Can you help me understand what I can do to troubleshoot?

ChuanXu1 commented 1 year ago

@pranithavangala, can you confirm you are using the most recent models? You can re-download them with celltypist.models.download_models(force_update = True).
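For reference, the re-download step can be wrapped as below. This is a minimal sketch, assuming CellTypist is installed and the machine has network access; the helper name `refresh_and_load` is hypothetical and not part of CellTypist, and the default model file name is the one discussed in this thread.

```python
def refresh_and_load(model_name="Immune_All_Low.pkl"):
    """Force a re-download of the CellTypist models, then load one of them.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this sketch can be loaded without
    # CellTypist installed; normally the import goes at the top of the file.
    from celltypist import models

    # force_update=True overwrites any locally cached model files.
    models.download_models(force_update=True)

    # Load the refreshed model so it can be inspected or used for annotation.
    return models.Model.load(model=model_name)
```

Calling `refresh_and_load()` once after upgrading should be enough; the returned model object can then be passed to `celltypist.annotate`.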

pranithavangala commented 1 year ago

Thank you. I downloaded the most recent model, and now many cells (>60%) are labeled Heterogeneous.

ChuanXu1 commented 1 year ago

@pranithavangala, when several closely related cell types are predicted together, they can all receive similarly high scores, which results in Heterogeneous. If your data contain such a spectrum of cell types, you can use the default mode to select/predict the cell type with the maximal likelihood.

pranithavangala commented 1 year ago

@ChuanXu1 I think there is something weird going on with Immune_All_Low.pkl. When I use Immune_All_High.pkl, most cells are predicted as T cells, but when I change the model to Immune_All_Low.pkl, most come out as Heterogeneous or Unassigned.

[Image: Screen Shot 2022-11-04 at 10 01 37 AM]

ChuanXu1 commented 1 year ago

@pranithavangala, as mentioned in my previous comment, Immune_All_Low contains a lot of very similar cell types (especially in the T cell compartment), to which CellTypist assigns close scores (for example, 0.95 vs. 0.9). With a cutoff such as 0.5, you will thus possibly get Heterogeneous. You can try using the default mode (i.e., best match) to select/predict the cell type with the maximal likelihood.
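To make the difference between the two modes concrete, here is a simplified illustration in plain Python. It is a sketch of the two decision rules as described in this thread, not CellTypist's actual implementation; the function names and the cell type labels are hypothetical, and the scores reuse the 0.95 vs. 0.9 example above.

```python
def best_match(scores):
    """Return the label with the highest score (argmax)."""
    return max(scores, key=scores.get)


def threshold_match(scores, cutoff=0.5):
    """Label a cell by which scores pass the cutoff.

    None pass -> "Unassigned"; exactly one passes -> that label;
    several pass -> "Heterogeneous" (the ambiguous case from this thread).
    """
    passing = [label for label, score in scores.items() if score > cutoff]
    if not passing:
        return "Unassigned"
    if len(passing) == 1:
        return passing[0]
    return "Heterogeneous"


# Hypothetical per-cell scores: two similar T cell subtypes both score high.
cell_scores = {
    "T cell subtype A": 0.95,
    "T cell subtype B": 0.90,
    "B cell subtype": 0.02,
}

print(best_match(cell_scores))       # T cell subtype A
print(threshold_match(cell_scores))  # Heterogeneous
```

Because both T cell subtypes clear the 0.5 cutoff, the threshold rule cannot pick one, whereas the argmax rule always returns a single label.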

pranithavangala commented 1 year ago

@ChuanXu1 Thank you. I changed to best match and now the results are starting to make sense. One thing is still a little unclear: when I use Immune_All_High, most of my cells are classified as T cells, which is perfect. But when I use Immune_All_Low in best match mode, it classifies a bunch of cells as B cells. Is it possible to restrict the cell type annotation in the Immune_All_Low model based on the parent cell type assigned by Immune_All_High? For example, to gain resolution, I could subdivide my T cells into the various T cell subtypes only and not include other cell types such as B cells, fibroblasts, etc.

[Image: Screen Shot 2022-11-07 at 9 55 38 AM]

ChuanXu1 commented 1 year ago

@pranithavangala

> Is it possible to restrict the cell type annotation in the Immune_All_Low model based on the parent cell type assigned by Immune_All_High?

Currently there is no connection between the high and low models. If we connected them, an error in the high model (such as erroneously assigning a T cell as a B cell) would propagate to the low-level model (i.e., the low model would only consider B cell subtypes).

> For example, to gain resolution, I could subdivide my T cells into the various T cell subtypes only and not include other cell types such as B cells, fibroblasts, etc.

It is usually not advisable to restrict the search scope of cell types during prediction. But if you are pretty sure you only have T cells in your data, you can manipulate the result as follows:

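    # Per-cell probabilities for every cell type in the model
    prob_matrix = predictions.probability_matrix
    # Keep only the columns whose cell type name contains "T cells"
    prob_matrix = prob_matrix.loc[:, prob_matrix.columns.str.contains("T cells")]
    # For each cell, take the remaining cell type with the highest probability
    prob_matrix.idxmax(axis = 1)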
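The same column filtering can be tried on a toy matrix to see its effect. This is a minimal sketch, assuming only pandas; the probabilities are invented, the cell type labels are illustrative stand-ins for model columns, and `restrict_to_labels` is a hypothetical helper rather than a CellTypist function.

```python
import pandas as pd

# Toy probability matrix: rows are cells, columns are cell types.
prob_matrix = pd.DataFrame(
    {
        "Naive B cells": [0.60, 0.05, 0.10],
        "Regulatory T cells": [0.55, 0.20, 0.70],
        "Tcm/Naive helper T cells": [0.30, 0.90, 0.65],
    },
    index=["cell_1", "cell_2", "cell_3"],
)


def restrict_to_labels(prob_matrix, pattern):
    """Pick, per cell, the best label among columns containing `pattern`."""
    keep = prob_matrix.columns.str.contains(pattern, regex=False)
    if not keep.any():
        raise ValueError(f"no cell type column contains {pattern!r}")
    return prob_matrix.loc[:, keep].idxmax(axis=1)


unrestricted = prob_matrix.idxmax(axis=1)
restricted = restrict_to_labels(prob_matrix, "T cells")

print(unrestricted["cell_1"])  # Naive B cells
print(restricted["cell_1"])    # Regulatory T cells
```

Note that a plain substring match only keeps columns whose names literally contain the pattern, so it is worth checking the retained column names against the model's full list of cell types before trusting the restricted labels.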