Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

Running the CellTypist training function celltypist.train on a subset of genes #107

Open dkapadia612 opened 4 months ago

dkapadia612 commented 4 months ago

I would like to train a CellTypist model to identify certain cell types using a specific gene set. I tried passing a list of genes via the 'genes' argument, but the model was still trained on all features. Besides keeping only the selected genes in adata.var, is there another way to make this work? Additionally, does training on fewer than 50 genes affect the prediction accuracy of the trained model, or is there a gene-count threshold below which you wouldn't recommend training a model? I would appreciate any help you can provide!

ChuanXu1 commented 4 months ago

@dkapadia612, you can train the model using any number of genes. There is no definitive relationship between model accuracy and the number of genes (for example, a dataset with clearly distinct cell types may rely on only a handful of genes). To train the model on a subset of genes, you can use model = celltypist.train(adata[:, a_subset_genes], check_expression = False, ...)
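For readers who want to try the suggestion above, here is a minimal sketch of how the subsetting could be wrapped. The helper names `select_present_genes` and `train_on_gene_subset`, and the `labels_key` parameter, are hypothetical and not part of CellTypist. The sketch assumes `adata` is an AnnData object that was already log-normalized on the full gene set before subsetting, and that the cell type labels live in a column of `adata.obs`. Setting `check_expression=False` presumably matters here because the per-cell totals of a gene subset no longer match what the expression check expects.

```python
def select_present_genes(var_names, wanted):
    """Split `wanted` into genes found in `var_names` and genes that are missing.

    Order follows `wanted`; duplicates in `wanted` are dropped.
    """
    available = set(var_names)
    seen = set()
    present, missing = [], []
    for gene in wanted:
        if gene in seen:
            continue
        seen.add(gene)
        if gene in available:
            present.append(gene)
        else:
            missing.append(gene)
    return present, missing


def train_on_gene_subset(adata, wanted_genes, labels_key):
    """Train a CellTypist model on a gene subset (hypothetical wrapper).

    Assumes `adata` is an AnnData object normalized on the full gene set,
    and `labels_key` names a column in `adata.obs` holding the cell types.
    """
    import celltypist  # imported lazily so the helper above has no dependencies

    present, missing = select_present_genes(adata.var_names, wanted_genes)
    if missing:
        print(f"Skipping {len(missing)} gene(s) not found in adata.var_names: {missing}")
    if not present:
        raise ValueError("None of the requested genes are present in adata.var_names")

    # Subset the columns (genes) and copy so training does not operate on a view.
    subset = adata[:, present].copy()

    # check_expression=False follows the suggestion in the reply above.
    return celltypist.train(subset, labels=labels_key, check_expression=False)


# Example call (gene names and the "cell_type" column are placeholders):
# model = train_on_gene_subset(adata, ["CD3E", "MS4A1", "NKG7"], "cell_type")
```

Checking for missing genes first avoids the KeyError that `adata[:, genes]` raises when a requested name is absent from `adata.var_names`.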