Teichlab / celltypist

A tool for semi-automatic cell type classification
https://www.celltypist.org/
MIT License

downsampling before training the model #128

Open Flu09 opened 2 months ago

Flu09 commented 2 months ago

Training the custom model is taking longer than I anticipated. What is the ideal way to downsample the reference while keeping all cell types? How can I do this in Python? What is the drawback of downsampling versus using HVGs?

new_model = celltypist.train(ref_adata, labels = 'cell_type_high_resolution', n_jobs = 30, feature_selection = True)
⚠️ Warning: it may take a long time to train this dataset with 2359994 cells and 31629 genes, try to downsample cells and/or restrict genes to a subset (e.g., hvgs)
ChuanXu1 commented 2 months ago

@Flu09, as long as each cell type is homogeneous, downsampling is reasonable, since you do not lose granularity within any cell type. You can downsample your data with whatever method you prefer, or use a quick downsampling function in celltypist
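
For the "methods you prefer" route, one possible approach is to cap the number of cells per label, so that abundant cell types are reduced while rare ones are kept in full. The sketch below is only an illustration using NumPy, not celltypist's built-in function; the helper name `downsample_per_label`, its `max_per_label` parameter, and the cap of 1000 are hypothetical choices, and it assumes the label column has no missing values.

```python
import numpy as np

def downsample_per_label(labels, max_per_label=1000, seed=0):
    """Return sorted integer positions, keeping at most `max_per_label` cells per label.

    Hypothetical helper for illustration; not part of celltypist.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for label in np.unique(labels):
        # Positions of all cells carrying this label
        positions = np.flatnonzero(labels == label)
        # Subsample only the labels that exceed the cap; rare labels stay whole
        if positions.size > max_per_label:
            positions = rng.choice(positions, size=max_per_label, replace=False)
        keep.append(positions)
    return np.sort(np.concatenate(keep))

# Assumed usage with the AnnData object from the question:
# idx = downsample_per_label(ref_adata.obs['cell_type_high_resolution'], max_per_label=1000)
# ref_small = ref_adata[idx].copy()
# new_model = celltypist.train(ref_small, labels='cell_type_high_resolution', n_jobs=30, feature_selection=True)
```

Because every label contributes at least its own cells up to the cap, no cell type is dropped, which matches the requirement in the question. The trade-off is that the cap changes the relative proportions of cell types in the training set.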