gao-lab / Cell_BLAST

A BLAST-like toolkit for large-scale scRNA-seq data querying and annotation.
http://cblast.gao-lab.org
MIT License

Could we use ImmGen datasets as a reference? #4

Closed: chansigit closed this issue 4 years ago

chansigit commented 4 years ago

Dear developers, thank you for Cell BLAST. I recently used SingleR and found its built-in ImmGen reference very useful. Could we use that dataset with Cell BLAST?

Thank you for your response.

Jeff1995 commented 4 years ago

Thanks for your interest in Cell BLAST!

In principle, Cell BLAST can be used with any scRNA-seq reference data. You can read custom reference data with functions such as read_table, from_anndata and from_loom in the Python package, and then train your own model on that reference (this tutorial might help).
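To make the "read custom reference data" step concrete, here is a minimal, stdlib-only sketch of what a table reader has to handle: separating gene and cell names from the values and returning the matrix in a cells x genes orientation. The function load_expression_table and its genes_in_rows flag are hypothetical illustrations, not Cell BLAST's read_table; check the package documentation for the real signature and options.

```python
import csv
import io
from typing import List, Tuple


# Hypothetical helper for illustration only; not part of Cell BLAST.
def load_expression_table(
    handle, genes_in_rows: bool = True
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Parse a comma-separated expression table into (cells, genes, matrix).

    The returned matrix is always cells x genes. If the file stores genes
    as rows (a common layout for exported count tables), it is transposed.
    """
    reader = csv.reader(handle)
    header = next(reader)
    col_names = header[1:]  # the first header cell labels the index column
    row_names: List[str] = []
    rows: List[List[float]] = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row_names.append(record[0])
        rows.append([float(x) for x in record[1:]])
    if genes_in_rows:
        # Transpose genes x cells into cells x genes.
        matrix = [list(col) for col in zip(*rows)]
        return col_names, row_names, matrix
    return row_names, col_names, rows


# Usage: a tiny genes x cells table held in memory.
table = io.StringIO("gene,cell_1,cell_2\nCd3e,5,0\nCd19,2,7\n")
cells, genes, X = load_expression_table(table)
print(cells)  # ['cell_1', 'cell_2']
print(genes)  # ['Cd3e', 'Cd19']
print(X)      # [[5.0, 2.0], [0.0, 7.0]]
```

Getting the orientation right matters because a reference model is trained on cells as samples; a transposed table would silently swap genes and cells.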

However, the ImmGen reference in SingleR (SingleR::ImmGenData) appears to be microarray data, which may not match the probability model used by Cell BLAST. Its size (830 cells) is also too small for the Cell BLAST model to train properly, so we do not recommend using Cell BLAST with ImmGen. In this case, I would recommend using a larger scRNA-seq dataset (> 3,000 cells) as the reference.
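The two concerns above (data type and dataset size) can be turned into a quick pre-flight check before training. This is a minimal sketch under stated assumptions: reference_warnings is a hypothetical helper, the check assumes the model expects raw, non-negative integer counts (an assumption; the reply only says microarray data may not match the model), and the 3,000-cell threshold is the rough guideline from the reply, not a limit enforced by Cell BLAST.

```python
from typing import List, Sequence

# Rough guideline from the reply ("> 3,000 cells"); not enforced by Cell BLAST.
RECOMMENDED_MIN_CELLS = 3000


# Hypothetical helper for illustration only; not part of Cell BLAST.
def reference_warnings(matrix: Sequence[Sequence[float]]) -> List[str]:
    """Return reasons a cells x genes matrix may be a poor reference.

    Assumes the model expects raw, non-negative integer counts, so
    continuous values (e.g. log-scale microarray intensities) are flagged.
    """
    warnings: List[str] = []
    n_cells = len(matrix)
    if n_cells <= RECOMMENDED_MIN_CELLS:
        warnings.append(
            f"only {n_cells} cells; more than {RECOMMENDED_MIN_CELLS} recommended"
        )
    values = [v for row in matrix for v in row]
    if any(v < 0 for v in values):
        warnings.append("negative values found; expected raw counts")
    # float.is_integer() is False for fractional values, NaN and infinity.
    if any(not float(v).is_integer() for v in values):
        warnings.append(
            "non-integer values found; data may be normalized or from microarrays"
        )
    return warnings


# Usage: 830 rows of log-scale, microarray-like intensities.
microarray_like = [[7.25, 3.1], [6.8, 2.9]] * 415
print(reference_warnings(microarray_like))
```

A check like this only catches obvious mismatches; a dataset that passes it can still be a poor reference if, for example, it lacks the cell types you expect in your queries.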