biomap-research / scFoundation

Apache License 2.0

Question for single cell type annotation using cell embeddings #32

Open Zjianglin opened 1 week ago

Zjianglin commented 1 week ago

Hi, thanks for your excellent work.

I'm trying to use scFoundation to annotate my scRNA-seq dataset. I followed your model and annotation tutorials, but I'm confused by the annotation section. Only one line seems to involve cell type prediction: y_pred = np.argmax(emb,1). For my new dataset, how can I get the predicted cell type label for each cell?
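For context on that line, here is a minimal sketch of what np.argmax(emb, 1) computes. The array is made up for illustration, and it assumes the columns are per-class scores, which is the only case where the row-wise argmax is a class index. Applied to a raw cell embedding, the same call would only return the position of the largest feature in each row, not a cell type.

```python
import numpy as np

# Hypothetical per-class scores for 3 cells and 4 classes (illustrative values only).
scores = np.array([
    [0.1, 2.3, 0.4, 0.0],
    [1.7, 0.2, 0.1, 0.9],
    [0.0, 0.3, 0.2, 3.1],
])

# Row-wise argmax: index of the highest-scoring column for each cell.
y_pred = np.argmax(scores, 1)
print(y_pred)  # [1 0 3]
```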

Here are my steps:

  1. Subset the top 6000 highly variable genes and export the normalized expression matrix (the data slot of the SeuratObject) to a CSV file. (BTW, my scRNA-seq data is from mouse liver tissue; I used babelgene to convert the mouse genes to human orthologs.)
  2. Generate a cell embedding matrix using get_embedding.py: python get_embedding.py --task_name msl10x --input_type singlecell --output_type cell --pool_type all --tgthighres a5 --data_path all_harmony_integrated_RNA_normalized_expressions.csv --save_path ./scFoundation --pre_normalized T --version rde. The resulting npy matrix has shape (68523, 3072), where 68523 is the total number of cells in my dataset.
  3. Now I'm not sure what to do next.

Did I do anything wrong? How can I predict the cell type label for each cell? Thanks.

WhirlFirst commented 3 days ago

Hi, the code in the annotation section is there to show the results in our paper. To use scFoundation for annotation on a new dataset, you need to fine-tune the model yourself. You can follow the fine-tuning tutorial: https://github.com/biomap-research/scFoundation/tree/main/model#finetuneintegrate-scfoundation-with-other-models
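To illustrate why labeled data and a trained classification layer are needed before an argmax yields cell types, here is a rough sketch of a softmax head trained on fixed embeddings. This is not the fine-tuning procedure from the linked tutorial and does not use any scFoundation code: it is a simpler linear classifier on frozen embeddings, written with NumPy only. The embeddings, labels, and cell type names below are synthetic stand-ins, and train_softmax_head and predict are hypothetical helper names.

```python
import numpy as np

def train_softmax_head(emb, labels, n_classes, lr=0.1, epochs=200):
    """Fit a linear softmax classifier on fixed embeddings by gradient descent."""
    n, d = emb.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = emb @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * emb.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(emb, W, b):
    """Per-class scores, then the row-wise argmax gives a class index per cell."""
    return np.argmax(emb @ W + b, 1)

# Synthetic stand-in for embeddings (assumption: real ones would be loaded
# from the npy files written by get_embedding.py, for a labeled reference
# dataset and for the new dataset, using the same settings).
rng = np.random.default_rng(0)
n_classes, dim = 3, 16
centers = rng.normal(scale=5.0, size=(n_classes, dim))
ref_labels = rng.integers(0, n_classes, size=300)
ref_emb = centers[ref_labels] + rng.normal(size=(300, dim))
new_true = rng.integers(0, n_classes, size=60)
new_emb = centers[new_true] + rng.normal(size=(60, dim))

# Standardize with reference statistics so one learning rate suits all features.
mu, sd = ref_emb.mean(axis=0), ref_emb.std(axis=0)
ref_z, new_z = (ref_emb - mu) / sd, (new_emb - mu) / sd

W, b = train_softmax_head(ref_z, ref_labels, n_classes)
pred_idx = predict(new_z, W, b)

# Hypothetical label names, for illustration only.
class_names = np.array(["hepatocyte", "Kupffer cell", "endothelial cell"])
pred_labels = class_names[pred_idx]
accuracy = float((pred_idx == new_true).mean())
print(pred_labels[:5], accuracy)
```

The point of the sketch is the last step: the argmax is taken over class scores produced by a head trained on labeled cells, so each index maps to a known cell type. Fine-tuning as described in the tutorial goes further by also updating the model itself rather than only a head on top of it.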