dataset 1 (Seurat PBMC) gene difference and few-shot learning

AdorableYoyo commented 2 months ago

Hi,

Thanks for the awesome work and repository.

I have two questions:

The Seurat PBMC demo data contains 23,385 RNA genes, but the reference at https://atlas.fredhutch.org/nygc/multimodal-pbmc/ has 20,729 genes, and after filtering out those without a HUGO symbol only 16,798 remain. What additional processing led to this difference? It would be great if you could provide the full data.
How was the few-shot evaluation on the Seurat PBMC dataset done? Since 90% was used for stage 2 pretraining, did you take 20 cells from the remaining 10% and fine-tune the pre-trained model (the one trained on 90% of the same dataset)? Or did you only use 20 cells, fine-tuned on stage ?

ElaineLIU-920 commented 1 month ago

Hi,

Thank you for your interest in scTranslator.

As mentioned in our manuscript, the complete Seurat PBMC data can be accessed from this link. We did not perform any gene filtering and matched scTranslatorID using entrezID.
For the few-shot evaluation on the Seurat PBMC dataset, we took only 20 cells from the remaining 10% of the dataset and fine-tuned the pre-trained model on them. The pre-trained model was trained with gene positions shuffled for each cell, whereas for the few-shot evaluation the genes were aligned.
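
The split described above can be sketched roughly as follows. This is only an illustration, not the scTranslator code: the function names `few_shot_split` and `shuffle_gene_positions`, the random seed, and the choice to evaluate on the held-out cells that were not used for fine-tuning are all assumptions made for the example.

```python
import numpy as np


def few_shot_split(n_cells, pretrain_frac=0.9, n_shots=20, seed=0):
    """Hypothetical helper: split cell indices into pretrain / few-shot / eval sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_cells)

    # 90% of the cells go to stage 2 pretraining, the rest are held out.
    n_pretrain = int(round(n_cells * pretrain_frac))
    pretrain_idx = perm[:n_pretrain]
    heldout_idx = perm[n_pretrain:]

    if n_shots > heldout_idx.size:
        raise ValueError("not enough held-out cells for the requested shots")

    # Draw the 20 fine-tuning cells from the held-out 10% only.
    shot_pos = rng.choice(heldout_idx.size, size=n_shots, replace=False)
    is_shot = np.zeros(heldout_idx.size, dtype=bool)
    is_shot[shot_pos] = True

    # Assumption: the remaining held-out cells serve as the evaluation set.
    return pretrain_idx, heldout_idx[is_shot], heldout_idx[~is_shot]


def shuffle_gene_positions(expr, gene_ids, rng):
    """Hypothetical helper: permute gene order independently for each cell.

    expr has shape (n_cells, n_genes); gene_ids has shape (n_genes,).
    The returned ID matrix keeps each value paired with its gene ID.
    """
    order = np.argsort(rng.random(expr.shape), axis=1)
    return np.take_along_axis(expr, order, axis=1), gene_ids[order]


pretrain_idx, few_shot_idx, eval_idx = few_shot_split(n_cells=1000)

rng = np.random.default_rng(1)
expr = rng.random((4, 6))          # toy expression matrix
gene_ids = np.arange(6)            # stand-in for scTranslatorID values
expr_shuffled, ids_shuffled = shuffle_gene_positions(expr, gene_ids, rng)
```

During pretraining each cell would see its genes in a different order (as in `shuffle_gene_positions`), while for few-shot fine-tuning and evaluation the same fixed gene order is used for every cell, so no shuffling step is applied.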

AdorableYoyo commented 1 week ago

Thank you!

TencentAILabHealthcare / scTranslator