TencentAILabHealthcare / spatialID

How to train the DNN model according to the scRNA-seq data #3

Open 20182531027 opened 1 year ago

20182531027 commented 1 year ago

Hi, thanks for your wonderful tool for spatial data annotation! I tried one of the .py scripts, and when I ran it I used the dnn_model you provided. How can I train the DNN model on my own scRNA-seq data? Would you be kind enough to give me some guidance?

SilversH commented 1 year ago

Hi @20182531027, if you want to train a DNN model yourself, you can import the DNN module with "from cell_type_annotation_model import DNNModel" and apply a standard PyTorch training procedure.

We set the learning rate to 3e-4 and the L2 penalty (weight decay) to 1e-6, without any other tricks. Focal loss may work better than plain cross-entropy. Make sure the same preprocessing is applied to both your scRNA-seq data and your spatial data.
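For readers who want a starting point, the procedure described above can be sketched roughly as follows. This is only an illustration under several assumptions: StandInDNN is a hypothetical placeholder (swap in DNNModel from cell_type_annotation_model, whose constructor arguments are not shown in this thread), Adam is an assumed optimizer choice, and the focal loss uses a commonly used gamma of 2.0 that was not stated above. Only the learning rate (3e-4) and weight decay (1e-6) come from the reply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class StandInDNN(nn.Module):
    """Hypothetical placeholder network; replace with the repo's DNNModel."""

    def __init__(self, n_genes, n_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_genes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: mean of -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default; gamma=0 reduces to cross-entropy.
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Log-probability assigned to the true class of each sample.
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-((1.0 - pt) ** gamma) * log_pt).mean()


def train(model, x, y, epochs=5, batch_size=32):
    """Train on expression matrix x (cells x genes) and integer labels y."""
    # lr and weight_decay follow the values quoted above; Adam is an assumption.
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-6)
    dataset = TensorDataset(x, y)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    epoch_losses = []
    model.train()
    for _ in range(epochs):
        running = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = focal_loss(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * xb.size(0)
        epoch_losses.append(running / len(dataset))
    return epoch_losses
```

Here x would be the preprocessed scRNA-seq expression matrix as a float tensor and y the cell-type indices as a long tensor, for example train(StandInDNN(n_genes, n_classes), x, y). The spatial data then needs the same gene order and the same preprocessing before inference.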

20182531027 commented 1 year ago

Thanks for your kind reply. I really appreciate that you have created such useful software. It's a little hard for me to train such a challenging model.

Which sc-dnn_model should I choose for Stereo-seq data? I looked at "checkpoint_MERFISH_s.t7" and found only 20 labels, which differs from the annotation results for the chip provided in your reference data (chip ID "SS200000128TR_E2").

The labels provided by checkpoint_MERFISH_s.t7:

checkpoint['label_names']
['Astro', 'Endo', 'L2/3 IT', 'L5 ET', 'L5 IT', 'L5/6 NP', 'L6 CT', 'L6 IT', 'L6b', 'Lamp5', 'Macrophage', 'OPC', 'Oligo', 'Peri', 'Pvalb', 'SMC', 'Sncg', 'Sst', 'VLMC', 'Vip']

The celltype_pred of the chip SS200000128TR_E2 provided in your reference data:

[image: the annotation information of the chip SS200000128TR_E2]
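One way to see exactly where the two label sets diverge is a small set comparison, sketched below. The compare_labels helper and the observed list are hypothetical and only for illustration: the real values would come from the celltype_pred column of the reference data, which is only available as a screenshot here.

```python
def compare_labels(model_labels, observed_labels):
    """Report which labels are shared and which appear on one side only."""
    model_set, observed_set = set(model_labels), set(observed_labels)
    return {
        "shared": sorted(model_set & observed_set),
        "only_in_model": sorted(model_set - observed_set),
        "only_in_data": sorted(observed_set - model_set),
    }


# Labels copied from checkpoint['label_names'] as printed above. The checkpoint
# dict itself could be obtained with something like
# torch.load("checkpoint_MERFISH_s.t7", map_location="cpu").
merfish_labels = [
    'Astro', 'Endo', 'L2/3 IT', 'L5 ET', 'L5 IT', 'L5/6 NP', 'L6 CT',
    'L6 IT', 'L6b', 'Lamp5', 'Macrophage', 'OPC', 'Oligo', 'Peri',
    'Pvalb', 'SMC', 'Sncg', 'Sst', 'VLMC', 'Vip',
]

# Hypothetical placeholder values; replace with the unique celltype_pred
# values of chip SS200000128TR_E2.
observed = ['Astro', 'Oligo', 'DG']

report = compare_labels(merfish_labels, observed)
for key, labels in report.items():
    print(key, labels)
```

Any label listed under only_in_data cannot be predicted by that checkpoint, which would suggest the reference annotation was produced with a different model.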

20182531027 commented 1 year ago

Thank you! Would it be possible for you to provide the test_single_cell_datasets as well as the training scripts?