biomap-research / scFoundation

Question about the different inference inputs for SCAD and DeepCDR #8

Closed by KatarinaYuan 6 months ago

KatarinaYuan commented 7 months ago

Hi, thank you for the great work! I noticed that scFoundation builds its inference input differently in SCAD and DeepCDR. In DeepCDR, [totalcount+args.highres,totalcount] is appended to the end of pretrain_gene_x: https://github.com/biomap-research/scFoundation/blob/1571ef085006aac63fa04fb592236f3198bd99d1/DeepCDR/prog/run_pytorch_embedding.py#L79

In SCAD, [args.tgthighres,totalcount] is appended to the end of pretrain_gene_x: https://github.com/biomap-research/scFoundation/blob/1571ef085006aac63fa04fb592236f3198bd99d1/SCAD/run_embedding_sc.py#L62

Could you please share more insight into what these variables mean and how the datasets differ between the two tasks? Thank you for your help!

WhirlFirst commented 7 months ago

In DeepCDR, we think the original total expression value of the bulk data is helpful, so we use the additive form (totalcount+args.highres) to enhance the read depth relative to the observed value. For SCAD, the objective is to transfer the prediction model from bulk data to single-cell data. The total expression values of these two data types are not comparable, which means the read depth of single cells would be a confounding factor. So we decided to normalize the read depth to the same fixed value (args.tgthighres).
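
To make the difference concrete, here is a minimal sketch of the two ways of appending the depth values, written with plain Python lists rather than the repository's tensors. The function names are hypothetical, and computing totalcount as the log10 of the summed expression is an assumption for illustration; check the linked scripts for the actual definition.

```python
import math


def log_total_count(gene_x):
    # Assumption: totalcount is log10 of the summed expression values.
    # The linked scripts may define it differently.
    return math.log10(sum(gene_x))


def append_depth_additive(gene_x, highres):
    """DeepCDR-style: first appended value is the observed depth plus an offset."""
    totalcount = log_total_count(gene_x)
    return list(gene_x) + [totalcount + highres, totalcount]


def append_depth_fixed(gene_x, tgthighres):
    """SCAD-style: first appended value is the same fixed number for every sample."""
    totalcount = log_total_count(gene_x)
    return list(gene_x) + [tgthighres, totalcount]


# Two toy samples with very different total expression.
shallow = [1.0, 9.0, 90.0]      # sums to 100
deep = [10.0, 90.0, 900.0]      # sums to 1000

# Additive form: the first appended value still varies with each sample's depth.
print(append_depth_additive(shallow, highres=1.0)[-2:])
print(append_depth_additive(deep, highres=1.0)[-2:])

# Fixed form: the first appended value is identical across samples.
print(append_depth_fixed(shallow, tgthighres=4.0)[-2:])
print(append_depth_fixed(deep, tgthighres=4.0)[-2:])
```

In the additive form the original depth of each sample carries through to the first appended value, which matches the stated goal of keeping the bulk total expression informative. In the fixed form that value no longer depends on the sample, which is what removes read depth as a confounder when moving from bulk to single-cell data.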