bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

Question about the expression binning? #8

Open · FluentAI opened this issue 1 year ago

FluentAI commented 1 year ago

I see that the expression values are binned to address batch effects, which makes the GEP (gene expression prediction) task a classification problem. I'd like to know how to map a predicted bin back to the true expression value. Also, is the Pearson correlation between the predicted and target expression computed on the bin indices?
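
For reference, my mental model of the binning is something like the per-cell quantile scheme below (only a sketch of my reading of the paper; `n_bins` and the edge handling are my own assumptions, not the actual preprocessing code):

```python
import numpy as np

def bin_expression(expr: np.ndarray, n_bins: int = 51) -> np.ndarray:
    """Per-cell quantile binning (illustrative sketch, not scGPT's code).

    Zeros stay in bin 0; nonzero values are ranked into quantile bins
    computed within this cell, so the representation reflects relative
    rather than absolute magnitudes and is more comparable across batches.
    """
    binned = np.zeros_like(expr, dtype=np.int64)
    nonzero = expr > 0
    if nonzero.any():
        edges = np.quantile(expr[nonzero], np.linspace(0, 1, n_bins))
        # use only the inner edges; right=True maps values into bins 1..n_bins-1
        binned[nonzero] = np.digitize(expr[nonzero], edges[1:-1], right=True) + 1
    return binned
```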

SuperChanS commented 1 year ago

I have a similar question. Furthermore, I would like to know whether the perturbation prediction in scGPT follows the same protocol as in scFormer.

subercui commented 1 year ago

Thanks for the great question. We should have stated this more clearly. The binning was mainly used in pretraining and in fine-tuning tasks where the actual expression values are not needed, such as batch correction, multi-omic integration, and cell-type annotation. For the perturbation response prediction, we simply used log expression as input without binning, and the predictions were also optimized to match the true expression values.
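
Concretely, the two regimes can be sketched as follows (tensor shapes and variable names here are illustrative only, not the actual scGPT code):

```python
import torch
import torch.nn.functional as F

# Pretraining / classification-style tasks: the model predicts a
# distribution over expression bins, trained with cross-entropy.
bin_logits = torch.randn(8, 1200, 51)         # (cells, genes, n_bins), dummy
bin_targets = torch.randint(0, 51, (8, 1200))
cls_loss = F.cross_entropy(bin_logits.reshape(-1, 51), bin_targets.reshape(-1))

# Perturbation response prediction: log expression goes in directly
# (no binning) and the model regresses the true post-perturbation
# expression, so metrics such as Pearson correlation are computed on
# real values, not on bin indices.
pred_expr = torch.randn(8, 1200)              # dummy predictions
true_expr = torch.randn(8, 1200)              # dummy targets
reg_loss = F.mse_loss(pred_expr, true_expr)
```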

We spent quite some time over the last month finalizing the results. In line with our release plan, we will upload the pretraining code and the perturbation fine-tuning scripts next week.

thomasgaudelet commented 1 year ago

Hello! Really cool work! I'd just like a clarification on this subject of binning vs. no binning. My understanding from the above is that the model is always pre-trained using the proposed binning approach. However, for the downstream perturbation response task, the bins are replaced with the actual values (using the ContinuousValueEncoder). Intuitively, I would expect this to reduce the relevance of the pre-trained weights. Have you tried doing the entire pre-training with actual values for comparison? Do you have any intuition for the benefits of the binning strategy?
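
For reference, I picture the continuous path as something like the module below (a rough sketch of my understanding; the layer sizes and dropout are guesses, not the actual implementation):

```python
import torch
import torch.nn as nn

class ContinuousValueEncoder(nn.Module):
    """Embed a scalar expression value into d_model dimensions,
    replacing the bin-index embedding lookup used in pretraining.
    (Illustrative sketch only, not scGPT's exact module.)"""

    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch, seq_len) of log-normalized expression
        return self.net(values.unsqueeze(-1))
```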