bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

Perturbation Prediction on foundation model possible? #41

Closed: simon-harvEDU closed this issue 1 year ago

simon-harvEDU commented 1 year ago

Dear scGPT team, thanks for providing the tutorial and example for perturbation prediction. It is getting a lot of attention.

I have a question: can your model also generate a gene perturbation prediction without being fine-tuned on perturbation scRNA-seq data? Or could we use bulk RNA-seq data as the perturbation input?

Thanks and best wishes, Simon

subercui commented 1 year ago

Hi Simon, thank you for your interest.

  1. Currently, you'll need to fine-tune the model, since the pre-training task is quite different from perturbation response prediction. Intuitively, pre-training uses the signal from within the same cell, whereas the current perturbation prediction setting learns to predict the post-perturbation state starting from a control cell.

  2. Although we haven't tested it, I believe it is quite likely possible to fine-tune the model on bulk sequencing data in a similar fashion. I would be very interested to hear how it goes, and I'm happy to discuss more if you have further questions.
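To make the distinction in point 1 concrete, here is a minimal, framework-free Python sketch of how the two kinds of training examples differ. It is a simplified illustration, not scGPT's actual data pipeline: `TrainingExample`, `pretraining_example`, `perturbation_example`, the `MASK_VALUE` placeholder and the 0/1 perturbation flags are hypothetical names chosen only to show the idea (same-cell masked reconstruction versus control-cell-to-perturbed-cell prediction).

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical placeholder for masked expression values (illustration only).
MASK_VALUE = -1.0


@dataclass
class TrainingExample:
    input_expr: List[float]   # expression values the model sees
    pert_flags: List[int]     # 1 marks a perturbed gene, 0 otherwise
    target_expr: List[float]  # expression values the model must predict


def pretraining_example(cell: List[float], masked_genes: Set[int]) -> TrainingExample:
    """Pre-training style: input and target come from the SAME cell.

    Some genes are hidden and the model learns to recover them from the rest.
    """
    masked = [MASK_VALUE if i in masked_genes else v for i, v in enumerate(cell)]
    return TrainingExample(masked, [0] * len(cell), list(cell))


def perturbation_example(
    control_cell: List[float],
    perturbed_cell: List[float],
    perturbed_genes: Set[int],
) -> TrainingExample:
    """Perturbation style: input is a CONTROL cell plus flags for the perturbed
    genes; the target is the profile observed after the perturbation."""
    if len(control_cell) != len(perturbed_cell):
        raise ValueError("control and perturbed profiles must cover the same genes")
    flags = [1 if i in perturbed_genes else 0 for i in range(len(control_cell))]
    return TrainingExample(list(control_cell), flags, list(perturbed_cell))


control = [2.0, 0.5, 1.2]
after_ko = [0.0, 0.9, 1.1]  # toy profile with gene 0 knocked out
pre = pretraining_example(control, masked_genes={1})
pert = perturbation_example(control, after_ko, perturbed_genes={0})
print(pre.input_expr, pre.target_expr)  # [2.0, -1.0, 1.2] [2.0, 0.5, 1.2]
print(pert.input_expr, pert.pert_flags, pert.target_expr)  # [2.0, 0.5, 1.2] [1, 0, 0] [0.0, 0.9, 1.1]
```

In the first case the target is recoverable from the input cell itself; in the second the model has to learn how the flagged gene changes the rest of the profile, which pre-training alone never asks it to do.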

simon-harvEDU commented 1 year ago

Thanks for your detailed answer, subercui. How would you pass in a bulk RNA-seq perturbation? For example, we have around 20-30 gene knockout studies with bulk RNA-seq on each. Any pointers are welcome so we can give it a shot. Thanks, Simon

subercui commented 1 year ago

Hi, the provided tutorial covers fine-tuning the model and then predicting the perturbation response for unseen conditions. Is that what you want, or something else? I just want to confirm in advance, since I think you may have other applications in mind for your bulk data.

simon-harvEDU commented 1 year ago

Hi, that's exactly what I want. The challenge is that we don't have perturbation scRNA-seq data, so we are wondering whether bulk RNA-seq can be passed in instead, and if so, how. Someone on our team said it would not be possible, because we basically just have a few vectors, each with one gene perturbed and RNA-seq measured across all other genes. Let's say:

  geneA KO - 20,000 gene expression measurements
  geneB KO - 20,000 gene expression measurements
  ...
  geneK KO - 20,000 gene expression measurements

In total, let's say 15 gene-KO measurements of 20,000 genes each.

Would it be possible to fine-tune the model on that? All measurements come from the same cell type. Thanks and best wishes, Simon
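One way to frame bulk knockout data in the control-to-perturbed shape subercui describes is as (control profile, knockout flag, knockout profile) triples, with some knockouts held out as unseen conditions. The Python sketch below is a hypothetical illustration, not an scGPT API, and it rests on assumptions the thread does not state: that you also have a bulk profile from unperturbed control samples of the same cell type, that each knocked-out gene appears under the same name in the gene list, and that log1p of counts-per-million is an acceptable normalization. `log_cpm`, `build_pairs`, `split_by_condition` and the toy gene names and counts are made up for illustration.

```python
import math
from typing import Dict, List, Set, Tuple

# (knocked-out gene, control input, perturbation flags, knockout target)
Pair = Tuple[str, List[float], List[int], List[float]]


def log_cpm(counts: List[float]) -> List[float]:
    """Scale raw counts to counts-per-million, then apply log1p."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("library size must be positive")
    return [math.log1p(c / total * 1e6) for c in counts]


def build_pairs(
    control_counts: List[float],
    ko_counts: Dict[str, List[float]],
    gene_names: List[str],
) -> List[Pair]:
    """Pair the (assumed) control profile with every knockout profile."""
    gene_index = {g: i for i, g in enumerate(gene_names)}
    control = log_cpm(control_counts)
    pairs: List[Pair] = []
    for ko_gene, counts in ko_counts.items():
        if len(counts) != len(gene_names):
            raise ValueError(f"{ko_gene}: expected {len(gene_names)} genes")
        flags = [0] * len(gene_names)
        flags[gene_index[ko_gene]] = 1  # mark the knocked-out gene
        pairs.append((ko_gene, control, flags, log_cpm(counts)))
    return pairs


def split_by_condition(
    pairs: List[Pair], held_out: Set[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Keep whole knockouts out of training so they act as unseen conditions."""
    train = [p for p in pairs if p[0] not in held_out]
    test = [p for p in pairs if p[0] in held_out]
    return train, test


genes = ["geneA", "geneB", "geneC"]
control_counts = [100.0, 200.0, 700.0]
ko = {
    "geneA": [0.0, 250.0, 750.0],
    "geneB": [120.0, 0.0, 880.0],
}
pairs = build_pairs(control_counts, ko, genes)
train, test = split_by_condition(pairs, held_out={"geneB"})
print([p[0] for p in train], [p[0] for p in test])  # ['geneA'] ['geneB']
```

With 15 to 30 knockouts and one profile each, this yields only that many training pairs, far fewer than a single-cell perturbation screen provides, and each bulk profile is an average over many cells. Whether fine-tuning on data of this shape works is exactly the open question in this thread.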

inhyeoklee commented 6 months ago

Hello @simon-harvEDU ,

Have you tried adapting the scGPT workflow to your bulk data? I'm planning to fine-tune the model in a similar way, but I wanted to hear first whether anyone has had success doing so.

Thanks, Daniel