bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

perturbation prediction similar to scGen #95

Open: CalmDownTR opened this issue 1 year ago

CalmDownTR commented 1 year ago

Hello, I really enjoy your work. I noticed that your current downstream task for perturbation prediction covers genetic perturbations. Is there a way to do perturbation prediction similar to scGen?

Specifically, I would fine-tune on a training set that includes two states: control and stimulated. The model would learn how to map a given cell type from control to stimulated. Then, by inputting cells in the control state, I could obtain the predicted expression values of those cells in the stimulated state.
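To make the setup concrete, here is a minimal sketch of the control-to-stimulated mapping as a plain mean-shift baseline in expression space. This is not scGen itself (scGen performs its vector arithmetic in the latent space of a variational autoencoder) and it is not part of scGPT. The toy matrices and the names `mean_profile`, `ctrl_train`, `stim_train`, and `ctrl_new` are made up for illustration.

```python
from statistics import mean

def mean_profile(cells):
    """Per-gene mean over a list of cells (each cell is a list of expression values)."""
    return [mean(gene_values) for gene_values in zip(*cells)]

# Toy training data: rows are cells, columns are genes (hypothetical values).
ctrl_train = [[1.0, 2.0, 0.0], [3.0, 2.0, 1.0]]
stim_train = [[2.0, 2.0, 0.5], [4.0, 2.0, 0.5]]

# Cell-level labels only: the shift is learned from control vs. stimulated,
# without knowing which genes (if any) were edited.
delta = [s - c for s, c in zip(mean_profile(stim_train), mean_profile(ctrl_train))]

# Apply the learned shift to unseen control cells to predict their stimulated state.
ctrl_new = [[0.5, 1.0, 2.0]]
pred_stim = [[x + d for x, d in zip(cell, delta)] for cell in ctrl_new]
```

The point is only that the training signal is a pair of cell-level condition labels, with no perturbed-gene annotation anywhere in the input.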

Do you have any relevant suggestions? Do you have plans to launch similar tutorials in the future?

Thanks.

subercui commented 1 year ago

Hi, thank you for the question and interest in our work! Reading through your description, I think it is exactly the type of perturbation application we showed in the tutorial.

Could you please explain the difference further? Do you mean you want to include multiple cell types in the training set, and instead of generalizing to new perturbations, you want to generalize to new cell types under the same perturbation?

I will try to provide more help once I have a better understanding of your specific need.

CalmDownTR commented 1 year ago

Hi, thank you very much for your quick reply!

I think the main difference is that in scGPT's perturbation prediction fine-tuning, one or two genes need to be pre-labeled as the perturbed genes, whereas scGen does not require this. In scGen, cells are simply labeled as control or stimulated at the cell level, without knowing which genetic modification causes the change.

In the data preprocessing stage, GEARS requires the data to be annotated as follows: control cells have the condition format ctrl, single perturbations have the condition format A+ctrl or ctrl+A, and combination perturbations have the condition format A+B. In scGen, the condition only needs two states: control and stimulated.
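As a small illustration of the difference, the sketch below parses the GEARS-style condition strings listed above into the genes they name. The helper `parse_condition` is hypothetical, written only for this example, and is not a function from GEARS or scGPT.

```python
from typing import List

def parse_condition(condition: str) -> List[str]:
    """Return the perturbed genes named in a GEARS-style condition label.

    Hypothetical helper for illustration: splits on '+' and drops the
    'ctrl' placeholder, so 'ctrl' yields no genes at all.
    """
    return [gene for gene in condition.split("+") if gene != "ctrl"]

# The four label formats described above, with A and B standing in for gene names.
examples = ["ctrl", "A+ctrl", "ctrl+A", "A+B"]
parsed = {condition: parse_condition(condition) for condition in examples}

# A scGen-style label carries no gene information, only the cell-level state.
scgen_labels = ["control", "stimulated"]
```

Every non-control GEARS label resolves to at least one gene name, which is exactly the information that is missing when the perturbation is an external stimulus.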

I think this is mainly because scGPT assumes the perturbation is caused by deliberate editing of specific genes, whereas in my application I am trying to predict the changes that cells undergo under the influence of an external environment, such as the tumor microenvironment.

I hope this description clarifies my need. If I have misunderstood anything, please let me know.

Thanks again.