dptech-corp / Uni-Fold

An open-source platform for developing protein models beyond AlphaFold.
https://doi.org/10.1101/2022.08.04.502811
Apache License 2.0

competition multimer analysis -- does chain order matter? #136

Open avilella opened 1 year ago

avilella commented 1 year ago

Hi,

I am running what I call a "competition analysis": I give Uni-Fold three chains as input. One is an antigen (e.g. PD-1), the second is the antigen's ligand or cofactor (e.g. PD-L1), and the third is an antibody Fv joined by a (GGGGS)x4 linker.
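For readers who want to reproduce the single-chain Fv input, here is a minimal sketch of joining a heavy and a light variable domain with a (GGGGS)x4 linker. The heavy+linker+light order follows the description later in this thread; the function name and the toy sequences are hypothetical placeholders, not real antibody sequences.

```python
def build_scfv(heavy: str, light: str, unit: str = "GGGGS", repeats: int = 4) -> str:
    """Return heavy + linker + light, where the linker is `unit` repeated `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    linker = unit * repeats  # (GGGGS)x4 -> 20 residues
    return heavy.strip().upper() + linker + light.strip().upper()


# Toy placeholder fragments, NOT real VH/VL sequences.
toy_heavy = "EVQLVESGG"
toy_light = "DIQMTQSPS"
scfv = build_scfv(toy_heavy, toy_light)
print(len(scfv), scfv)
```

The resulting string is then used as a single chain (chain C in the setup above).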

Some antibodies are known to block the interaction between PD-1 and PD-L1. So far, however, every prediction shows PD-1 interacting with PD-L1 (chain A and chain B) in the same way as in the crystal structure of the PD-1/PD-L1 complex, while the antibody Fv (chain C) binds in a place that contradicts the experimental data we have about it.

Does the order of the chains matter to Uni-Fold multimer? If I expect the chain A + chain B interaction to compete with the chain A + chain C interaction, does it matter which chains come first in the input FASTA file?

Is there a way to "jolt" the prediction step so that it can leave a local optimum and reattempt the 3-chain prediction? Which parameter would that be? Thanks.

ZiyaoLi commented 1 year ago

Thank you for the detailed feedback. With regard to your questions:

  1. Chain order does not matter.
  2. Currently there is no off-the-shelf method for this kind of 3-chain competition prediction. You may try adding customized templates to encourage new predictions, but the results can be deceptive.

I think one possible approach is to compare the binding affinity of PD-1/PD-L1 with that of PD-1/Fv. You may try folding PD-1+PD-L1 and PD-1+Fv separately and then analyze the output confidence scores (especially the PAEs).
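One way to turn "analyze the PAEs" into a number is to average the predicted aligned error over the inter-chain blocks of the PAE matrix. The sketch below assumes you have already extracted a square residue-by-residue PAE matrix and the chain lengths from the prediction output; how Uni-Fold stores these is not shown here, and the function names are hypothetical. A lower value means the model is more confident about the relative placement of the two chains. It is a confidence proxy, not a measured affinity.

```python
def chain_ranges(lengths):
    """Map chain lengths, e.g. [2, 3], to residue index ranges [range(0, 2), range(2, 5)]."""
    ranges, start = [], 0
    for n in lengths:
        ranges.append(range(start, start + n))
        start += n
    return ranges


def mean_interchain_pae(pae, lengths, i, j):
    """Mean PAE over both off-diagonal blocks (i -> j and j -> i) of a square matrix.

    `pae` can be a list of lists or a NumPy array; only pae[r][c] indexing is used.
    """
    if i == j:
        raise ValueError("i and j must be different chains")
    ranges = chain_ranges(lengths)
    rows, cols = ranges[i], ranges[j]
    total = 0.0
    for r in rows:
        for c in cols:
            total += pae[r][c] + pae[c][r]  # PAE is not symmetric, so use both directions
    return total / (2 * len(rows) * len(cols))
```

Computing this once for the PD-1+PD-L1 run and once for the PD-1+Fv run gives two comparable numbers, provided both runs use the same model and settings.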

To compare the binding affinity of different Fvs, you may repeat the PD-1/Fv complex predictions and look for the best ones. But, as empirical evidence shows, predictions of Ab-Ag complexes by both UF and AF2 are often not accurate enough.
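The "repeat and look for the best ones" step can be sketched as a small ranking helper. It assumes each repeated run has already been reduced to one score per Fv where lower is better (for example a mean inter-chain PAE); the Fv names and score values below are made up for illustration.

```python
def rank_by_best(runs):
    """Keep the best (lowest) score per Fv across repeated runs, then sort ascending."""
    best = {}
    for name, score in runs:
        if name not in best or score < best[name]:
            best[name] = score
    return sorted(best.items(), key=lambda item: item[1])


# Illustrative values only, not real Uni-Fold output.
runs = [("fv_001", 12.0), ("fv_001", 8.5), ("fv_002", 9.0), ("fv_002", 14.2)]
for name, score in rank_by_best(runs):
    print(f"{name}\t{score:.1f}")
```

Given the caveat above about Ab-Ag accuracy, such a ranking is best treated as a way to prioritize candidates for inspection rather than as a final answer.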

avilella commented 1 year ago

Thanks for the quick reply. We have data where we have experimentally confirmed the epitope+paratope of the PD-1/Fv complex, so we could use this as training for fine-tuning Uni-Fold. We have maybe 1,000 different Fvs. Would that be enough for a successful fine-tuning of Uni-Fold for the specialist topic of Ab-Ag complexes? Thx in advance.

ZiyaoLi commented 1 year ago

Sounds like a solid plan to me. But I would suggest using a small learning rate, or mixing your data with other PDB data, to avoid overfitting.

Ziyao Li, PhD student
Center for Data Science, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing
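The suggestion to mix specialist data with other PDB data can be illustrated with a simple sampling sketch. This is not Uni-Fold's data loader. It only shows the idea of drawing each training example from the antibody set with some probability and from a general set otherwise. The function name, the sample identifiers and the 0.25 fraction are arbitrary assumptions.

```python
import random
from itertools import islice


def mixed_sampler(specialist, general, specialist_fraction=0.5, seed=0):
    """Yield sample IDs forever, drawing from `specialist` with the given probability."""
    if not specialist or not general:
        raise ValueError("both pools must be non-empty")
    if not 0.0 <= specialist_fraction <= 1.0:
        raise ValueError("specialist_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    while True:
        pool = specialist if rng.random() < specialist_fraction else general
        yield rng.choice(pool)


# Hypothetical sample IDs standing in for processed training examples.
ab_ag = ["fv_001", "fv_002", "fv_003"]
general_pdb = ["pdb_a", "pdb_b", "pdb_c", "pdb_d"]
print(list(islice(mixed_sampler(ab_ag, general_pdb, specialist_fraction=0.25), 8)))
```

A smaller specialist fraction keeps the model closer to its original training distribution, which is the point of the advice about overfitting.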

avilella commented 1 year ago

Thanks. Would you have an example set of input files as part of the documentation in the Uni-Fold repo?

E.g. 100 PDB files, where chain A is the antigen and chain B is the antibody (heavy+linker+light), plus a score for each of those 100 files, say from 1 to 10.

How would we feed that into Uni-Fold for fine-tuning using the finetuning bash script?

Thanks in advance.

ZiyaoLi commented 1 year ago

You'll have to process them into feature.pkl data. Example data are in the example_data folder.
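Before building your own feature files, it can help to open one of the example files and look at what it contains. The sketch below assumes the file is a Python pickle (optionally gzip-compressed) holding a dict of arrays. The exact file names and keys in example_data are not stated in this thread, so the path shown is a hypothetical placeholder.

```python
import gzip
import pickle
from pathlib import Path


def load_features(path):
    """Load a pickle file, transparently handling a .gz suffix."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)  # only unpickle files from a source you trust


def summarize(features):
    """Map each key to its array shape, or to the type name if it has no shape."""
    summary = {}
    for key, value in features.items():
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


# Hypothetical path; point this at a real file from the example_data folder.
example = Path("example_data/some_target/features.pkl.gz")
if example.exists():
    for key, info in summarize(load_features(example)).items():
        print(key, info)
```

Comparing this summary between an example file and a file produced from your own PDB data is a quick check that the processing step yielded the same set of keys.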

avilella commented 1 year ago

Hi, I've tried a different competition analysis with 3 components: protein A (blue), protein B (green) and Fv (red).

The red Fv is expected, from experimental data, to bind protein A but not protein B. What I get from Uni-Fold is the red Fv placed very far away from the other two proteins (see first image).

I've tried increasing max_recycling_iters to 40, but the result is the same. In fact, increasing this value seems to bring protein A and protein B closer to each other (see second image).

python unifold/inference.py --max_recycling_iters=40 --model_name=multimer_ft --param_path=/home/user/Uni-Fold/multimer.unifold.pt --data_dir=/bfx_share1/quick_share/alphafold2/outputs/61/61c491c19019194ecaf5a3db232be305.LRI010.ufld --target_name=61c491c19019194ecaf5a3db232be305.LRI010.mmer --output_dir=/bfx_share1/quick_share/alphafold2/outputs/61/61c491c19019194ecaf5a3db232be305.LRI010.test

Is there any other parameter I could play with? Thanks in advance.

ZiyaoLi commented 1 year ago

From your result, I believe the model tries to form a pseudo-homodimer between A and B while ignoring the Fv. I would suggest folding A+Fv and B+Fv (AC and BC) separately and then comparing the confidence scores.
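Setting up the two pairwise runs amounts to writing one input per pair, each containing the Fv plus one partner. The sketch below assumes a multi-record FASTA with one record per chain is an acceptable input for the feature-generation step; the helper names and the toy sequences are hypothetical.

```python
def pair_with(chains, fixed):
    """Pair every other chain with the `fixed` chain, e.g. A+C and B+C for fixed='C'."""
    if fixed not in chains:
        raise KeyError(f"unknown chain: {fixed}")
    return {
        f"{name}+{fixed}": {name: seq, fixed: chains[fixed]}
        for name, seq in chains.items()
        if name != fixed
    }


def to_fasta(chains):
    """Render a dict of chain name -> sequence as a multi-record FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in chains.items())


# Toy placeholder sequences; C stands for the Fv.
chains = {"A": "MKTAYIAK", "B": "MSDNELRQ", "C": "EVQLVESGG"}
for job, pair in pair_with(chains, "C").items():
    print(f"# {job}")
    print(to_fasta(pair), end="")
```

Each pair is then folded with the same settings, so that the confidence scores of the A+Fv and B+Fv runs can be compared directly.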
