How to do inter/intra protein chain-breaks through the command line interface?

gundalav commented 3 years ago

Hi Yoshitaka-san,

Thank you for the great work.

Original AlphaFold2_advanced allows you to do inter protein chain-breaks in the sequence entry:

So you can do something like: AC/DE:FGH

Use / to specify intra-protein chainbreaks (for trimming regions within protein).
Use : to specify inter-protein chainbreaks (for modeling protein-protein hetero-complexes).

How can we do this with your script, given that it only takes a single FASTA file as input?

Do you create a fasta file like:

 >myinput.fasta
 AC/DE:FGH

And last question, DE:FGH simply means we are docking DE to FGH, am I right?

Thanks and hope to hear from you again.

G.V.

YoshitakaMo commented 3 years ago

As shown in the README, an example input FASTA file, 3kud_complex.fasta, looks like this:

>3KUD_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL

Before the ":" symbol is the Chain A of PDB ID:3KUD, Ras, and after it is the Chain B, Raf. This input file will give the Ras-Raf complex.
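For anyone generating such inputs from a script, here is a minimal Python sketch that joins chain sequences with ":" in the same layout as the file above. The function name `complex_fasta` and the output file name are hypothetical and are not part of localcolabfold; the two sequences are shortened prefixes of the Ras and Raf chains shown above, used only to keep the example small.

```python
def complex_fasta(name, chains):
    """Return FASTA text with chains joined by ':' (inter-protein chain break).

    Hypothetical helper for illustration; it only builds the text and does
    not call any localcolabfold code.
    """
    if not chains or any(not c for c in chains):
        raise ValueError("every chain needs a non-empty sequence")
    # Put ':' at the end of each chain's line, as in 3kud_complex.fasta.
    return f">{name}\n" + ":\n".join(chains) + "\n"


# Shortened prefixes of the Ras and Raf sequences above (not the full chains).
ras_prefix = "MTEYKLVVVGAGGVGKSALTIQ"
raf_prefix = "PSKTSNTIRVFLPNKQRTVV"
fasta_text = complex_fasta("3KUD_complex", [ras_prefix, raf_prefix])

# To write it out (file name is an assumption):
# with open("my_complex.fasta", "w") as fh:
#     fh.write(fasta_text)
```

The same helper would work for more than two chains, since every additional chain is simply separated by another ":".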

 >myinput.fasta
 AC/DE:FGH

And last question, DE:FGH simply means we are docking DE to FGH, am I right?

For example, the sequence AC/DE:FGH will be modeled as three polypeptides: AC, DE and FGH. A separate MSA will be generated for ACDE and for FGH. If pair_msa is enabled, ACDE's MSA will be paired with FGH's MSA.
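To make the rule concrete, here is a small Python sketch of how such a query string breaks down under the description above. It is only an illustration of the notation, not the parser used by AlphaFold2_advanced or localcolabfold, and `parse_query` is a hypothetical name.

```python
def parse_query(query):
    """Split a query such as 'AC/DE:FGH' into chains and trimmed segments.

    Returns one (msa_query, segments) tuple per ':'-separated chain, where
    msa_query is the chain with '/' removed (the sequence an MSA would be
    built for) and segments are the separately modeled polypeptides.
    """
    query = "".join(query.split())  # drop whitespace and line breaks
    chains = []
    for chain in query.split(":"):
        segments = chain.split("/")
        if any(not s for s in segments):
            raise ValueError(f"empty chain or segment in {query!r}")
        chains.append(("".join(segments), segments))
    return chains


print(parse_query("AC/DE:FGH"))
# [('ACDE', ['AC', 'DE']), ('FGH', ['FGH'])]
```

So ":" decides how many MSAs there are, and "/" only decides where a chain is cut into separately modeled pieces.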

>example.fasta
PIAQ/IHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGE/LASK

This gives a structure like this:

The N-terminal 4 residues and the C-terminal 4 residues are separately modeled. The N-terminal one connects to the middle (main) structure, but the C-terminal one was not connected to it. This trim feature will be effective for prediction of a large protein (or homooligomer). However, I have never utilized this feature before, so I don't know the details. If anyone knows how to use it effectively, please let me know.

gundalav commented 3 years ago

Hi Yoshitaka-san, Thank you for your reply. In your example

>3KUD_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL
Before the ":" symbol is the Chain A of PDB ID:3KUD, Ras, and after it is the Chain B, Raf. This input file will give the Ras-Raf complex.

Is it correct to say that AlphaFold2_advanced is docking Ras against Raf?

CYP152N1 commented 3 years ago

If anyone knows how to use it effectively, please let me know.

I think the trim feature is useful when scoring the pLDDT and tol of models that contain a disordered loop. Trimming low-pLDDT regions should make the pLDDT-based ranking more accurate. However, it might be better to recompute the average pLDDT in a separate program after the AF2 calculation.

If the model has a disordered loop, tol does not decrease even after several recycles.
Trimming the disordered loop might help in judging the endpoint of recycling. However, I have never used it.
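As a rough sketch of the "rescore afterwards" idea, the Python below averages pLDDT over the CA atoms of a predicted model while skipping chosen residue ranges. It assumes the model is a standard fixed-column PDB file with per-residue pLDDT stored in the B-factor column, as AlphaFold output PDBs do. The name `mean_plddt`, the file name, and the residue range in the usage comment are all hypothetical.

```python
def mean_plddt(pdb_text, skip_ranges=()):
    """Average pLDDT over CA atoms, ignoring residues in skip_ranges.

    pdb_text    : contents of a PDB file whose B-factor column holds pLDDT
                  (an assumption about the input, true for AlphaFold models).
    skip_ranges : iterable of (chain_id, first_res, last_res), inclusive,
                  e.g. a disordered loop to leave out of the average.
    """
    values = []
    for line in pdb_text.splitlines():
        # One value per residue: use only the CA atom of each ATOM record.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]            # column 22: chain identifier
        resnum = int(line[22:26])   # columns 23-26: residue number
        if any(c == chain and lo <= resnum <= hi for c, lo, hi in skip_ranges):
            continue
        values.append(float(line[60:66]))  # columns 61-66: B-factor (pLDDT)
    if not values:
        raise ValueError("no CA atoms left to average")
    return sum(values) / len(values)


# Usage (file name and residue range are assumptions):
# with open("model_1.pdb") as fh:
#     print(mean_plddt(fh.read(), skip_ranges=[("A", 5, 20)]))
```

Rescoring this way leaves the prediction itself untouched, so the full-length and loop-excluded averages can be compared on the same model.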

YoshitakaMo commented 3 years ago

@gundalav Strictly speaking, AlphaFold2(_advanced) does not explicitly dock the two (or more) given proteins. Instead, it predicts the "most plausible structure" from the whole input sequence, even when the input consists of multiple chains. Because AF2 doesn't use any energy-based or physics-based methods to build a structure except in the final relaxation, it doesn't attempt to dock one protein to another. However, in some cases it can predict a correct complex structure as a consequence, probably by detecting co-evolved amino acids in the two proteins' sequences. Looking at the final output only, the results are the same as those from docking simulations.

P. Bryant et al. said in their paper, "We find that the results in terms of successful docking using AF2 are superior to all other docking methods."

Note that this feature was unintended and unnoticed by the AlphaFold2 development team in July, and DeepMind published the "AlphaFold-Multimer" paper on Oct 4 (see https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1.full.pdf). They will release the AF2-Multimer version in the near future.

gundalav commented 3 years ago

Yoshitaka-san, Thank you for the explanation. Very convincing. G.V.

YoshitakaMo / localcolabfold
