gundalav closed this issue 3 years ago
As shown in the README, an input sequence FASTA file, for example 3kud_complex.fasta, looks like this:
>3KUD_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL
The part before the ":" symbol is chain A of PDB ID 3KUD (Ras), and the part after it is chain B (Raf). This input file will give the Ras-Raf complex.
>myinput.fasta
AC/DE:FGH
And last question, DE:FGH simply means we are docking DE to FGH, am I right?
For example, the sequence AC/DE:FGH will be modeled as three polypeptides: AC, DE and FGH. A separate MSA will be generated for ACDE and for FGH. If pair_msa is enabled, the MSA of ACDE will be paired with the MSA of FGH.
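The notation described above can be illustrated with a short Python sketch. This is not the notebook's own parsing code; the function name `parse_sequence` is hypothetical, and the sketch only assumes what is stated in this thread: ":" separates units that each get their own MSA, and "/" marks a chain break inside one unit.

```python
def parse_sequence(seq):
    """Split an 'AC/DE:FGH'-style entry (hypothetical helper, for illustration).

    ':' separates units that each get their own MSA.
    '/' marks a chain break inside one unit.
    """
    # Remove whitespace and line breaks so multi-line FASTA bodies also work.
    seq = "".join(seq.split())
    units = [u for u in seq.split(":") if u]
    # One MSA per unit, built from the unit with chain breaks removed.
    msa_sequences = [u.replace("/", "") for u in units]
    # Every '/'-separated piece is modeled as its own polypeptide.
    polypeptides = [p for u in units for p in u.split("/") if p]
    return polypeptides, msa_sequences


polypeptides, msa_sequences = parse_sequence("AC/DE:FGH")
print(polypeptides)   # ['AC', 'DE', 'FGH']
print(msa_sequences)  # ['ACDE', 'FGH']
```

With the 3KUD example, the two lines of the FASTA body (ending in ":" and starting with "PSK...") would come out as two polypeptides and two MSA sequences, one for Ras and one for Raf.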
>example.fasta
PIAQ/IHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGE/LASK
This gives a structure like this:
The N-terminal 4 residues and the C-terminal 4 residues are modeled separately. The N-terminal segment is connected to the middle (main) structure, but the C-terminal segment is not. This trim feature should be useful for predicting a large protein (or homo-oligomer). However, I have never used this feature before, so I don't know the details. If anyone knows how to use it effectively, please let me know.
Hi Yoshitaka-san, Thank you for your reply. In your example:
>3KUD_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAASLIGEELQVDFL
The part before the ":" symbol is chain A of PDB ID 3KUD (Ras), and the part after it is chain B (Raf). This input file will give the Ras-Raf complex.
Is it correct to say that AlphaFold2_advanced is docking Ras against Raf?
If anyone knows how to use it effectively, please let me know.
I think the trim feature is useful for scoring the pLDDT and tol of models with a disordered loop. Trimming the low-pLDDT region should make the pLDDT ranking more accurate. However, it might be better to rescore the average pLDDT with a separate program after the AF2 calculation.
If the model has a disordered loop, tol does not decrease even after several recycles.
Trimming the disordered loop might help in judging the endpoint of recycling.
However, I have never used it.
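The rescoring idea mentioned above (recomputing the average pLDDT outside AF2) could be sketched as follows. This is only an illustration, not part of AlphaFold2_advanced: the function names `read_plddt` and `mean_plddt` and the cutoff of 50 are assumptions. The sketch relies on AlphaFold2 writing the per-residue pLDDT into the B-factor column of its output PDB files.

```python
def read_plddt(pdb_lines):
    """Collect one pLDDT value per residue from the CA atoms of a PDB file.

    AlphaFold2 stores pLDDT in the B-factor field (columns 61-66).
    """
    values = []
    for line in pdb_lines:
        # Atom name is in columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def mean_plddt(values, cutoff=0.0):
    """Average pLDDT over residues at or above `cutoff` (hypothetical threshold)."""
    kept = [v for v in values if v >= cutoff]
    return sum(kept) / len(kept) if kept else float("nan")
```

For a model file, one would pass the open file handle to `read_plddt` and then compare `mean_plddt(values)` with, say, `mean_plddt(values, cutoff=50.0)` to see how much the low-confidence (likely disordered) residues pull the average down. The right cutoff depends on the target and is not taken from this thread.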
@gundalav Strictly speaking, AlphaFold2(_advanced) does not explicitly dock the two (or more) given proteins. Instead, it predicts the "most plausible structure" from the whole input sequence, even when the input consists of multiple chains. Because AF2 doesn't use any energy-based or physics-based methods to build a structure except in the final relaxation, it doesn't attempt to dock one protein to another. However, in some cases it can predict a correct complex structure as a consequence, probably by detecting co-evolved amino acids in the two proteins' sequences. Looking at the final output only, the results look the same as those from docking simulations.
P. Bryant et al. state in their paper: "We find that the results in terms of successful docking using AF2 are superior to all other docking methods."
Note that this feature was unintended and unnoticed by the AlphaFold2 development team in July, and DeepMind published the "AlphaFold-Multimer" paper on Oct 4 (see https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1.full.pdf ). They will release the AF2-Multimer version in the near future.
Yoshitaka-san, Thank you for the explanation. Very convincing. G.V.
Hi Yoshitaka-san,
Thank you for the great work.
The original AlphaFold2_advanced allows you to specify inter-protein chain breaks in the sequence entry:
So you can do something like:
AC/DE:FGH
How can we do this with your script, given that it only takes a single FASTA file as input?
Do you create a FASTA file like:
>myinput.fasta
AC/DE:FGH
And last question, DE:FGH simply means we are docking DE to FGH, am I right?
Thanks and hope to hear from you again.
G.V.