RosettaCommons / RoseTTAFold

This package contains deep learning models and related scripts for RoseTTAFold
MIT License

Update predict_complex README #53

Closed neilfleckSCRI closed 3 years ago

neilfleckSCRI commented 3 years ago

When I first ran the complex structure prediction (step 4), it was outputting a model where the break between the two subunits was in the totally wrong place.

After a lot of digging through the predict_complex.py file, I realized that the numbers after the "-Ls" argument needed to be changed to represent the lengths of the individual protein subunits being modeled. I finally figured this out when I saw that the lengths of the two example subunits are 218 aa and 310 aa. I imagine other people will run into this same issue until it's clarified in the README.
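To illustrate why the numbers matter, here is a minimal sketch of how a list of subunit lengths fixes where the chain break falls in the concatenated sequence. The helper name `chain_ranges` is hypothetical and is not part of predict_complex.py; it only shows the arithmetic, using the example lengths of 218 and 310 mentioned above.

```python
def chain_ranges(lengths):
    """Return 1-indexed, inclusive (start, end) residue ranges for each subunit.

    Hypothetical helper for illustration only: it assumes the subunits are
    concatenated in the same order as the lengths are given.
    """
    ranges = []
    start = 1
    for n in lengths:
        if n <= 0:
            raise ValueError("each subunit length must be a positive integer")
        ranges.append((start, start + n - 1))
        start += n
    return ranges


# Example subunits from this thread: 218 aa and 310 aa.
for i, (first, last) in enumerate(chain_ranges([218, 310]), start=1):
    print(f"subunit {i}: residues {first}-{last}")
```

With lengths left over from a different pair of proteins, the boundary between the two ranges no longer coincides with the real junction, which would be consistent with the misplaced break described above.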

momodesuka commented 3 years ago

@neilfleckSCRI Thank you!!! Now I understand the meaning of -Ls. Could you tell me how to generate the "paired.a3m" file that goes into the filtering step? I want to study the interaction of two subunits. Many thanks

neilfleckSCRI commented 3 years ago

First make the individual .a3m files by running the main pipeline on the individual proteins (as described in the main README). Technically you only need to run the run_pyrosetta_ver.sh script up to the end of the "1. generate MSAs" section (line 38). You can do this by deleting everything in run_pyrosetta_ver.sh after line 38 and running the truncated script. Or you can just run the whole pipeline, but that'll take longer and do much more than just make .a3m files.

Then, for each of the two proteins you just processed, you should have a file named something like t000_x.msa0.a3m. Rename them to something unique like protein1.a3m and protein2.a3m. Then run the "make_joint_MSA_bacterial.py" script on those two files, with a command such as python make_joint_MSA_bacterial.py protein1.a3m protein2.a3m

That particular step apparently only works for bacterial proteins. If you're doing eukaryotic proteins, I have no clue what needs to be done.

Anyways, that'll spit out a paired.a3m file, which you then pass to the predict_complex.py script, as described in step 4. Pay attention to the order in which you passed the proteins to make_joint_MSA_bacterial.py, because the lengths after the -Ls argument must be in the same order. I.e., you need to do:

python network/predict_complex.py -i PE_dimercomplex/filtered.a3m -o PE_dimercomplex/complex -Ls [length of protein1] [length of protein2]

If you swap the order of protein1 and protein2 it'll give bad results.
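One way to avoid typing the lengths by hand is to read them from the individual .a3m files. The sketch below assumes the first record in each .a3m file is the query sequence, which is the usual convention but is not confirmed in this thread. The function names `query_length` and `predict_complex_command` are made up for illustration; only the `-i`, `-o` and `-Ls` arguments come from the command above.

```python
def query_length(a3m_text):
    """Length of the first (query) sequence in A3M-formatted text.

    Assumption: the first record is the query. Gap characters, if any,
    are not counted.
    """
    seq_lines = []
    seen_header = False
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts, so the query is complete
            seen_header = True
            continue
        if seen_header:
            seq_lines.append(line)
    return sum(1 for ch in "".join(seq_lines) if ch.isalpha())


def predict_complex_command(msa_path, out_prefix, lengths):
    """Assemble the argument list, keeping the lengths in the order given."""
    return [
        "python", "network/predict_complex.py",
        "-i", msa_path,
        "-o", out_prefix,
        "-Ls", *[str(n) for n in lengths],
    ]


# Toy inputs standing in for the contents of protein1.a3m and protein2.a3m.
protein1 = ">protein1\nMKVLA\n>hit1\nMKvVLA\n"
protein2 = ">protein2\nGSHM\n"

# Same order as the files were given to make_joint_MSA_bacterial.py.
lengths = [query_length(protein1), query_length(protein2)]
cmd = predict_complex_command(
    "PE_dimercomplex/filtered.a3m", "PE_dimercomplex/complex", lengths
)
print(" ".join(cmd))
```

For real files, the text could be read with `open("protein1.a3m").read()`. The resulting list can be passed to `subprocess.run` from the repository root if you would rather not paste the command yourself.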

momodesuka commented 3 years ago

Thank you. I could not have understood the pipeline without your careful instructions. Eukaryotic proteins really are a problem, because the README file says little about them. Anyway, I will try using make_joint_MSA_bacterial.py on eukaryotic proteins.