Closed sonyahanson closed 8 years ago
The loop-modeled template created by @danielparton in the full TK ensembler run for the 3W8Q PDB (top) looks almost identical (AKA not good). (That model can be found at /cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/MP2K1_HUMAN_D0_3W8Q_A-pdbfixed.pdb on the cluster.)
A similar problem can be seen in the 2OIQ (Src, imatinib bound structure) template after loop modeling, which is probably why any models onto this template have been failing in the refinement steps:
Yikes!
ensembler loopmodel is just the ROSETTA loop modeling wrapper, I believe. I wonder if we can do a bit more debugging on what is going on there.
This might also be a great example to pass on to Mike Schnieders as a case where existing loop modeling schemes do not do well.
I'm hoping there's an easier way to get usable templates than the Mike Schnieders way, and 2OIQ was already suggested to them as a good case.
I'm surprised Rosetta is giving these results. This is worse than Modeller was doing.
Here is where the loop modeling input commands are kept: https://github.com/choderalab/ensembler/blob/master/ensembler/modeling.py#L288-L301
I imagine we could try the following:
Do you think that will work?
Here are the loop modeling options for ROSETTA 3.1. Not sure if we're using a more recent version.
The ROSETTA output file might be a good place to look for clues. It could be that we need to increase -loops:grow_attempts, for example.
Yeah. Would be good to just run a few through and see if they improve the situation.
I think I might also check with Abba Leffler (from the Bonneau lab), who had some recommendations on the best ways to use Rosetta for loop modeling last time I talked to him (/cbio/jclab/share/rosetta_2014.35.57232_bundle/main/source/bin). We seem to be using the same Rosetta version as he does, but maybe not the same actual loop model command...
I think I might also check with the Abba Leffler (from the Bonneau lab)
That's also a great idea! He might also be able to better interpret the ROSETTA output files to diagnose our issue.
I just briefly looked through one of the ROSETTA log files you mention (for 3W8Q) where residues 211-238 are modeled in (28 residues):
/cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/MP2K1_HUMAN_D0_3W8Q_A-loopmodel-log.yaml
and noticed a few sketchy warnings. I am not sure if these are indeed problematic. We'll want an expert to look at these:
core.pose.util: Cannot open psipred_ss2 file
protocols.loops.loops_main: can not open DSSP file
[ WARNING ] missing an atom: 223 OVU1 that depends on a nonexistent polymer connection!
I don't see any other failures, so it may simply be that the loop refinement isn't aggressive enough.
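To make it easier to hand the relevant lines to an expert, a small filter along these lines could pull the suspicious messages out of a log. This is only a sketch: the function name `suspicious_lines` is hypothetical, the patterns are taken from the three messages quoted above, and it treats the log as plain text rather than parsing the YAML.

```python
import re

# Patterns based on the three messages quoted above. Whether any of them
# is actually a problem is still an open question in this thread.
SUSPICIOUS = re.compile(r"\[ WARNING \]|can ?not open", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines matching SUSPICIOUS."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if SUSPICIOUS.search(line)
    ]
```

Reading the log file into a string and passing it to `suspicious_lines` gives a short list that can be pasted into an email or issue comment.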
The current ROSETTA invocation we are using is:
/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/source/bin/loopmodel.default.linuxgccrelease
-database /cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database
-in::file::s /scratch/2758588.mskcc-fe1.local/tmpi5ps9C/template.pdb -loops:loop_file
/scratch/2758588.mskcc-fe1.local/tmpi5ps9C/template.loop -out:path:all /scratch/2758588.mskcc-fe1.local/tmpi5ps9C
-loops:remodel perturb_kic -loops:refine refine_kic -ex1 -ex2 -nstruct 1 -loops:max_kic_build_attempts
100 -in:file:fullatom -overwrite
I think we may be missing some arguments that ensure that only the loops are modified here. I believe that normally other sidechains can be repacked.
For example there is this option we may want to turn on:
-loops:optimize_only_kic_region_sidechains_after_move
Should rotamer trials and minimization be performed after every KIC move but only within the loops:neighbor_dist of the residues in the moved KIC segment. Speeds up execution when using very large loop definitions (such as when whole chains are used for ensemble generation). default = 'false'. [Boolean]
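For trying out extra options like this one, it may help to assemble the invocation as an argument list so that flags can be appended without editing a long shell string. The sketch below is not ensembler's actual code: `build_loopmodel_command` and its parameters are hypothetical names, and the flags are copied from the invocation quoted above. Passing the boolean option bare (without `true`) is an assumption about how Rosetta parses it, and whether the option helps here is untested.

```python
def build_loopmodel_command(executable, database, template_pdb, loop_file,
                            out_dir, extra_flags=()):
    """Assemble the argument list for a single loopmodel run.

    The fixed flags mirror the invocation quoted in this thread;
    extra_flags is appended unchanged so options can be tried one at a time.
    """
    command = [
        executable,
        "-database", database,
        "-in::file::s", template_pdb,
        "-loops:loop_file", loop_file,
        "-out:path:all", out_dir,
        "-loops:remodel", "perturb_kic",
        "-loops:refine", "refine_kic",
        "-ex1", "-ex2",
        "-nstruct", "1",
        "-loops:max_kic_build_attempts", "100",
        "-in:file:fullatom",
        "-overwrite",
    ]
    command.extend(extra_flags)
    return command


# Example: the current invocation plus the option discussed above.
example = build_loopmodel_command(
    "loopmodel.default.linuxgccrelease",
    "/path/to/rosetta/database",  # placeholder path, not a real location
    "template.pdb",
    "template.loop",
    ".",
    extra_flags=["-loops:optimize_only_kic_region_sidechains_after_move"],
)
```

The resulting list can be passed directly to `subprocess.check_output`, which avoids shell quoting issues with the scratch-directory paths.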
Thanks!
Currently trying to debug this using:
/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/source/bin/loopmodel.default.linuxgccrelease -database /cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database -in::file::s MP2K1_HUMAN_D0_3W8Q_A.pdb -loops:loop_file MP2K1_HUMAN_D0_3W8Q_A.loop -loops:remodel perturb_kic -loops:refine refine_kic -ex1 -ex2 -nstruct 1 -loops:max_kic_build_attempts 100 -in:file:fullatom -overwrite
With the .pdb and .loop files copied from /cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/.
The command line used above misses something. It didn't actually model the loop, it just modeled the sequence that was already in the pdb, though the end structure looks much better than the models shown above... ensembler loopmodel does actually model the loop with the correct sequence, it just does it terribly. Need to figure out what's missing...
Also, for some reason, ensembler loopmodel isn't working for me at all in an interactive session on the cluster. This is the error I get:
Modeling missing loops for template MP2K1_HUMAN_3W8Q_A
/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/core.py:587: UserWarning: loopmodel debug version (loopmodel.default.linuxgccdebug) will be ignored, as it runs extremely slowly
  'loopmodel debug version ({0}) will be ignored, as it runs extremely slowly'.format(filename)
/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/core.py:587: UserWarning: loopmodel debug version (loopmodel.linuxgccdebug) will be ignored, as it runs extremely slowly
  'loopmodel debug version ({0}) will be ignored, as it runs extremely slowly'.format(filename)
Traceback (most recent call last):
  File "/cbio/jclab/home/hansons/opt/anaconda/bin/ensembler", line 6, in <module>
    sys.exit(main())
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/cli.py", line 40, in main
    command.dispatch(args)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/cli_commands/loopmodel.py", line 52, in dispatch
    loglevel=loglevel
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/utils.py", line 37, in print_done
    fn(*args, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 74, in model_template_loops
    loopmodel_templates(templates_resolved_seq, missing_residues_list, process_only_these_templates=process_only_these_templates, overwrite_structures=overwrite_structures)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 221, in loopmodel_templates
    loopmodel_template(template, missing_residues[template_index], overwrite_structures=overwrite_structures)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 239, in loopmodel_template
    loopmodel_output = run_loopmodel(template_filepath, loop_filepath, output_pdb_filepath, output_score_filepath)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 324, in run_loopmodel
    return LoopmodelOutput(output_text=output_text, exception=e, trbk=traceback.format_exc(), successful=False)
UnboundLocalError: local variable 'output_text' referenced before assignment
It works fine (though producing a 3W8Q model similar to the first picture above) when I'm using mpirun and the batch queue system, using the same resources (#PBS -l procs=1,mem=4gb,vmem=4gb,pmem=4gb). The run script I'm using is similar to this, though modified for only doing one model as above.
Looks like that debug output comes from here, where it searches your $PATH and, if it encounters a filename that matches loopmodel.*debug, emits that warning.
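As a rough illustration of that search, the sketch below walks the directories in $PATH, sets aside anything matching loopmodel.*debug, and returns the first remaining loopmodel executable. It is a re-implementation written for this discussion, not the code in ensembler's core.py; the names `pick_loopmodel` and `find_loopmodel` are hypothetical, and it is written for Python 3.

```python
import os
import re

# Same pattern as described above: any loopmodel build whose name ends in "debug".
DEBUG_BUILD = re.compile(r"loopmodel.*debug")


def pick_loopmodel(filenames):
    """Return (chosen, ignored): the first non-debug loopmodel file name
    (or None) and the list of debug builds that were skipped."""
    chosen = None
    ignored = []
    for filename in sorted(filenames):
        if not filename.startswith("loopmodel"):
            continue
        if DEBUG_BUILD.match(filename):
            ignored.append(filename)
        elif chosen is None:
            chosen = filename
    return chosen, ignored


def find_loopmodel(path_value=None):
    """Search each directory of a PATH-style string; return a full path or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        try:
            filenames = os.listdir(directory)
        except OSError:
            # Missing or unreadable PATH entries are skipped.
            continue
        chosen, _ignored = pick_loopmodel(filenames)
        if chosen is not None:
            return os.path.join(directory, chosen)
    return None
```

Running `find_loopmodel()` in both an interactive session and a batch job is a quick way to see whether the two environments resolve to the same executable.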
The code then fails here: it tried to run loopmodel, failed to do so, and yet still tries to use the output it never got when constructing LoopmodelOutput. This is a silly error handler, since it should at least report the reason for the error before trying to continue, and guard against the case where the run produced no output. We should fix that.
In any case, it seems to be a $PATH problem. Your $PATH must be different for interactive vs batch jobs; in interactive, it must have found a loopmodel.*debug file somewhere and given up.
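One way the error handler could be made safe is to give output_text a value before the try block, so the except branch never touches an unassigned name. The sketch below is an assumption about the shape of a fix, not the actual ensembler patch: `run_command_safely` is a hypothetical name, and `LoopmodelOutput` is a stand-in namedtuple whose fields follow the call shown in the traceback. It is written for Python 3.

```python
import subprocess
import traceback
from collections import namedtuple

# Stand-in for ensembler's LoopmodelOutput; field names follow the traceback above.
LoopmodelOutput = namedtuple(
    "LoopmodelOutput", ["output_text", "exception", "trbk", "successful"]
)


def run_command_safely(command):
    """Run a command and always return a LoopmodelOutput, even on failure."""
    output_text = ""  # assigned up front so every except branch can use it
    try:
        output_text = subprocess.check_output(
            command, stderr=subprocess.STDOUT, universal_newlines=True
        )
        return LoopmodelOutput(output_text=output_text, exception=None,
                               trbk=None, successful=True)
    except subprocess.CalledProcessError as e:
        # The program ran but exited non-zero: keep whatever it printed.
        return LoopmodelOutput(output_text=e.output or "", exception=e,
                               trbk=traceback.format_exc(), successful=False)
    except OSError as e:
        # The program could not be started at all (e.g. not on $PATH).
        return LoopmodelOutput(output_text=output_text, exception=e,
                               trbk=traceback.format_exc(), successful=False)
```

With this shape, a missing executable surfaces as an OSError in the returned record instead of being masked by an UnboundLocalError.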
That makes sense.
Sooooo....
Pretty convinced the ...-pdbfixed.pdb files are actually just the pdbfixed files and not the Rosetta loop-modeled files, and I've been looking at irrelevant files the whole time.
This totally makes sense, and now I feel a little silly: the MP2K1_HUMAN_D0_3W8Q_A.pdb files in the templates/structures-modeled-loops directory are the loop-modeled files:
...
Hurray! No changes to be made. Probably an indication that we should not use pdbfixer for loop modeling for yank, though.
Apologies for the false alarm.
Kept this open, since I was checking out the other problem that's mentioned:
It turns out that even on batch jobs I was getting the local variable 'output_text' referenced before assignment error, which is why, apparently, even since the dansu-dansu models I have not been successfully using ensembler loopmodel. I didn't realize this because I was just checking whether new *.pdb files were being created, which they were, but these were the intermediate ...-pdbfixed.pdb files.
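Since the mere presence of new *.pdb files turned out to be misleading, a check that separates the intermediate files from the loop-modeled ones by name might avoid the same confusion. This is a sketch under the naming convention described in this thread (intermediate files end in -pdbfixed.pdb, loop-modeled files do not); `classify_pdb_names` is a hypothetical helper, not part of ensembler.

```python
def classify_pdb_names(filenames):
    """Split PDB file names into (intermediate, loop_modeled) lists.

    Assumes the convention discussed above: names ending in
    '-pdbfixed.pdb' are pdbfixer intermediates, other '.pdb' files
    are the Rosetta loop-modeled templates.
    """
    intermediate = []
    loop_modeled = []
    for filename in sorted(filenames):
        if not filename.endswith(".pdb"):
            continue
        if filename.endswith("-pdbfixed.pdb"):
            intermediate.append(filename)
        else:
            loop_modeled.append(filename)
    return intermediate, loop_modeled
```

Passing `os.listdir(...)` of the structures-modeled-loops directory to this function shows at a glance whether any template has an intermediate file but no loop-modeled counterpart.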
This seems to be working now, after making sure to set the MINIROSETTA_DATABASE environment variable.
This means that, in total, running ensembler loopmodel requires two things in your environment (the first I had already added):
export PATH=/cbio/jclab/share/rosetta_2014.35.57232_bundle/main/source/bin:$PATH
export MINIROSETTA_DATABASE=/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database
Should we make this requirement more clear in the docs?
Yes please!
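Until the docs are updated, a pre-flight check along these lines could report both requirements up front. It is only a sketch: `check_rosetta_environment` is a hypothetical helper, the default executable name is taken from the invocation quoted earlier in this thread, and it is written for Python 3.

```python
import os
import shutil


def check_rosetta_environment(environ=None,
                              executable="loopmodel.default.linuxgccrelease"):
    """Return a list of problems; an empty list means both settings look usable."""
    if environ is None:
        environ = os.environ
    problems = []

    # Requirement 1: the Rosetta bin directory must be on PATH.
    if shutil.which(executable, path=environ.get("PATH", "")) is None:
        problems.append("{0} not found on PATH".format(executable))

    # Requirement 2: MINIROSETTA_DATABASE must point at the database directory.
    database = environ.get("MINIROSETTA_DATABASE")
    if not database:
        problems.append("MINIROSETTA_DATABASE is not set")
    elif not os.path.isdir(database):
        problems.append(
            "MINIROSETTA_DATABASE is not a directory: {0}".format(database)
        )
    return problems
```

Calling `check_rosetta_environment()` at the start of a batch script and printing the result makes a misconfigured environment visible before any templates are processed.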
Seeing some strange output structures-modeled-loops/...pdbfixed.pdb files resulting from ensembler loopmodel; this pdb can be found here:
Not sure if this might be a pymol representation thing that would be resolved after minimization, but I have a feeling there is a real problem here.
This also happened, but turns out these were not the templates we wanted anyway (these pdbs can be found here and here: