choderalab / ensembler

Automated omics-scale protein modeling and simulation setup.
http://ensembler.readthedocs.io/
GNU General Public License v2.0

Strangely modeled loops in templates after loop modeling. #61

Closed sonyahanson closed 8 years ago

sonyahanson commented 8 years ago

I'm seeing some strange output in the structures-modeled-loops/...pdbfixed.pdb files produced by ensembler loopmodel; this PDB can be found here:

[Images: strange-loop-sticks, strange-loop-cartoon]

I'm not sure whether this is a PyMOL representation issue that would be resolved after minimization, but I have a feeling there is a real problem here.

This also happened, but it turns out these were not the templates we wanted anyway (these PDBs can be found here and here):

[Image: howthefuck]

sonyahanson commented 8 years ago

The loop-modeled template created by @danielparton in the full TK ensembler run for the 3W8Q PDB (top) looks almost identical (i.e., not good). (That model can be found at /cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/MP2K1_HUMAN_D0_3W8Q_A-pdbfixed.pdb on the cluster.)

A similar problem can be seen in the 2OIQ (Src, imatinib-bound structure) template after loop modeling, which is probably why any models built onto this template have been failing in the refinement steps:

[Image: 2oiq_loop]

jchodera commented 8 years ago

Yikes!

ensembler loopmodel is just the ROSETTA loop modeling wrapper, I believe. I wonder if we can do a bit more debugging on what is going on there.

This might also be a great example to pass on to Mike Schnieders as a case where existing loop modeling schemes do not do well.

sonyahanson commented 8 years ago

I'm hoping there's an easier way to get usable templates than the Mike Schnieders way, and 2OIQ was already suggested to them as a good case.

I'm surprised Rosetta is giving these results. This is worse than what Modeller was producing.

jchodera commented 8 years ago

Here is where the loop modeling input commands are kept: https://github.com/choderalab/ensembler/blob/master/ensembler/modeling.py#L288-L301

I imagine we could try the following:

Do you think that will work?

jchodera commented 8 years ago

Here are the loop modeling options for ROSETTA 3.1. Not sure if we're using a more recent version.

jchodera commented 8 years ago

The ROSETTA output file might be a good place to look for clues. It could be that we need to increment -loops:grow_attempts, for example.

sonyahanson commented 8 years ago

Yeah. It would be good to just run a few through and see if they improve the situation.

I think I might also check with Abba Leffler (from the Bonneau lab), who had some recommendations on the best ways to use Rosetta for loop modeling the last time I talked to him (/cbio/jclab/share/rosetta_2014.35.57232_bundle/main/source/bin). We seem to be using the same Rosetta version as he does, but maybe not the same loopmodel command...

jchodera commented 8 years ago

"I think I might also check with the Abba Leffler (from the Bonneau lab)"

That's also a great idea! He might also be able to better interpret the ROSETTA output files to diagnose our issue.

jchodera commented 8 years ago

I briefly looked through one of the ROSETTA log files you mentioned (for 3W8Q), in which residues 211-238 (28 residues) are modeled in:

/cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/MP2K1_HUMAN_D0_3W8Q_A-loopmodel-log.yaml

and noticed a few sketchy warnings. I am not sure if these are indeed problematic. We'll want an expert to look at these:

core.pose.util: Cannot open psipred_ss2 file
protocols.loops.loops_main: can not open DSSP file
[ WARNING ] missing an atom: 223 OVU1 that depends on a nonexistent polymer connection!

I don't see any other failures, so it may simply be that the loop refinement isn't aggressive enough.
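For triage before handing a log to an expert, a minimal sketch that pulls out lines resembling the three warnings above. The pattern list and the function name find_suspicious_lines are hypothetical, and since this log is stored as YAML, the plain log text may need to be extracted from it first.

```python
import re

# Hypothetical patterns covering the three warnings quoted above; extend as needed.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\[ WARNING \]"),
    re.compile(r"cannot open|can not open", re.IGNORECASE),
]


def find_suspicious_lines(log_text):
    """Return (line_number, line) pairs for log lines matching any suspicious pattern."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits
```

Passing it the three warning lines quoted above returns all three, numbered 1 to 3; ordinary lines are skipped.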

jchodera commented 8 years ago

The current ROSETTA invocation we are using is:

/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/source/bin/loopmodel.default.linuxgccrelease \
  -database /cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database \
  -in::file::s /scratch/2758588.mskcc-fe1.local/tmpi5ps9C/template.pdb \
  -loops:loop_file /scratch/2758588.mskcc-fe1.local/tmpi5ps9C/template.loop \
  -out:path:all /scratch/2758588.mskcc-fe1.local/tmpi5ps9C \
  -loops:remodel perturb_kic -loops:refine refine_kic \
  -ex1 -ex2 -nstruct 1 \
  -loops:max_kic_build_attempts 100 \
  -in:file:fullatom -overwrite

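
The same invocation can be assembled in Python as an argument list, which avoids shell-quoting problems and makes it easy to try one extra flag at a time. This is a sketch, not ensembler's actual code in modeling.py; the function name and its parameters are hypothetical, and the flags are copied verbatim from the command above.

```python
def build_loopmodel_args(executable, database, template_pdb, loop_file, out_dir,
                         nstruct=1, max_kic_build_attempts=100, extra_flags=()):
    """Assemble the loopmodel command line shown above as a list for subprocess."""
    args = [
        executable,
        "-database", database,
        "-in::file::s", template_pdb,
        "-loops:loop_file", loop_file,
        "-out:path:all", out_dir,
        "-loops:remodel", "perturb_kic",
        "-loops:refine", "refine_kic",
        "-ex1", "-ex2",
        "-nstruct", str(nstruct),
        "-loops:max_kic_build_attempts", str(max_kic_build_attempts),
        "-in:file:fullatom",
        "-overwrite",
    ]
    # Flags under test are appended untouched, so each can be toggled independently.
    args.extend(extra_flags)
    return args
```

The resulting list can be passed directly to subprocess.run(args, capture_output=True, text=True), so the ROSETTA output is kept for inspection.
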
jchodera commented 8 years ago

I think we may be missing some arguments that ensure only the loops are modified here. I believe that, by default, sidechains outside the loops can also be repacked.

For example there is this option we may want to turn on:

-loops:optimize_only_kic_region_sidechains_after_move
    Should rotamer trials and minimization be performed after every KIC move but only within the
    loops:neighbor_dist of the residues in the moved KIC segment. Speeds up execution when using very
    large loop definitions (such as when whole chains are used for ensemble generation).
    default = 'false'. [Boolean]

(Option description from the manual for Rosetta 3.4.)
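To make the "within loops:neighbor_dist" criterion concrete: a residue counts as a neighbor if it lies within some cutoff distance of any residue in the moved segment. The sketch below uses one representative coordinate per residue (e.g. CA) and a plain Euclidean cutoff. Rosetta's own neighbor definition may differ (for example in which atoms it measures), so treat this as an illustration rather than Rosetta's implementation; the function name is hypothetical.

```python
import math


def residues_near_segment(coords, segment, cutoff):
    """Return sorted residue numbers within `cutoff` of any residue in `segment`,
    including the segment residues themselves.

    coords: dict mapping residue number -> (x, y, z) representative coordinate
    segment: iterable of residue numbers in the moved loop segment
    cutoff: distance threshold, in the same units as coords
    """
    segment = set(segment)
    seg_coords = [coords[r] for r in segment]
    near = set(segment)
    for resnum, xyz in coords.items():
        if resnum in near:
            continue
        if any(math.dist(xyz, s) <= cutoff for s in seg_coords):
            near.add(resnum)
    return sorted(near)
```

For a toy chain with residues spaced 3.8 Å apart along x, a segment of residues 5-6 and a 4.0 Å cutoff selects residues 4-7; only sidechains in that set would be optimized after each move.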

sonyahanson commented 8 years ago

Thanks!

sonyahanson commented 8 years ago

Currently trying to debug this using:

/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/source/bin/loopmodel.default.linuxgccrelease \
  -database /cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database \
  -in::file::s MP2K1_HUMAN_D0_3W8Q_A.pdb \
  -loops:loop_file MP2K1_HUMAN_D0_3W8Q_A.loop \
  -loops:remodel perturb_kic -loops:refine refine_kic \
  -ex1 -ex2 -nstruct 1 \
  -loops:max_kic_build_attempts 100 \
  -in:file:fullatom -overwrite

The .pdb and .loop files were copied from /cbio/jclab/projects/parton/kinome-ensembler/templates/structures-modeled-loops/.
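For reference, a Rosetta loop file is plain text with one "LOOP start end cutpoint skip_rate extend" line per loop. The sketch below formats such a line for the 3W8Q segment discussed earlier (residues 211-238). The cutpoint of 0 (let Rosetta choose), skip rate of 0 and extend flag of 1 are illustrative choices, not necessarily what ensembler writes, and whether the residue numbers must be pose or PDB numbering should be checked against the Rosetta version in use. The function names are hypothetical.

```python
def loop_file_line(start, end, cutpoint=0, skip_rate=0.0, extend=True):
    """Format one line of a Rosetta loop file: LOOP start end cut skip extend."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    if cutpoint and not start <= cutpoint <= end:
        raise ValueError("cutpoint must be 0 or lie within the loop")
    return "LOOP {} {} {} {:g} {}".format(start, end, cutpoint, skip_rate, int(extend))


def write_loop_file(path, loops):
    """Write (start, end) pairs to a loop file, one LOOP line each."""
    with open(path, "w") as handle:
        for start, end in loops:
            handle.write(loop_file_line(start, end) + "\n")


if __name__ == "__main__":
    print(loop_file_line(211, 238))  # LOOP 211 238 0 0 1
    print(238 - 211 + 1)             # 28 residues
```

Comparing a generated line against the copied MP2K1_HUMAN_D0_3W8Q_A.loop is one quick way to check which residues the debug run was actually asked to rebuild.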

sonyahanson commented 8 years ago

The command line above is missing something. It didn't actually model the loop; it just modeled the sequence that was already in the PDB, though the end structure looks much better than the models shown above... ensembler loopmodel does actually model the loop with the correct sequence; it just does it terribly. Need to figure out what's missing...

sonyahanson commented 8 years ago

Also, for some reason, ensembler loopmodel isn't working for me at all in an interactive session on the cluster. This is the error I get:

Modeling missing loops for template MP2K1_HUMAN_3W8Q_A
/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/core.py:587: UserWarning: loopmodel debug version (loopmodel.default.linuxgccdebug) will be ignored, as it runs extremely slowly
  'loopmodel debug version ({0}) will be ignored, as it runs extremely slowly'.format(filename)
/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/core.py:587: UserWarning: loopmodel debug version (loopmodel.linuxgccdebug) will be ignored, as it runs extremely slowly
  'loopmodel debug version ({0}) will be ignored, as it runs extremely slowly'.format(filename)
Traceback (most recent call last):
  File "/cbio/jclab/home/hansons/opt/anaconda/bin/ensembler", line 6, in <module>
    sys.exit(main())
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/cli.py", line 40, in main
    command.dispatch(args)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/cli_commands/loopmodel.py", line 52, in dispatch
    loglevel=loglevel
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/utils.py", line 37, in print_done
    fn(*args, **kwargs)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 74, in model_template_loops
    loopmodel_templates(templates_resolved_seq, missing_residues_list, process_only_these_templates=process_only_these_templates, overwrite_structures=overwrite_structures)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 221, in loopmodel_templates
    loopmodel_template(template, missing_residues[template_index], overwrite_structures=overwrite_structures)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 239, in loopmodel_template
    loopmodel_output = run_loopmodel(template_filepath, loop_filepath, output_pdb_filepath, output_score_filepath)
  File "/cbio/jclab/home/hansons/opt/anaconda/lib/python2.7/site-packages/ensembler/modeling.py", line 324, in run_loopmodel
    return LoopmodelOutput(output_text=output_text, exception=e, trbk=traceback.format_exc(), successful=False)
UnboundLocalError: local variable 'output_text' referenced before assignment

It works fine (though it produces a 3W8Q model similar to the first picture above) when I use mpirun and the batch queue system with the same resources (#PBS -l procs=1,mem=4gb,vmem=4gb,pmem=4gb). The run script I'm using is similar to this one, though modified to build only one model, as above.

jchodera commented 8 years ago

It looks like that debug warning comes from here, where it searches your $PATH and emits that warning whenever it encounters a filename that matches loopmodel.*debug.
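A minimal sketch of this kind of $PATH scan, not ensembler's actual core.py code: walk each directory on the search path, warn about and skip names matching loopmodel.*debug, and return the first remaining executable loopmodel build. The function name and the exact glob patterns are assumptions; the warning text is copied from the output above.

```python
import fnmatch
import os
import warnings


def find_loopmodel(search_path=None):
    """Return the first non-debug loopmodel executable on the search path, or None.

    search_path defaults to $PATH; entries are separated by os.pathsep.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        for filename in sorted(os.listdir(directory)):
            if not fnmatch.fnmatch(filename, "loopmodel*"):
                continue
            if fnmatch.fnmatch(filename, "loopmodel.*debug"):
                warnings.warn("loopmodel debug version ({0}) will be ignored, "
                              "as it runs extremely slowly".format(filename))
                continue
            filepath = os.path.join(directory, filename)
            if os.access(filepath, os.X_OK):
                return filepath
    return None
```

Calling find_loopmodel() in both an interactive shell and a batch job, and comparing the results, is a quick way to see which build each environment picks up.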

The code then fails here: it tried to run loopmodel and failed, yet the error handler still tries to pass the output it never got to LoopmodelOutput. This is a silly error handler; it should at least report the reason for the error before trying to continue, and it should itself be protected against failing. We should fix that.
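One way such a handler could be hardened, sketched under assumptions: define output_text before the try block so the except branch can always build a result, and keep the exception and traceback so the cause is visible. LoopmodelOutput here is a stand-in namedtuple using the field names from the traceback (output_text, exception, trbk, successful); ensembler's real class and run_loopmodel may differ, and run_command_safely is a hypothetical name.

```python
import subprocess
import traceback
from collections import namedtuple

# Stand-in for ensembler's LoopmodelOutput, using the field names seen in the traceback.
LoopmodelOutput = namedtuple("LoopmodelOutput", "output_text exception trbk successful")


def run_command_safely(args):
    """Run a command and always return a LoopmodelOutput instead of raising."""
    output_text = None  # defined up front so the except branch can always use it
    try:
        completed = subprocess.run(args, stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT,
                                   universal_newlines=True, check=True)
        output_text = completed.stdout
        return LoopmodelOutput(output_text=output_text, exception=None,
                               trbk=None, successful=True)
    except (OSError, subprocess.CalledProcessError) as e:
        if isinstance(e, subprocess.CalledProcessError):
            output_text = e.output  # whatever the command printed before failing
        return LoopmodelOutput(output_text=output_text, exception=e,
                               trbk=traceback.format_exc(), successful=False)
```

A missing executable now yields successful=False with the FileNotFoundError attached, instead of an UnboundLocalError that hides the real problem.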

In any case, this looks like a $PATH problem. Your $PATH must differ between interactive and batch jobs; in the interactive session, it must have found a loopmodel.*debug file somewhere and given up.

sonyahanson commented 8 years ago

That makes sense.

sonyahanson commented 8 years ago

Sooooo....

I'm pretty convinced the ...-pdbfixed.pdb files are just the PDBFixer output, not the Rosetta loop-modeled files, and that I've been looking at irrelevant files the whole time.

This totally makes sense, and now I feel a little silly: the MP2K1_HUMAN_D0_3W8Q_A.pdb files in the templates/structures-modeled-loops directory are the loop-modeled files:

[Image: 38wq-both]

...

Hurray! No changes need to be made. This is probably an indication that we should not use PDBFixer for loop modeling for YANK, though.

Apologies for the false alarm.

sonyahanson commented 8 years ago

I kept this open since I was looking into the other problem mentioned above:

It turns out that even in batch jobs I was getting the "local variable 'output_text' referenced before assignment" error. Apparently this means that, ever since the dansu-dansu models, I have not been successfully running ensembler loopmodel. I didn't realize it because I was only checking whether new *.pdb files were being created, which they were, but these were the intermediate ...-pdbfixed.pdb files.
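Since the intermediates also end in .pdb, a more reliable check looks for the final loop-modeled file per template. A minimal sketch, assuming the naming seen in this thread (<ID>-pdbfixed.pdb intermediate, <ID>.pdb final output in the same directory); templates that never get an intermediate are not covered, and the function name is hypothetical.

```python
import os

INTERMEDIATE_SUFFIX = "-pdbfixed.pdb"  # naming taken from the paths in this thread


def templates_missing_loop_models(directory):
    """Return template IDs that have a -pdbfixed.pdb intermediate but no final <ID>.pdb."""
    filenames = set(os.listdir(directory))
    missing = []
    for filename in sorted(filenames):
        if filename.endswith(INTERMEDIATE_SUFFIX):
            template_id = filename[: -len(INTERMEDIATE_SUFFIX)]
            if template_id + ".pdb" not in filenames:
                missing.append(template_id)
    return missing
```

Run against templates/structures-modeled-loops after a batch job, a non-empty result flags templates where loopmodel silently failed.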

This seems to be working now, after making sure to set MINIROSETTA_DATABASE in my environment.

So, in total, running ensembler loopmodel requires two additions to your environment (I had already added the first):

export PATH=/cbio/jclab/share/rosetta_2014.35.57232_bundle/main/source/bin:$PATH
export MINIROSETTA_DATABASE=/cbio/jclab/home/parton/opt/rosetta_2014.35.57232_bundle/main/database
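
A small pre-flight check along these lines could catch both missing settings before a long batch run. This is a hypothetical helper, not part of ensembler; the default executable name is taken from the invocation earlier in this thread, and other Rosetta builds use different suffixes.

```python
import os
import shutil


def check_loopmodel_environment(env=None, executable="loopmodel.default.linuxgccrelease"):
    """Return a list of problems with the environment that ensembler loopmodel needs."""
    if env is None:
        env = os.environ
    problems = []
    # The release build must be findable on PATH.
    if shutil.which(executable, path=env.get("PATH", "")) is None:
        problems.append(executable + " not found on PATH")
    # MINIROSETTA_DATABASE must point at the Rosetta database directory.
    database = env.get("MINIROSETTA_DATABASE")
    if not database:
        problems.append("MINIROSETTA_DATABASE is not set")
    elif not os.path.isdir(database):
        problems.append("MINIROSETTA_DATABASE does not point to a directory: " + database)
    return problems
```

An empty list means both settings are in place; passing an explicit env dict makes it easy to compare interactive and batch environments.
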

Should we make this requirement clearer in the docs?

jchodera commented 8 years ago

Yes please!