choderalab / ensembler

Automated omics-scale protein modeling and simulation setup.
http://ensembler.readthedocs.io/
GNU General Public License v2.0

Ensembler usage for missing parts in crystal structure #53

Open mihirdate opened 9 years ago

mihirdate commented 9 years ago

Hi Danny and Ensembler team, if I want to use Ensembler to model only the missing parts of the protein crystal structure I have, what are my options?

    ensembler quickmodel --target_uniprot_entry_name my_protein --uniprot_domain_regex '^my protein' --template_pdbids ./my_pdb.pdb --no-loopmodel

danielparton commented 9 years ago

Hi Mihir,

Do you mean you only want to run the modeling stage of Ensembler, not the MD refinement steps? If so, you should use the main pipeline functions up to and including the build_models stage, as detailed in the Usage Examples page in the documentation: http://ensembler.readthedocs.org/en/latest/examples.html#example-using-the-main-pipeline-functions

Or do you mean that you have a single template structure, and you want to conserve the resolved structure while modeling in the unresolved parts? If so, Ensembler should do that automatically. You would just need to set up the target sequence (in targets/targets.fa) so that it contains the entire protein sequence. If the desired target sequence is a UniProt protein or domain, then you can use the ensembler gather_targets command to do this, as detailed in the above documentation page. Otherwise you should set up targets/targets.fa manually with the desired sequence.

Best,

Danny

mihirdate commented 9 years ago

Hi Danny, I am interested in the second case. I have an in-house PDB file, and I want to conserve the resolved structure and model only the unresolved parts. I also have the full target sequence. For the modeling, I would use the standard procedure Ensembler employs, including MD refinement.

So I have the target sequence in targets/targets.fa. I also have the template structure in templates/structures_resolved/my_pdb.pdb

In this case, what would my command be: ensembler build_models or quickmodel? And for whichever one applies, what values should the different options take?

Best, Mihir

danielparton commented 9 years ago

So instead of using quickmodel you will want to use the main pipeline functions as described here: http://ensembler.readthedocs.org/en/latest/examples.html#example-using-the-main-pipeline-functions

These are a set of functions which are to be performed in order. But by setting up the targets and templates manually, you will be skipping the gather_targets and gather_templates stages. Note that as well as the template structure, you will also need to have the template sequence defined (resolved residues only) in templates/templates-resolved-seq.fa. The sequence ID should be the same as the file name for the structure (minus the '.pdb'). Once the targets and templates are set up, you will want to perform the following commands:

ensembler loopmodel   # optional; requires installation of Rosetta
ensembler align
ensembler build_models
ensembler cluster
ensembler refine_implicit
ensembler solvate
ensembler refine_explicit

Regarding the first command: if your templates have missing loops, and if you have Rosetta installed, then it would probably make sense to use the loopmodel command, which uses Rosetta loopmodel for ab initio modeling of missing template loops. If this command is not run, then the loops will instead be built by Modeller at the build_models stage. But Rosetta loopmodel typically does a better job of this than Modeller.

In principle, you should not need to set any flags for the above commands. But I would strongly recommend first running each command with the -h flag and reading the documentation of the available flags, in case you want to change anything from the default behavior.
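For the template sequence file described above (resolved residues only, with the sequence ID matching the structure file name minus '.pdb'), a minimal Python sketch of one way to generate it from a local PDB file follows. It is not part of Ensembler: the function names `resolved_sequence`, `fasta_record`, and `write_template_sequence` are hypothetical, and the sketch assumes a standard fixed-column PDB file in which resolved protein residues appear as ATOM records. Residues stored as HETATM (for example modified residues) are skipped, and unknown residue names become X.

```python
from pathlib import Path

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def resolved_sequence(pdb_text, chain_id=None):
    """Return the one-letter sequence of residues that have ATOM records.

    If chain_id is None, residues from all chains are concatenated
    in file order.
    """
    seen = set()
    letters = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model of a multi-model file is used
        if not line.startswith("ATOM"):
            continue  # skips HETATM, REMARK, TER, and so on
        chain = line[21:22]
        if chain_id is not None and chain != chain_id:
            continue
        # A residue is identified by chain, residue number, insertion code.
        key = (chain, line[22:26], line[26:27])
        if key in seen:
            continue
        seen.add(key)
        letters.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return "".join(letters)


def fasta_record(seq_id, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` columns."""
    lines = [">" + seq_id]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def write_template_sequence(pdb_path, fasta_path, chain_id=None):
    """Write a single-record FASTA file whose ID is the PDB file stem."""
    pdb_path = Path(pdb_path)
    sequence = resolved_sequence(pdb_path.read_text(), chain_id)
    record = fasta_record(pdb_path.stem, sequence)
    # Overwrites fasta_path; this sketch assumes a single template.
    Path(fasta_path).write_text(record)
    return record
```

With the file names used in this thread, the call would be `write_template_sequence("templates/structures-resolved/my_pdb.pdb", "templates/templates-resolved-seq.fa")`. It is worth checking the result by eye against the structure before running ensembler align.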

mihirdate commented 9 years ago

Hi Danny, we don't have Rosetta, so I will rely on Modeller for the loops as well. I have targets/my_target.fa and template/structures-resolved/my_protein.pdb in place. When I ran ensembler loopmodel, it asked for the structures/sifts directory to be present (presumably containing the pdb and sifts files). If I want to build a model from a template of my choice, one that is saved on my local system and not available from UniProt (because it is protected IP), how do I create these required files? When I tried modeling a similar protein using quickmodel, based on an X-ray structure from UniProt or RCSB, it created all the required files automatically and built the models.
So is there a command to create all the required files from a template saved on the local system?

$ ensembler loopmodel
Traceback (most recent call last):
  File "/opt/az/local/anaconda/2.3.0/installdir/bin/ensembler", line 6, in <module>
    sys.exit(main())
  File "/opt/az/local/anaconda/2.3.0/installdir/lib/python2.7/site-packages/ensembler/cli.py", line 38, in main
    ensembler.core.check_project_toplevel_dir()
  File "/opt/az/local/anaconda/2.3.0/installdir/lib/python2.7/site-packages/ensembler/core.py", line 188, in check_project_toplevel_dir
    raise Exception('Directory {0} is not the top-level directory of an Ensembler project'.format(dirpath))
Exception: Directory structures/sifts is not the top-level directory of an Ensembler project

I also see the same message for every subsequent ensembler command.

danielparton commented 9 years ago

Hi, if you're not using Rosetta loopmodel, then you should just skip the loopmodel command. Modeller will be used when the build_models command is run. But the align command should be run first.

The message Exception: Directory structures/sifts is not the top-level directory of an Ensembler project means you are in the wrong directory when running the Ensembler command. You need to be in the top-level directory for the project you created, e.g. if you created the project at /home/username/ensembler_project then you need to be in that directory when issuing Ensembler commands. If you are in /home/username/ensembler_project/structures/sifts then commands will fail.

mihirdate commented 9 years ago

Hi, I think I am working in the top-level directory, as targets/ and templates/ are in the same directory I am in.

/home/username/*my_path*/ensembler-test/
/home/username/*my_path*/ensembler-test/targets/
/home/username/*my_path*/ensembler-test/templates/

I think that should make

/home/username/*my_path*/ensembler-test/

a top level directory. Am I right?

danielparton commented 9 years ago

Yes, that should be the top-level directory. Let me know if you still get errors.

mihirdate commented 9 years ago

Hi, yes, I ran the command in the top-level directory and got the following error.

/home/username/*my_path*/ensembler-test/ $ ensembler build_models
Traceback (most recent call last):
  File "/opt/az/local/anaconda/2.3.0/installdir/bin/ensembler", line 6, in <module>
    sys.exit(main())
  File "/opt/az/local/anaconda/2.3.0/installdir/lib/python2.7/site-packages/ensembler/cli.py", line 38, in main
    ensembler.core.check_project_toplevel_dir()
  File "/opt/az/local/anaconda/2.3.0/installdir/lib/python2.7/site-packages/ensembler/core.py", line 188, in check_project_toplevel_dir
    raise Exception('Directory {0} is not the top-level directory of an Ensembler project'.format(dirpath))
Exception: Directory structures is not the top-level directory of an Ensembler project

danielparton commented 9 years ago

Hi, the way I wrote that error message is misleading. It actually means that the directory structures, which should have been created when ensembler init was run, has not been found. Did you delete that directory? All of the directories created by ensembler init must be present for the subsequent commands to run.

I'll update the code shortly so that it outputs a more useful error message.
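Until the message is improved, a quick check of which expected directories are missing can save some guesswork. The Python sketch below is not Ensembler's own check. The helper `missing_project_dirs` is hypothetical, and `EXPECTED_DIRS` lists only the directories named in this thread. The full set created by ensembler init may be longer, so it is best compared against a freshly initialized project.

```python
import os

# Directories named in this thread (an assumption, not the full list that
# `ensembler init` creates); extend this tuple to match a fresh project.
EXPECTED_DIRS = (
    "targets",
    "templates",
    "structures",
    os.path.join("structures", "sifts"),
)


def missing_project_dirs(project_root=".", expected=EXPECTED_DIRS):
    """Return the expected subdirectories that do not exist under project_root."""
    return [
        dirname
        for dirname in expected
        if not os.path.isdir(os.path.join(project_root, dirname))
    ]
```

Running `missing_project_dirs()` from the project's top-level directory returns an empty list when every listed directory is present. A project set up by hand with only targets/ and templates/ would report structures and structures/sifts as missing, which matches the two error messages above.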