Closed msultan closed 8 years ago
This could be due to another change in the SIFTS interface.
@danielparton: Any chance you have a moment to take a quick look at this?
If not, I can take a stab at trying to figure out what changed this weekend.
OK, that's weird. I see that I can get at the URL ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/ just fine, and the constructed URL including 2SRC should be:
ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/2src.xml.gz
Can you quickly check whether FTP access is blocked on those clusters, with something like:
wget ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/2src.xml.gz
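If wget is not available on the cluster, the same check can be sketched in Python. This is only a minimal sketch: `sifts_url` and `can_fetch` are hypothetical helper names, not part of ensembler, and the only thing taken from this thread is the base URL and the lower-case `2src.xml.gz` file name.

```python
import urllib.request

# Base URL quoted above in this thread.
SIFTS_BASE = "ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/xml/"


def sifts_url(pdbid):
    """Build the SIFTS XML URL for a PDB ID (hypothetical helper).

    The file name uses the lower-case PDB ID, e.g. 2SRC -> 2src.xml.gz.
    """
    return SIFTS_BASE + pdbid.lower() + ".xml.gz"


def can_fetch(url, timeout=30):
    """Return True if the first two bytes of the URL can be read.

    URLError and socket timeouts are OSError subclasses, so a blocked
    or unreachable FTP server shows up as False here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return len(response.read(2)) == 2
    except OSError:
        return False

# Usage (needs network access):
#     print(can_fetch(sifts_url("2SRC")))
```

A False result only says the download failed from that machine; it does not distinguish a firewall block from a server-side problem.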
I was able to get past the SIFTS part on my desktop just now, though it oddly failed at a different point:
mski1776:ensembler-test choderaj$ ensembler quickmodel --target_uniprot_entry_name ABL1_HUMAN --uniprot_domain_regex '^Protein kinase' --template_pdbids 2SRC
WARNING: /opt/anaconda1anaconda2anaconda3 not found.
Ignoring mpi4py.
Using mpi4py on OS X with Anaconda currently requires that /opt/anaconda1anaconda2anaconda3 points to your Anaconda installation.
As a workaround, you can create a symlink, e.g. "sudo ln -s ~/anaconda /opt/anaconda1anaconda2anaconda3
Done.
Querying UniProt web server...
Number of entries returned from initial UniProt search: 2
Set of unique domain names returned from the initial UniProt search using the query string 'mnemonic:ABL1_HUMAN':
set(['SH2', 'SH3', 'Protein kinase'])
Unique domain names selected after searching with the case-sensitive regex string '^Protein kinase':
set(['Protein kinase'])
Done.
Downloading PDB file for: 2SRC
Downloading sifts file for: 2SRC
1 PDB chains selected.
Extracting residues from PDB chains...
1 templates selected.
Writing template structures...
Done.
Traceback (most recent call last):
File "/Users/choderaj/miniconda/bin/ensembler", line 6, in <module>
sys.exit(main())
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler-1.0.2-py2.7.egg/ensembler/cli.py", line 40, in main
command.dispatch(args)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler-1.0.2-py2.7.egg/ensembler/cli_commands/quickmodel.py", line 106, in dispatch
QuickModel(targetid=args['--targetid'], templateids=templateids, target_uniprot_entry_name=args['--target_uniprot_entry_name'], uniprot_domain_regex=args['--uniprot_domain_regex'], pdbids=pdbids, chainids=chainids_dict, template_uniprot_query=args['--template_uniprot_query'], template_seqid_cutoff=template_seqid_cutoff, loopmodel=not args['--no-loopmodel'], package_for_fah=args['--package_for_fah'], nfahclones=nfahclones, structure_dirs=structure_paths)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler-1.0.2-py2.7.egg/ensembler/tools/quick_model.py", line 72, in __init__
templates_resolved_seq, templates_full_seq = ensembler.core.get_templates()
AttributeError: 'module' object has no attribute 'get_templates'
My bad; I had two conflicting versions of ensembler installed.
Now I get something even more strange:
mski1776:ensembler-test choderaj$ ensembler quickmodel --target_uniprot_entry_name ABL1_HUMAN --uniprot_domain_regex '^Protein kinase' --template_pdbids 2SRC
WARNING: /opt/anaconda1anaconda2anaconda3 not found.
Ignoring mpi4py.
Using mpi4py on OS X with Anaconda currently requires that /opt/anaconda1anaconda2anaconda3 points to your Anaconda installation.
As a workaround, you can create a symlink, e.g. "sudo ln -s ~/anaconda /opt/anaconda1anaconda2anaconda3
Done.
Querying UniProt web server...
Number of entries returned from initial UniProt search: 2
Set of unique domain names returned from the initial UniProt search using the query string 'mnemonic:ABL1_HUMAN':
set(['SH2', 'SH3', 'Protein kinase'])
Unique domain names selected after searching with the case-sensitive regex string '^Protein kinase':
set(['Protein kinase'])
Done.
Downloading PDB file for: 2SRC
Downloading sifts file for: 2SRC
1 PDB chains selected.
Extracting residues from PDB chains...
1 templates selected.
Writing template structures...
Done.
MPI rank 0 pdbfixer error for template SRC_HUMAN_D0_2SRC_A - see logfile
Modeling missing loops for template SRC_HUMAN_D0_2SRC_A
Traceback (most recent call last):
File "/Users/choderaj/miniconda/bin/ensembler", line 6, in <module>
sys.exit(main())
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/cli.py", line 40, in main
command.dispatch(args)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/cli_commands/quickmodel.py", line 106, in dispatch
QuickModel(targetid=args['--targetid'], templateids=templateids, target_uniprot_entry_name=args['--target_uniprot_entry_name'], uniprot_domain_regex=args['--uniprot_domain_regex'], pdbids=pdbids, chainids=chainids_dict, template_uniprot_query=args['--template_uniprot_query'], template_seqid_cutoff=template_seqid_cutoff, loopmodel=not args['--no-loopmodel'], package_for_fah=args['--package_for_fah'], nfahclones=nfahclones, structure_dirs=structure_paths)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/tools/quick_model.py", line 95, in __init__
self._model(self.targetid, self.templateids, loopmodel=self.loopmodel, package_for_fah=self.package_for_fah, nfahclones=self.nfahclones)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/tools/quick_model.py", line 138, in _model
ensembler.modeling.model_template_loops(process_only_these_templates=templateids)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/utils.py", line 37, in print_done
fn(*args, **kwargs)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/modeling.py", line 74, in model_template_loops
loopmodel_templates(templates_resolved_seq, missing_residues_list, process_only_these_templates=process_only_these_templates, overwrite_structures=overwrite_structures)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/modeling.py", line 221, in loopmodel_templates
loopmodel_template(template, missing_residues[template_index], overwrite_structures=overwrite_structures)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/modeling.py", line 234, in loopmodel_template
write_loop_file(template, missing_residues)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/modeling.py", line 259, in write_loop_file
loop_residues_data = [(key[1], len(residues)) for key, residues in missing_residues.iteritems()]
AttributeError: 'NoneType' object has no attribute 'iteritems'
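The last frame of the traceback suggests that `missing_residues` was None when `write_loop_file` tried to iterate over it, possibly as a consequence of the pdbfixer error reported just before. A defensive guard could look roughly like the sketch below. `loop_residue_data` is a hypothetical stand-alone function, not ensembler's actual code, and it assumes the keys are tuples whose second element is a residue index, as the list comprehension in the traceback implies.

```python
def loop_residue_data(missing_residues):
    """Return (residue_index, n_missing) pairs for each gap.

    Hypothetical guard: None or an empty dict is treated as
    "no missing residues" instead of raising AttributeError.
    """
    if not missing_residues:
        return []
    # Same expression as in the traceback, using .items() so that it
    # also runs on Python 3 (the original uses Python 2's .iteritems()).
    return [(key[1], len(residues)) for key, residues in missing_residues.items()]
```

For example, `loop_residue_data(None)` returns an empty list, and `loop_residue_data({(0, 10): ['GLY', 'SER']})` returns `[(10, 2)]`. Whether an empty list is the right thing to pass on to the loop modeling step is a separate question that this sketch does not answer.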
I think we need to fix an issue in loop modeling (the step that builds in missing template residues) for the case where there are no missing residues. If we don't need loop modeling, though, I should be able to just add the --no-loopmodel flag:
ensembler quickmodel --target_uniprot_entry_name ABL1_HUMAN --uniprot_domain_regex '^Protein kinase' --template_pdbids 2SRC --no-loopmodel
This successfully gets through the MODELLER stage, but then dies at the explicit solvent stage:
mski1776:ensembler-test choderaj$ ensembler quickmodel --target_uniprot_entry_name ABL1_HUMAN --uniprot_domain_regex '^Protein kinase' --template_pdbids 2SRC --no-loopmodel
WARNING: /opt/anaconda1anaconda2anaconda3 not found.
Ignoring mpi4py.
Using mpi4py on OS X with Anaconda currently requires that /opt/anaconda1anaconda2anaconda3 points to your Anaconda installation.
As a workaround, you can create a symlink, e.g. "sudo ln -s ~/anaconda /opt/anaconda1anaconda2anaconda3
Done.
Querying UniProt web server...
Number of entries returned from initial UniProt search: 2
Set of unique domain names returned from the initial UniProt search using the query string 'mnemonic:ABL1_HUMAN':
set(['SH2', 'SH3', 'Protein kinase'])
Unique domain names selected after searching with the case-sensitive regex string '^Protein kinase':
set(['Protein kinase'])
Done.
Downloading PDB file for: 2SRC
Downloading sifts file for: 2SRC
1 PDB chains selected.
Extracting residues from PDB chains...
1 templates selected.
Writing template structures...
Done.
Working on target ABL1_HUMAN_D0...
Done.
=========================================================================
Working on target "ABL1_HUMAN_D0"
=========================================================================
-------------------------------------------------------------------------
Modelling "ABL1_HUMAN_D0" => "SRC_HUMAN_D0_2SRC_A"
-------------------------------------------------------------------------
The following 1 residues contain 6-membered rings with poor geometries
after transfer from templates. Rebuilding rings from internal coordinates:
<Residue 228 (type TYR)>
0 atoms in HETATM/BLK residues constrained
to protein atoms within 2.30 angstroms
and protein CA atoms within 10.00 angstroms
0 atoms in residues without defined topology
constrained to be rigid bodies
>> Summary of successfully produced models:
Filename molpdf
----------------------------------------
ABL1_HUMAN_D0.B99990001.pdb 50007.05859
Done.
Constructing a trajectory containing all valid models...
Conducting RMSD-based clustering...
1 unique models (from original set of 1) using cutoff of 0.060 nm
Done.
Auto-selected OpenMM platform: OpenCL
-------------------------------------------------------------------------
Simulating ABL1_HUMAN_D0 => SRC_HUMAN_D0_2SRC_A in implicit solvent for 100.0 ps (MPI rank: 0, GPU ID: 0)
-------------------------------------------------------------------------
/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/refinement.py:300: UserWarning: = ERROR start: MPI rank 0 hostname mski1776 gpuid 0 =
No compatible OpenCL platform is available
Traceback (most recent call last):
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 288, in refine_implicit_md
simulate_implicit_md()
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 122, in simulate_implicit_md
context = openmm.Context(system, integrator, platform, platform_properties)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/simtk/openmm/openmm.py", line 6469, in __init__
this = _openmm.new_Context(*args)
Exception: No compatible OpenCL platform is available
= ERROR end: MPI rank 0 hostname mski1776 gpuid 0
mpistate.rank, socket.gethostname(), gpuid, e, trbk
Done.
Done.
No nwaters information found.
Done.
Auto-selected OpenMM platform: OpenCL
Traceback (most recent call last):
File "/Users/choderaj/miniconda/bin/ensembler", line 6, in <module>
sys.exit(main())
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/cli.py", line 40, in main
command.dispatch(args)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/cli_commands/quickmodel.py", line 106, in dispatch
QuickModel(targetid=args['--targetid'], templateids=templateids, target_uniprot_entry_name=args['--target_uniprot_entry_name'], uniprot_domain_regex=args['--uniprot_domain_regex'], pdbids=pdbids, chainids=chainids_dict, template_uniprot_query=args['--template_uniprot_query'], template_seqid_cutoff=template_seqid_cutoff, loopmodel=not args['--no-loopmodel'], package_for_fah=args['--package_for_fah'], nfahclones=nfahclones, structure_dirs=structure_paths)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/tools/quick_model.py", line 95, in __init__
self._model(self.targetid, self.templateids, loopmodel=self.loopmodel, package_for_fah=self.package_for_fah, nfahclones=self.nfahclones)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/tools/quick_model.py", line 145, in _model
ensembler.refinement.refine_explicit_md(process_only_these_targets=targetid, process_only_these_templates=templateids, sim_length=self.sim_length)
File "/Users/choderaj/miniconda/lib/python2.7/site-packages/ensembler/refinement.py", line 1008, in refine_explicit_md
with open(nwaters_filename, 'r') as infile:
IOError: [Errno 2] No such file or directory: '/Users/choderaj/Desktop/ensembler-test/models/ABL1_HUMAN_D0/nwaters-use.txt'
@sonyahanson: Any ideas here? Is this a real bug?
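In the log above, the final IOError appears right after "No nwaters information found.", so the explicit-solvent step seems to open `nwaters-use.txt` without checking that an earlier step produced it. A clearer failure could be sketched as follows. `read_nwaters` is a hypothetical helper, not ensembler's API, and the assumption that the file holds a single integer is mine.

```python
import os


def read_nwaters(models_target_dir):
    """Read the water count from nwaters-use.txt (hypothetical helper).

    Assumption: the file contains a single integer. If the file is
    missing, raise an error that points at the likely cause rather
    than a bare IOError.
    """
    path = os.path.join(models_target_dir, "nwaters-use.txt")
    if not os.path.exists(path):
        raise RuntimeError(
            "%s not found; an earlier refinement or solvation step "
            "probably failed, so check its log first" % path
        )
    with open(path, "r") as infile:
        return int(infile.read().strip())
```

This does not fix the underlying OpenCL failure; it only makes the downstream error easier to interpret.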
Alright, so it has now started working again. I am guessing it was the FTP issue. I was playing around with the commands a bit, and I suspect the SIFTS server blocked both clusters because of the large number of requests, though I only had 150-odd templates, so that shouldn't really have caused an issue.
ensembler quickmodel --target_uniprot_entry_name ABL1_HUMAN --uniprot_domain_regex '^Protein kinase' --template_pdbids 2SRC
Querying UniProt web server...
Number of entries returned from initial UniProt search: 2
Set of unique domain names returned from the initial UniProt search using the query string 'mnemonic:ABL1_HUMAN':
set(['SH2', 'SH3', 'Protein kinase'])
Unique domain names selected after searching with the case-sensitive regex string '^Protein kinase':
set(['Protein kinase'])
Done.
1 PDB chains selected.
Extracting residues from PDB chains...
1 templates selected.
Writing template structures...
Done.
Modeling missing loops for template SRC_HUMAN_D0_2SRC_A
Done.
Working on target ABL1_HUMAN_D0...
Done.
=========================================================================
Working on target "ABL1_HUMAN_D0"
=========================================================================
-------------------------------------------------------------------------
Modelling "ABL1_HUMAN_D0" => "SRC_HUMAN_D0_2SRC_A"
-------------------------------------------------------------------------
...
Done.
Constructing a trajectory containing all valid models...
Conducting RMSD-based clustering...
1 unique models (from original set of 1) using cutoff of 0.060 nm
Done.
Auto-selected OpenMM platform: OpenCL
BTW, is there any way to limit templates to those within a certain resolution cutoff?
By resolution, do you mean crystallographic resolution, allowing you to avoid low-resolution crystallographic structures as templates?
I don't believe we have that capability yet, but it's a great idea that should be relatively easy to implement. Can you add a separate feature request, if this is what you intended?
Yeah, that's what I meant. I will open the FR for it.
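For the feature request, one possible approach is to read the resolution from the `REMARK   2 RESOLUTION.` record of each template's PDB file and drop anything above the cutoff. The sketch below is only an illustration of that idea: `pdb_resolution` and `filter_by_resolution` are hypothetical names, ensembler has no such option as far as this thread says, and the PDB IDs in the usage note are made up.

```python
import re

# REMARK 2 holds the resolution in PDB-format files, e.g.
# "REMARK   2 RESOLUTION.    1.50 ANGSTROMS."
# Entries without a resolution (e.g. NMR) say "NOT APPLICABLE." instead.
_RESOLUTION_RE = re.compile(
    r"^REMARK   2 RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS", re.MULTILINE
)


def pdb_resolution(pdb_text):
    """Return the resolution in angstroms, or None if not stated."""
    match = _RESOLUTION_RE.search(pdb_text)
    return float(match.group(1)) if match else None


def filter_by_resolution(pdb_texts, cutoff):
    """Keep PDB IDs whose resolution is at or below the cutoff.

    pdb_texts maps PDB ID -> file contents. Entries with no stated
    resolution are dropped, which is a design choice of this sketch.
    """
    kept = []
    for pdbid, text in pdb_texts.items():
        resolution = pdb_resolution(text)
        if resolution is not None and resolution <= cutoff:
            kept.append(pdbid)
    return sorted(kept)
```

With made-up inputs such as `{"1ABC": "REMARK   2 RESOLUTION.    1.50 ANGSTROMS.", "2DEF": "REMARK   2 RESOLUTION.    3.20 ANGSTROMS."}` and a cutoff of 2.5, only `1ABC` is kept.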
Out of curiosity, what information does the SIFTS file add that is not available in the RCSB databank? I have done a similar pipeline on a smaller scale with a set of scripts, and I thought that PDB + alignment + MODELLER was all that was needed.
Good question. I think SIFTS provides a nice machine-readable annotation that contains pointers to numerous other databases. I dimly recall @danielparton noting that it has useful cross-references to canonical residue numbering schemes in UniProt, which we use for referencing which domains or sequence subsets we are modeling. This may not be useful for individual quick-model one template-one target modeling, but is essential when working on the superfamily scale.
Go ahead and close this issue if your problems are solved.
Thanks!
Yep, SIFTS has residue-level mappings between PDB, UniProt and other databases.
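To make the residue-level mapping concrete, here is a rough sketch of pulling PDB-to-UniProt residue numbers out of a SIFTS XML file with the standard library. It assumes the usual SIFTS layout, where each `residue` element carries `crossRefDb` children with `dbSource`, `dbResNum` and `dbChainId` attributes. `pdb_to_uniprot` is a hypothetical function (not ensembler's parser), and the sample document, accession and residue numbers are hand-written for illustration rather than copied from a real entry.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def pdb_to_uniprot(sifts_xml, chain_id):
    """Map PDB residue numbers (as strings) to UniProt positions (ints)."""
    mapping = {}
    for residue in ET.fromstring(sifts_xml).iter():
        if _local(residue.tag) != "residue":
            continue
        pdb_num = uniprot_num = None
        for ref in residue:
            if _local(ref.tag) != "crossRefDb":
                continue
            source = ref.get("dbSource")
            if source == "PDB" and ref.get("dbChainId") == chain_id:
                pdb_num = ref.get("dbResNum")
            elif source == "UniProt":
                uniprot_num = ref.get("dbResNum")
        # Skip residues lacking either cross-reference ("null" marks
        # residues with no PDB numbering in this sketch's assumption).
        if pdb_num not in (None, "null") and uniprot_num is not None:
            mapping[pdb_num] = int(uniprot_num)
    return mapping


# Hand-written, simplified sample; all values are made up.
SAMPLE = """<entry xmlns="http://www.ebi.ac.uk/pdbe/docs/sifts/eFamily.xsd">
  <entity type="protein" entityId="A"><segment><listResidue>
    <residue dbSource="PDBe" dbResNum="1" dbResName="THR">
      <crossRefDb dbSource="PDB" dbAccessionId="demo" dbResNum="84" dbChainId="A"/>
      <crossRefDb dbSource="UniProt" dbAccessionId="X00000" dbResNum="86"/>
    </residue>
    <residue dbSource="PDBe" dbResNum="2" dbResName="PHE">
      <crossRefDb dbSource="PDB" dbAccessionId="demo" dbResNum="85" dbChainId="A"/>
      <crossRefDb dbSource="UniProt" dbAccessionId="X00000" dbResNum="87"/>
    </residue>
  </listResidue></segment></entity>
</entry>"""
```

PDB residue numbers are kept as strings because they can carry insertion codes. A real `.xml.gz` download would need to be decompressed first, for example with `gzip.open`.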
The SIFTS server is often a bit flaky... I sometimes have to repeat a command to get a SIFTS file to download.
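Since repeating the command by hand is the usual workaround for a flaky SIFTS download, the same idea can be sketched as a small retry wrapper with exponential backoff. `retry` is a hypothetical helper, not something ensembler provides, and the attempt count and delays are arbitrary defaults.

```python
import time


def retry(action, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Only OSError is retried (urllib's URLError and socket timeouts are
    OSError subclasses). The wait grows by `backoff` after each failure,
    and the last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
```

A download function would be passed in as `action`, for example `retry(lambda: urllib.request.urlopen(url).read())`. Spacing out the retries also avoids hammering the server, which may matter if it throttles clients that send many requests.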
I was trying to use the quickmodel command, and it kept failing while trying to download the SIFTS files.
I am using the latest conda build and here is the command that I ran
Here is the error
I have reproduced the error on two different clusters and with a variety of template IDs. As of this post, the SIFTS page for 2src seems to be up as well.