Setup will fail if .mol2 atom substructure ID matches filename

spadavec commented 7 years ago

This is very likely a niche issue (my very sloppy work found this error), but it may be worth sanitizing .mol2 files during setup.

If the substructure ID for an atom in a mol2 file matches the filename, e.g.

20G.mol2
 47 48 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
      1  C09       -2.7090   -4.5890    6.9990 C.2   203  20G      0.2532
      2  N10       -3.6170   -5.6600    6.6290 N.am  203  20G     -0.2986
      3  C11       -4.8540   -5.3500    5.9460 C.3   203  20G      0.0164
....

Then setup will fail with the following:

2017-06-27 11:00:19,824: Setting CUDA platform to use precision model 'mixed'.
2017-06-27 11:00:19,828: Single node: executing <function _check_resume at 0x7f745d8cbe60>
2017-06-27 11:00:19,828: Single node: executing <function _setup_experiments at 0x7f745d8d1140>
2017-06-27 11:00:19,828: Setting up the systems for 4LUC, 20G and pme
2017-06-27 11:02:09,738: Single node: executing <function _safe_makedirs at 0x7f745d8d1320>
2017-06-27 11:02:09,738: Single node: executing <bound method YamlBuilder._generate_yaml of <yank.yamlbuild.YamlBuilder object at 0x7f745ccaaf50>>
2017-06-27 11:02:09,748: DSL string for the ligand: "resname 20G"
2017-06-27 11:02:09,748: DSL string for the solvent: "auto"
2017-06-27 11:02:09,749: Reading phase complex
2017-06-27 11:02:09,749: prmtop: ./setup/systems/4LUC/complex.prmtop
2017-06-27 11:02:09,749: inpcrd: ./setup/systems/4LUC/complex.inpcrd
Traceback (most recent call last):
  File "/home/yanker/miniconda2/bin/yank", line 11, in <module>
    load_entry_point('yank==0.16.1', 'console_scripts', 'yank')()
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/cli.py", line 71, in main
    dispatched = getattr(commands, command).dispatch(command_args)
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/commands/script.py", line 100, in dispatch
    yaml_builder.run_experiments()
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yamlbuild.py", line 1462, in run_experiments
    for experiment_index, experiment in enumerate(self._build_experiments()):
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yamlbuild.py", line 2392, in _build_experiments
    yield self._build_experiment(combination, output_dir)
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yamlbuild.py", line 2654, in _build_experiment
    solvent_atoms=solvent_dsl)
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yank.py", line 90, in __init__
    self.ligand_atoms = ligand_atoms
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yank.py", line 113, in ligand_atoms
    self._ligand_atoms = self._resolve_atom_indices(value)
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/yank/yank.py", line 215, in _resolve_atom_indices
    atoms_description = self._topology.select(atoms_description).tolist()
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/mdtraj/core/topology.py", line 933, in select
    filter_func = parse_selection(selection_string).expr
  File "/home/yanker/miniconda2/lib/python2.7/site-packages/mdtraj/core/selection.py", line 354, in __call__
    raise ValueError('\n'.join(lines))
ValueError: Expected end of text (at char 10), (line:1, col:11): resname 20G
                                                                           ^^^
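
One way to catch this before setup runs would be to scan the mol2 file for substructure names that begin with a digit. The sketch below is not part of YANK: the function name `find_digit_leading_subst_names` is hypothetical, and it assumes the usual Tripos ATOM record layout, where the substructure name is the eighth whitespace-separated field.

```python
def find_digit_leading_subst_names(mol2_text):
    """Return the sorted substructure names in the ATOM section that start with a digit."""
    flagged = set()
    in_atom_section = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only records under @<TRIPOS>ATOM are inspected.
            in_atom_section = (stripped == "@<TRIPOS>ATOM")
            continue
        if not in_atom_section or not stripped:
            continue
        fields = stripped.split()
        # Assumed layout: id, name, x, y, z, type, subst_id, subst_name, charge
        if len(fields) >= 8 and fields[7][0].isdigit():
            flagged.add(fields[7])
    return sorted(flagged)


example = """@<TRIPOS>ATOM
      1  C09       -2.7090   -4.5890    6.9990 C.2   203  20G      0.2532
      2  N10       -3.6170   -5.6600    6.6290 N.am  203  20G     -0.2986
"""
print(find_digit_leading_subst_names(example))
```

For the excerpt above this flags 20G. A setup step could warn on any non-empty result instead of failing later inside the DSL parser.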

This occurs with the following:

yanker@hopper:~/yank_files/kras$ conda list
# packages in environment at /home/yanker/miniconda2:
#
alabaster                 0.7.9                    py27_0  
alchemy                   1.2.3                    py27_0    omnia
ambermini                 16.16.0                       6    omnia
babel                     2.3.4                    py27_0  
cairo                     1.14.8                        0  
cffi                      1.9.1                    py27_0  
click                     6.7                       <pip>
clusterutils              0.1.2                    py27_0    omnia
conda                     4.3.22                   py27_0  
conda-env                 2.6.0                         0  
cryptography              1.7.1                    py27_0  
curl                      7.49.0                        1  
cycler                    0.10.0                   py27_0  
cython                    0.25.2                   py27_0  
dbus                      1.10.10                       0  
docopt                    0.6.2                    py27_1    omnia
docutils                  0.12                     py27_2  
enum34                    1.1.6                    py27_0  
expat                     2.1.0                         0  
fftw3f                    3.3.4                         2    omnia
Flask                     0.12                      <pip>
fontconfig                2.12.1                        3  
freetype                  2.5.5                         2  
functools32               3.2.3.2                  py27_0  
glib                      2.50.2                        1  
gst-plugins-base          1.8.0                         0  
gstreamer                 1.8.0                         0  
hdf4                      4.2.12                        0  
hdf5                      1.8.17                        1  
icu                       54.1                          0  
idna                      2.1                      py27_0  
imagesize                 0.7.1                    py27_0  
ipaddress                 1.0.17                   py27_0  
itsdangerous              0.24                      <pip>
jinja2                    2.8.1                    py27_0  
jpeg                      8d                            2  
latexcodec                1.0.1                    py27_0    omnia
libffi                    3.2.1                         1  
libgcc                    5.2.0                         0  
libgfortran               3.0.0                         1  
libiconv                  1.14                          0  
libnetcdf                 4.4.1                         0  
libpng                    1.6.27                        0  
libxcb                    1.12                          1  
libxml2                   2.9.4                         0  
markupsafe                0.23                     py27_2  
matplotlib                2.0.0               np111py27_0  
mdtraj                    1.8.0               np111py27_1    omnia
mkl                       2017.0.1                      0  
mpi4py                    2.0.0                    py27_2  
mpich2                    1.4.1p1                       0  
netcdf4                   1.2.4               np111py27_0  
nose                      1.3.7                    py27_1  
numexpr                   2.6.1               np111py27_2  
numpy                     1.11.3                   py27_0  
numpydoc                  0.6.0                    py27_0  
openmm                    7.2.0                    py27_0    omnia/label/dev
openmmtools               0.11.0                   py27_0    omnia
openmoltools              0.7.4                    py27_0    omnia
openssl                   1.0.2j                        0  
oset                      0.1.3                    py27_1    omnia
pandas                    0.19.2              np111py27_1  
parmed                    2.7.1                    py27_0    omnia
pcre                      8.39                          1  
pip                       8.1.2                    py27_0  
pixman                    0.34.0                        0  
pyasn1                    0.1.9                    py27_0  
pybtex                    0.18                     py27_0    omnia
pybtex-docutils           0.2.1                    py27_1    omnia
pycairo                   1.10.0                   py27_0  
pycosat                   0.6.1                    py27_1  
pycparser                 2.17                     py27_0  
pycrypto                  2.6.1                    py27_4  
pygments                  2.1.3                    py27_0  
pymbar                    3.0.0.beta2         np111py27_0    omnia
pyopenssl                 16.2.0                   py27_0  
pyparsing                 2.1.4                    py27_0  
pyqt                      5.6.0                    py27_2  
pytables                  3.3.0               np111py27_0  
python                    2.7.12                        1  
python-dateutil           2.6.0                    py27_0  
pytz                      2016.10                  py27_0  
pyyaml                    3.12                     py27_0  
qt                        5.6.2                         2  
readline                  6.2                           2  
redis                     2.10.5                    <pip>
requests                  2.12.4                   py27_0  
rq                        0.7.1                     <pip>
ruamel_yaml               0.11.14                  py27_0  
schema                    0.6.2                    py27_0    omnia
scipy                     0.18.1              np111py27_1  
setuptools                27.2.0                   py27_0  
sip                       4.18                     py27_0  
six                       1.10.0                   py27_0  
snowballstemmer           1.2.1                    py27_0  
sphinx                    1.5.1                    py27_0  
sphinxcontrib-bibtex      0.3.2                    py27_0    omnia
sqlite                    3.13.0                        0  
subprocess32              3.2.6                    py27_0    omnia
tk                        8.5.18                        0  
Werkzeug                  0.11.15                   <pip>
wheel                     0.29.0                   py27_0  
yaml                      0.1.6                         0  
yank                      0.16.1                   py27_0    omnia
zlib                      1.2.8                         3

jchodera commented 7 years ago

Thanks! mol2 parsing is much more fragile than I'd like, especially when it comes to weird filename issues.

@andrrizzi : Can you take a look at this at some point?

andrrizzi commented 7 years ago

Sure! I'll punt it for post-1.0 release if this is not urgent (feel free to change the issue tag).

Lnaden commented 7 years ago

Okay, to shine some more light on this, it turns out that if the molecule name in the .mol2 file starts with a digit, it throws this error. So long as the first character is a letter, it will pass.

E.g., fails: 20G, 2G0, 02G, 0G2. Works: G20, G02.

Considering we don't really use the molecule name beyond just ID'ing it, it's a pretty easy workaround.

Why it's doing this, I don't know.
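
The pattern reported above can be written down as a one-line check. This is only a sketch of the observed behaviour, not an explanation of its cause, and `starts_with_digit` is a hypothetical helper rather than anything in YANK or MDTraj.

```python
def starts_with_digit(name):
    """True if the molecule name begins with a digit, the pattern reported to fail."""
    return bool(name) and name[0].isdigit()


reported_failing = ["20G", "2G0", "02G", "0G2"]
reported_working = ["G20", "G02"]

for name in reported_failing + reported_working:
    status = "expected to fail" if starts_with_digit(name) else "expected to pass"
    print(name, status)
```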

andrrizzi commented 6 years ago

I actually fixed this case in #465 with the addition of the utils.Leap._sanitize_unit_name function, but it looks like it's not only the first character that should be different than a digit.

andrrizzi commented 6 years ago

Leap may also have problems with residue names that start/contain a digit.

davidlmobley commented 6 years ago

> may also have problems with residue names that start/contain a digit

DOES have such problems.

jchodera commented 6 years ago

Can't we just temporarily rename the ligand residue when feeding it into Antechamber/LEAP?

andrrizzi commented 6 years ago

We can definitely change it, but I'm not sure we can do it temporarily unless we modify the prmtop file after it is created by tleap.

jchodera commented 6 years ago

Where precisely are things going wrong? Are we somehow generating an MDTraj or OpenMM Topology from the prmtop and running into MDTraj issues during a DSL query when residues have numbers for names? Or, as the initial part of the issue suggests, are we running into trouble when using as input mol2 files that have numbers for residue names?

andrrizzi commented 6 years ago

> we running into trouble when using as input mol2 files that have numbers for residue names?

I believe it's this.

jchodera commented 6 years ago

What if we:

1. Generate a resname (e.g. MOL) that doesn't appear in the receptor topology
2. Rename the ligand to resname when generating the mol2 file
3. Rename the residue in the LEaP-generated complex.pdb and ligand.pdb to the original name for analysis / MDTraj DSL processing

The residue name in the prmtop file doesn't matter, since we don't use that for anything; it's just the PDB files generated from it that we end up using for DSL searches, analysis, etc.
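
The first two steps of this proposal could be sketched roughly as follows. Neither function exists in YANK: `pick_unused_resname` and `rename_mol2_substructure` are hypothetical names, the receptor residue names are assumed to be available as a plain set of strings, and the mol2 rewrite assumes the substructure name is the eighth whitespace-separated field of each ATOM record.

```python
import re
from itertools import product
from string import ascii_uppercase


def pick_unused_resname(existing_names, preferred="MOL"):
    """Return a three-letter residue name that is not in existing_names."""
    if preferred not in existing_names:
        return preferred
    # Fall back to the first unused all-letter name (AAA, AAB, ...).
    for letters in product(ascii_uppercase, repeat=3):
        candidate = "".join(letters)
        if candidate not in existing_names:
            return candidate
    raise ValueError("no unused three-letter residue name available")


def rename_mol2_substructure(mol2_text, new_name):
    """Replace the substructure name in every ATOM record with new_name."""
    out = []
    in_atom_section = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atom_section = (stripped == "@<TRIPOS>ATOM")
        elif in_atom_section and stripped:
            tokens = list(re.finditer(r"\S+", line))
            if len(tokens) >= 8:
                # Assumed layout: the eighth field is the substructure name.
                start, end = tokens[7].span()
                line = line[:start] + new_name + line[end:]
        out.append(line)
    return "\n".join(out)


receptor_names = {"ALA", "GLY", "MOL"}
temp_name = pick_unused_resname(receptor_names)
record = "@<TRIPOS>ATOM\n      1  C09  -2.7090  -4.5890  6.9990 C.2  203  20G  0.2532"
print(temp_name)
print(rename_mol2_substructure(record, temp_name))
```

This only touches ATOM records. A real implementation would probably also need to update the @<TRIPOS>SUBSTRUCTURE section, and the third step (renaming back in the PDB files) is not shown.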

andrrizzi commented 6 years ago

> The residue name in the prmtop file doesn't matter, since we don't use that for anything

We do, actually, because at the moment we use that topology for all the DSL expressions we have in the YAML file (e.g., ligand_DSL, receptor/ligand atoms for restraints). We would have to switch to a system where we read the system from the prmtop, the initial positions from the inpcrd file, and the topology from the pdb. This applies only if the system has been set up through the automatic pipeline, though; otherwise we'll have to use only the two files that the user provides.

jchodera commented 6 years ago

I see. The problem is that we create the Topography from the Topology read in from the AMBER prmtop file in these lines, so we would have to:

1. change the ligand residue name in the mol2 file to process it through LEaP
2. restore the ligand residue name in the prmtop file with ParmEd or search-and-replace
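
The search-and-replace option in the second step might look something like the sketch below. It is not YANK code: `restore_prmtop_resname` is a hypothetical name, and it assumes the standard AMBER prmtop layout in which residue labels sit under %FLAG RESIDUE_LABEL in fixed 4-character fields (FORMAT 20a4). The ParmEd route is not shown.

```python
def restore_prmtop_resname(prmtop_text, temp_name, original_name):
    """Swap temp_name for original_name inside the RESIDUE_LABEL section only."""
    if len(original_name) > 4:
        raise ValueError("prmtop residue labels are limited to 4 characters")
    out = []
    in_labels = False
    for line in prmtop_text.splitlines():
        if line.startswith("%FLAG"):
            in_labels = (line.split()[1:2] == ["RESIDUE_LABEL"])
        elif in_labels and not line.startswith("%"):
            # Labels are stored in fixed 4-character fields (FORMAT 20a4).
            fields = [line[i:i + 4] for i in range(0, len(line), 4)]
            fields = [
                original_name.ljust(4) if f.strip() == temp_name else f
                for f in fields
            ]
            line = "".join(fields)
        out.append(line)
    return "\n".join(out)


fragment = "\n".join([
    "%FLAG RESIDUE_LABEL",
    "%FORMAT(20a4)",
    "ALA GLY MOL WAT ",
    "%FLAG RESIDUE_POINTER",
])
print(restore_prmtop_resname(fragment, "MOL", "20G"))
```

Restricting the replacement to the RESIDUE_LABEL section avoids touching atom names or types that happen to match the temporary name. Whether everything downstream is happy with a digit-leading label in the prmtop is a separate question that this sketch does not answer.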

choderalab / yank

Setup will fail if .mol2 atom substructure ID matches filename #703