MOL2 writer fails if not loaded from MOL2

GoogleCodeExporter commented 9 years ago

Using MDAnalysis 0.9.0, I cannot write a MOL2 file if I loaded from anything that isn't a MOL2 file.

What steps will reproduce the problem?

 import MDAnalysis; from MDAnalysis.tests.datafiles import merge_ligand as LIGAND_PDB
 u = MDAnalysis.Universe(LIGAND_PDB)
 u.atoms.write("fromPDB.mol2")

What is the expected output? What do you see instead?

This should write a mol2 file of the ligand. Instead I get an AttributeError because the mol2 reader is trying to access "substructure", which is not available in all readers (in fact, it is only a part of the mol2 reader):

AttributeError: 'PrimitivePDBReader' object has no attribute 'substructure'

(see full trace below).

Thus, I consider the MOL2 writer broken because it can only be used in a limited set of circumstances at the moment.

It should be rewritten to cope with ALL input topologies. Given our improved topology analysis capabilities, this should be possible.
Appropriate test cases need to be added.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-1a7bca870f90> in <module>()
----> 1 u2.atoms.write("fromPDB.mol2")

/Volumes/Data/oliver/Biop/Library/python/mdanalysis/package/MDAnalysis/core/Atom
Group.py in write(self, filename, format, filenamefmt, **kwargs)
   2329             raise ValueError("No writer found for format: {}".format(filename))
   2330         else:
-> 2331             writer.write(self.atoms)
   2332             if coords:  # only these writers have a close method
   2333                 writer.close()

/Volumes/Data/oliver/Biop/Library/python/mdanalysis/package/MDAnalysis/coordinat
es/MOL2.py in write(self, obj)
    344         for framenumber in xrange(start, len(traj), step):
    345             traj[framenumber]
--> 346             self.write_next_timestep(obj)
    347 
    348         self.close()

/Volumes/Data/oliver/Biop/Library/python/mdanalysis/package/MDAnalysis/coordinat
es/MOL2.py in write_next_timestep(self, obj)
    357         or a :class:`~MDAnalysis.core.AtomGroup.Universe`.
    358         """
--> 359         block = self.encode_block(obj)
    360         self.file.writelines(block)

/Volumes/Data/oliver/Biop/Library/python/mdanalysis/package/MDAnalysis/coordinat
es/MOL2.py in encode_block(self, obj)
    305         bond_lines = "\n".join(bond_lines)
    306 
--> 307         substructure = traj.substructure[traj.frame]
    308         substructure = ["@<TRIPOS>SUBSTRUCTURE\n"] + substructure
    309 

AttributeError: 'PrimitivePDBReader' object has no attribute 'substructure'

Original issue reported on code.google.com by orbeckst on 19 Mar 2015 at 6:03

GoogleCodeExporter commented 9 years ago

This is a won't fix: in absence of mol2 atom types it is hard to infer those using just the PDB atom names/atom types.

It is possible to generate mol2 atom types, see rdkit (open source, MIT license) for converting between mol2 and pdb files, and specifically MolToPDBFile.

Original comment by jan...@gmail.com on 22 Mar 2015 at 2:11

GoogleCodeExporter commented 9 years ago

Should probably raise a NotImplementedError instead then?

try:
    substructure = traj.substructure[traj.frame]
except AttributeError:
    raise NotImplementedError("No MOL2 atom types found in traj")

And later a mol2 atom generator could be patched in the except branch.

Original comment by richardjgowers on 22 Mar 2015 at 7:30

GoogleCodeExporter commented 9 years ago

That seems a good idea. If we then properly document the behavior it should be fine.

@jandom: can you please do this?

If you have a simple example how to use other tools (as you said in the last comment) the it would be great to have this in the doc.

Original comment by orbeckst on 22 Mar 2015 at 8:36

jandom commented 9 years ago

Hi, yup - and thanks for the reminder ;) Jan

On Wednesday, April 15, 2015, Google Code Exporter notifications@github.com wrote:

That seems a good idea. If we then properly document the behavior it should be fine.

@jandom https://github.com/jandom: can you please do this?

If you have a simple example how to use other tools (as you said in the last comment) the it would be great to have this in the doc.

Am Mar 22, 2015 um 12:30 schrieb mdanalysis@googlecode.com javascript:_e(%7B%7D,'cvml','mdanalysis@googlecode.com');:

Original comment by orbeckst on 22 Mar 2015 at 8:36

Reply to this email directly or view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/227#issuecomment-89474549 .

orbeckst commented 7 years ago

https://www.cgl.ucsf.edu/chimera/docs/ProgrammersGuide/Examples/writeMol2.py contains a simple recipe for writing mol2 files. It does contain a dict to create the atom types but it seems that Chimera already has basic atom-typing built in.

# source: https://www.cgl.ucsf.edu/chimera/docs/ProgrammersGuide/Examples/writeMol2.py
# License: ?

#    Import Chimera modules used in this example.
import chimera

#    First, we'll open up a model to work with. This molecule (4fun) is very small,
#    comprised of just a couple residues, but it is perfect for illustrating
#    some important components of Chimera's object model.
#    For more information on how to open/close models in Chimera, see the
#    "Basic Model Manipulation" Example in the Programmer's Guide (coming soon). For now,
#    just understand that this code opens up any molecules stored in the file
#    "4fun.pdb" and returns a list of references to opened models.
#    (Put 4fun.pdb on your desktop or change the path in the command below.)
#
#    .. "4fun.pdb" 4fun.pdb
opened = chimera.openModels.open('~/Desktop/4fun.pdb')

#    Because only one molecule was opened, 'opened' is a list with just one element.
#    Get a reference to that element (which is a 'Molecule'
#    instance) and store it in 'mol'
mol = opened[0]

#    Now that we have a molecule to work with, an excellent way of examining its data structures is to flatten it out and write
#    it to a file. We'll write this file in the 'mol2' format, a free-format ascii file that describes molecular structure.
#    It is not necessary to have any prior knowledge of the 'mol2' format to understand this example, just a basic
#    comprehension of file formats that use coordinate data. Check out the "finished product".
#    It should serve as a good reference while you're going through the example.
#    Get a reference to a file to write to:
#
#    .. "finished product" 4fun.mol2
f = open("4fun.mol2", 'w')

#    mol2 uses a series of Record Type Indicators (RTI), that indicate the type of structure that will be described in
#    the following lines.
#    An RTI is simply an ASCII string which starts with an asterisk ('@'), followed by a string of characters,
#    and is terminated by a new line.
#    Here, we define some RTI's that we will use througout the file to describe the various parts of our model:

MOLECULE_HEADER = "@<TRIPOS>MOLECULE"
ATOM_HEADER     = "@<TRIPOS>ATOM"
BOND_HEADER     = "@<TRIPOS>BOND"
SUBSTR_HEADER   = "@<TRIPOS>SUBSTRUCTURE"

#    The 'chimera2sybyl' dictionary is used to map Chimera atom types
#    to Sybyl atom types.  See section below on writing out per-atom
#    information.
chimera2sybyl = {
        'C3'  : 'C.3',     'C2'  : 'C.2',     'Car' : 'C.ar',    'Cac' : 'C.2',
        'C1'  : 'C.1',     'N3+' : 'N.4',     'N3'  : 'N.3',     'N2'  : 'N.2',
        'Npl' : 'N.pl3',   'Ng+' : 'N.pl3',   'Ntr' : 'N.2',     'Nox' : 'N.4',
        'N1+' : 'N.1',     'N1'  : 'N.1',     'O3'  : 'O.3',     'O2'  : 'O.2',
        'Oar' : 'O.2',     'O3-' : 'O.co2',   'O2-' : 'O.co2',   'S3+' : 'S.3',
        'S3'  : 'S.3',     'S2'  : 'S.2',     'Sac' : 'S.O2',    'Son' : 'S.O2',
        'Sxd' : 'S.O',     'Pac' : 'P.3',     'Pox' : 'P.3',     'P3+' : 'P.3',
        'HC'  : 'H',       'H'   : 'H',       'DC'  : 'H',       'D'   : 'H',
        'P'   : 'P.3',     'S'   : 'S.3',     'Sar' : 'S.2',     'N2+' : 'N.2'
}

#    Writing Out per-Molecule Information
#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
#    The "<TRIPOS>MOLECULE" RTI indicates that the next couple of lines will contain information relevant
#    to the molecule as a whole. First, write out the Record Type Indicator (RTI):
f.write("%s\n" % MOLECULE_HEADER)

#    The next line contains the name of the molecule. This can be accessed through the 'mol.name' attribute.
#    (Remember, 'mol' is a reference to the molecule we opened). If the model you open came from a pdb file, 'name' will most
#    often be the name of the file (without the '.pdb' extension).  For this example, 'mol.name' is "4fun".
f.write("%s\n" % mol.name)

#    Next, we need to write out the number of atoms, number of bonds, and number of substructures in the model (substructures
#    can be several different things; for the sake of simplicity, the only substructures we'll worry about here are residues).
#    This data is accessible through attributes of a molecule object: 'mol.atoms', 'mol.bonds', and 'mol.residues' all contain
#    lists of their respective components. We can determine how many atoms, bonds, or residues this
#    molecule has by taking the 'len' of the appropriate list.
#    save the list of references to all the atoms in 'mol':
ATOM_LIST = mol.atoms
#    save the list of references to all the bonds in 'mol':
BOND_LIST = mol.bonds
#    save the list of references to all the residues in 'mol':
RES_LIST  = mol.residues

f.write("%d %d %d\n" % ( len(ATOM_LIST), len(BOND_LIST), len(RES_LIST)) )

#    type of molecule
f.write("PROTEIN\n")

#    indicate that no charge-related information is available
f.write("NO_CHARGES\n")

f.write("\n\n")

#    Writing Out per-Atom Information
#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
#    Next, write out atom-related information. In order to indicate this, we must first write out the
#    atom RTI:
f.write("%s\n" % ATOM_HEADER)

#   Each line under the 'ATOM' RTI consists of information pertaining to a single atom. The following information about each
#   atom is required: an arbitrary atom id number, atom name, x coordinate, y coordinate, z coordinate, atom type, id of the
#   substructure to which the atom belongs , name of the substructure to which the atom belongs.

#    You can look at each atom in the molecule by looping through its 'atoms' attribute.
#    Remember, 'ATOM_LIST' is the list of atoms stored in 'mol.atoms.' It's more efficient
#    to get the list once, and assign it to a variable, then to repeatedly ask for 'mol.atoms'.
for atom in ATOM_LIST:
    #    Now that we have a reference to an atom, we can write out all the necessary information to the file.
    #    The first field is an arbitrary id number. We'll just use that atom's index within the 'mol.atoms' list.
    f.write("%d " % ATOM_LIST.index(atom) )

    #    Next, we need the name of the atom, which is accessible via the 'name' attribute.
    f.write("%s " % atom.name)

    #    Now for the x, y, and z coordinate data.
    #    Get the atom's 'xformCoord' object. This is essentially a wrapper that holds information about the
    #    coordinate position (x,y,z) of that atom. 'xformCoord.x', 'xformCoord.y', and 'xformCoord.z' store the x, y,
    #    and z coordinates,
    #    respectively, as floating point integers. This information comes from the coordinates given for each atom
    #    specification in the input file
    coord = atom.xformCoord()
    f.write("%g %g %g " % (coord.x, coord.y, coord.z) )

    #    The next field in this atom entry is the atom type. This is a string which stores information about the
    #    chemical properties of the atom. It is accessible through the 'idatmType' attribute of an atom object.
    #    Because Chimera uses slightly different atom types than SYBYL (the modeling program for which .mol2 is the primary
    #    input format), use a dictionary called chimera2sybyl (defined above) that converts Chimera's atom types to
    #    the corresponding SYBYL version of the atom's type.
    f.write("%s " % chimera2sybyl[atom.idatmType])

    #    The last two fields in an atom entry pertain to any substructures to which the atom may belong.
    #    As previously noted, we are only interested in residues for this example.
    #    Every atom object has a 'residue' attribute, which is a reference to the residue to which that atom belongs.
    res   = atom.residue

    #    Here, we'll use 'res.id' for the substructure id field. 'res.id' is a string which represents a unique id
    #    for that residue (a string representation of a number, i.e. "1" , which are sequential, for all the
    #    residues in a molecule).
    f.write("%s " % res.id)

    #    The last field to write is substructure name. Here, we'll use the 'type' attribute of 'res'. the 'type' attribute contains
    #    a string representation of the residue type (e.g. "HIS", "PHE", "SER"...).  Concatenate onto this the residue's 'id'
    #    to make a unique name for this substructure (because it is possible, and probable, to have more than one
    #    "HIS" residue in a molecule. This way, the substructure name will be "HIS6" or "HIS28")
    f.write("%s%s\n" % (res.type, res.id) )

f.write("\n\n")

#    Writing Out per-Bond Information
#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
#    Now for the bonds. The bond RTI says that the following lines will contain information about bonds.
f.write("%s\n" % BOND_HEADER)

#    Each line after the bond RTI contains information about one bond in the molecule.
#    As noted earlier, you can access all the bonds in a molecule through the 'bonds' attribute,
#    which contains a list of bonds.
for bond in BOND_LIST:

    #    each bond object has an 'atoms' attribute, which is list of length 2, where each item in the list is
    #    a reference to one of the atoms to which the bond connects.
    a1, a2 = bond.atoms

    #    The first field in a mol2 bond entry is an arbitrary bond id. Once again, we'll just use that
    #    bond's  index in the 'mol.bonds' list
    f.write("%d " % BOND_LIST.index(bond) )

    #    The next two fields are the ids of the atoms which the bond connects. Since we have a reference to both these
    #    atoms (stored in 'a1' and 'a2'), we can just get the index of those objects in the 'mol.atoms' list:
    f.write("%s %s " % (ATOM_LIST.index(a1), ATOM_LIST.index(a2)) )

    #    The last field in this bond entry is the bond order. Chimera doesn't currently calcuate bond orders,
    #    but for our educational purposes here, this won't be a problem.
    #    The mol2 format expects bond order as a string: "1" (first-order), "2" (second-order), etc.,  so
    #    just write out "1" here (even though this may not be correct).
    f.write("1\n")

f.write("\n\n")

#    Writing Out per-Residue Information
#    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
#    Almost done!!! The last section contains information about the substructures (i.e. residues for this example)
#    You know the drill:
f.write("%s\n" % SUBSTR_HEADER)

#    We've already covered some of these items (see above):
for res in RES_LIST:
    #    residue id field
    f.write("%s " % res.id )

    #    residue name field
    f.write("%s%s " % (res.type, res.id) )

    #    the next field specifies the id of the root atom of the substructure. For the case of residues,
    #    we'll use the alpha-carbon as the root.
    #    Each residue has an 'atomsMap' attribute which is a dictionary. The keys in this dictionary are
    #    atom names (e.g. 'C', 'N', 'CA'), and the values are lists of references to atoms in the residue that have that
    #    name. So, to get the alpha-carbon of this residue:
    alpha_carbon = res.atomsMap['CA'][0]

    #    and get the id of 'alpha_carbon' from the 'mol.atoms' list
    f.write("%d " % ATOM_LIST.index(alpha_carbon) )

    #    The final field of this substructure entry is a string which specifies what type of substructure it is:
    f.write("RESIDUE\n")

f.write("\n\n")
f.close()

#    And that's it! Don't worry if you didn't quite understand all the ins and outs of the mol2 file format.
#    The purpose of this exercise was to familiarize yourself with Chimera's object model; writing out a mol2 file
#    was just a convenient way to do that. The important thing was to gain an understanding of how Chimera's atoms,
#    bonds, residues, and molecules all fit together.

orbeckst commented 7 years ago

The original description for the mol2 format is not available anymore (see eg http://forums.openbabel.org/broken-link-td4659911.html).

I downloaded http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf (Sybyl 7.1 Mid 2005) and attach it here for safe keeping...

mol2.pdf

MDAnalysis / mdanalysis

MOL2 writer fails if not loaded from MOL2 #227