Open austinjpaul opened 5 years ago
Woah, this mol2 file is really messed up. The atoms from residue 1 (CYS) and residue 2 (GLY) are interleaved. ParmEd doesn't support this (and I'm almost certain the mol2 specification doesn't permit it).
Where did you find this file??
If you want a workaround, my suggestion is to change all of the residue names to XXXX or something. Otherwise your structure will have 11 residues regardless of what program reads it. That should also fix the error you're seeing.
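For anyone who wants to script that workaround, here is a minimal sketch using only the standard library. It assumes a single-molecule file in which every ATOM record carries the optional subst_id and subst_name columns (fields 7 and 8), and the function name unify_residues is made up for illustration; it is not part of ParmEd. It leaves any @<TRIPOS>SUBSTRUCTURE section untouched, so that section may need editing by hand.

```python
def unify_residues(mol2_text, name="XXXX", subst_id="1"):
    """Give every ATOM record the same residue id and name (hypothetical helper)."""
    out = []
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM section.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = stripped.split()
        # ATOM record: id name x y z type [subst_id [subst_name [charge ...]]]
        if in_atoms and len(fields) >= 8:
            fields[6] = subst_id
            fields[7] = name
            line = " ".join(fields)  # mol2 records are whitespace-delimited
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading the file into a string, passing it through this function, and writing the result back out should leave the bonds and coordinates unchanged.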
That method could certainly use refactoring, but supporting parsing mol2 files as a Structure instead of residue templates was almost an afterthought -- they're mainly used to store topology and atom charge information for building residue template libraries.
Hmmm... thanks for the help.
This mol2 file was generated by openbabel from an InChI string (InChI=1S/C10H17N3O6S/c11-5(10(18)19)1-2-7(14)13-6(4-20)9(17)12-3-8(15)16/h5-6,20H,1-4,11H2,(H,12,17)(H,13,14)(H,15,16)(H,18,19)/t5-,6-/m0/s1).
I was stuck using structure=True because of the repeated atom names. Telling openbabel not to output residue information not only changes all residue names to <1>, but also gives the atoms distinct names.
Do you know where I might find the mol2 specification to see if this is indeed disallowed? I should perhaps go file an issue or two against obabel.
You can find the mol2 file format specification from SYBYL here. It seems that this is in fact a legal mol2 file according to that spec (which doesn't really impose many constraints at all), but it imposes some pretty significant limitations on the format. For instance, every residue must have a different identifier (name), so you can't have two amino acids of the same type have the same name (e.g., 2 alanines) unless you're willing to have compliant parsers group them together into a single super-residue.
They really intended the mol2 file to be a small molecule format.
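To check whether a given file has this problem, something like the following sketch might help. It walks the ATOM section and reports residue labels whose atoms are not contiguous, which is what a parser that groups atoms by residue identifier would trip over. It assumes every ATOM record includes the optional subst_id and subst_name columns; the function names are made up for illustration.

```python
from itertools import groupby


def residue_runs(mol2_text):
    """Return consecutive runs of residue labels as ((subst_id, subst_name), atom_count)."""
    labels = []
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        fields = stripped.split()
        # ATOM record: id name x y z type [subst_id [subst_name [charge ...]]]
        if in_atoms and len(fields) >= 8:
            labels.append((fields[6], fields[7]))
    # groupby only merges adjacent equal labels, so a split residue yields several runs.
    return [(label, len(list(run))) for label, run in groupby(labels)]


def interleaved_residues(mol2_text):
    """Return labels that occur in more than one run, i.e. non-contiguous residues."""
    seen, repeated = set(), []
    for label, _ in residue_runs(mol2_text):
        if label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
    return repeated
```

An empty list from interleaved_residues means each residue's atoms sit together in the file; it says nothing about whether two distinct residues share a name.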
Can you convert that InChI string to another format ParmEd understands, like a PDB file, maybe?
The mol2 format is worse than the PDB format in terms of how well the variants out there comply with the spec, and I place priority on supporting the files used by the MD engines that ParmEd supports.
I noted that the file extension is .sdf but the file content is in .mol2 format.
We can parse the file into a DataFrame, group the residues, and then write it back to a mol2 file. For example, my Python package pdbx2df can do that with the read_mol2 and write_mol2 functions.
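As a rough illustration of the grouping step, here is a standard-library sketch. It does not use pdbx2df, whose exact signatures I won't guess at here. It reorders the ATOM records so each residue's atoms are contiguous, renumbers the atoms, and remaps the atom ids in the BOND section. It assumes a single-molecule file where every ATOM record has the subst_id and subst_name columns; regroup_mol2 is a made-up name, and other sections that reference atom ids (such as SUBSTRUCTURE root atoms) are not updated.

```python
def regroup_mol2(mol2_text):
    """Make each residue's atoms contiguous, renumber atoms, remap bonds (sketch)."""
    lines = mol2_text.splitlines()

    # First pass: collect the ATOM records in file order.
    section, atoms = None, []
    for line in lines:
        s = line.strip()
        if s.startswith("@<TRIPOS>"):
            section = s
        elif section == "@<TRIPOS>ATOM" and len(s.split()) >= 8:
            atoms.append(s.split())

    # Rank residues by first appearance. sorted() is stable, so the
    # original atom order inside each residue is preserved.
    first_seen = {}
    for f in atoms:
        first_seen.setdefault((f[6], f[7]), len(first_seen))
    ordered = sorted(atoms, key=lambda f: first_seen[(f[6], f[7])])
    new_id = {f[0]: str(i) for i, f in enumerate(ordered, start=1)}

    # Second pass: emit regrouped atoms and bonds with remapped atom ids.
    out, section = [], None
    for line in lines:
        s = line.strip()
        if s.startswith("@<TRIPOS>"):
            section = s
            out.append(line)
            if section == "@<TRIPOS>ATOM":
                out.extend(" ".join([new_id[f[0]]] + f[1:]) for f in ordered)
        elif section == "@<TRIPOS>ATOM" and len(s.split()) >= 8:
            continue  # already written above in regrouped order
        elif section == "@<TRIPOS>BOND" and len(s.split()) >= 4:
            f = s.split()  # bond_id origin_atom target_atom bond_type
            f[1], f[2] = new_id[f[1]], new_id[f[2]]
            out.append(" ".join(f))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Whether the regrouped file then loads cleanly still depends on the residue names being usable, so this only addresses the interleaving part of the problem.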
When trying to load a mol2 file as a structure, I'm getting an error from the head/tail residue bond logic:
Admittedly, there could be something funky with my mol2 file (pasted below), but would it be possible to bypass the ResidueTemplate building logic when using Structure?