Open spadavec opened 7 years ago
Thanks! mol2 parsing is much more fragile than I'd like, especially when it comes to weird filename issues.
@andrrizzi : Can you take a look at this at some point?
Sure! I'll punt it for post-1.0 release if this is not urgent (feel free to change the issue tag).
Okay, to shine some more light on this, it turns out that if the molecule name in the .mol2
file starts with a digit, it throws this error. So long as the first character is a letter, it will pass.
e.g.
Fail: 20G, 2G0, 02G, 0G2
Works: G20, G02
Considering we don't really use the molecule name beyond just ID'ing it, it's a pretty easy workaround.
Why it's doing this, I don't know.
I actually fixed this case in #465 with the addition of the utils.Leap._sanitize_unit_name function, but it looks like it's not only the first character that should be different than a digit.
Leap may also have problems with residue names that start/contain a digit.
> may also have problems with residue names that start/contain a digit
DOES have such problems.
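As a rough illustration of the kind of sanitizing discussed above, here is a minimal sketch. The function name `sanitize_name` and the `prefix` argument are hypothetical and are not the actual utils.Leap._sanitize_unit_name implementation. It only handles the leading-digit case confirmed by the examples (20G fails, G20 works) and does not address residue names that merely contain a digit.

```python
import re


def sanitize_name(name, prefix="M"):
    """Hypothetical helper: return a name that does not start with a digit.

    This is an illustration only, not YANK's _sanitize_unit_name.
    """
    # Replace anything that is not an ASCII letter, digit, or underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A leading digit is the case reported to fail, so prepend a letter.
    if not cleaned or cleaned[0].isdigit():
        cleaned = prefix + cleaned
    return cleaned


for original in ("20G", "2G0", "G20"):
    print(original, "->", sanitize_name(original))
```

Names that already start with a letter pass through unchanged, so the sanitizer could be applied unconditionally.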
Can't we just temporarily rename the ligand residue when feeding it into Antechamber/LEAP?
We can definitely change it, but I'm not sure we can do it temporarily unless we modify the prmtop
file after it is created by tleap.
Where precisely are things going wrong? Are we somehow generating an MDTraj or OpenMM Topology from the prmtop and running into MDTraj issues during a DSL query when residues have numbers for names? Or, as the initial part of the issue suggests, are we running into trouble when using as input mol2 files that have numbers for residue names?
> we running into trouble when using as input mol2 files that have numbers for residue names?
I believe it's this.
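To check whether a given input file is affected, something like the following sketch could scan a mol2 file for molecule or substructure names that start with a digit. The function name `find_digit_leading_names` is made up for illustration. It assumes the standard Tripos layout, where the molecule name is the first line after @<TRIPOS>MOLECULE and the substructure name is the eighth whitespace-separated field of each @<TRIPOS>ATOM record.

```python
def find_digit_leading_names(mol2_text):
    """Return (molecule_names, substructure_names) that start with a digit.

    Hypothetical helper for illustration; it only inspects the MOLECULE
    and ATOM sections of a Tripos mol2 file.
    """
    tag = "@<TRIPOS>"
    section = None
    expect_mol_name = False
    bad_molecules, bad_substructures = set(), set()
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(tag):
            section = stripped[len(tag):]
            expect_mol_name = section == "MOLECULE"
            continue
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if section == "MOLECULE" and expect_mol_name:
            # The first data line of the MOLECULE section is the name.
            expect_mol_name = False
            if stripped[0].isdigit():
                bad_molecules.add(stripped)
        elif section == "ATOM":
            # atom_id atom_name x y z atom_type [subst_id [subst_name ...]]
            fields = stripped.split()
            if len(fields) >= 8 and fields[7][0].isdigit():
                bad_substructures.add(fields[7])
    return sorted(bad_molecules), sorted(bad_substructures)
```

Both lists come back empty for a file whose names all start with a letter, so a non-empty result could be used to warn the user or trigger a rename during setup.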
What if we (1) pick a resname (e.g. MOL) that doesn't appear in the receptor topology, (2) use that resname when generating the mol2 file, and (3) rename the residue in complex.pdb and ligand.pdb back to the original name for analysis / MDTraj DSL processing?
The residue name in the prmtop file doesn't matter, since we don't use that for anything; it's just the PDB files generated from it that we end up using for DSL searches, analysis, etc.
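A minimal sketch of the rename-and-restore idea, working on plain PDB text rather than through any YANK or MDTraj API. The function names `pick_unused_resname` and `rename_residue_in_pdb` and the candidate list are assumptions made for illustration. The residue name is taken from columns 18-20 of ATOM/HETATM records, as in the standard PDB format.

```python
def pick_unused_resname(receptor_resnames, candidates=("MOL", "LIG", "UNL")):
    """Hypothetical helper: pick a placeholder name not used by the receptor."""
    used = {name.strip().upper() for name in receptor_resnames}
    for candidate in candidates:
        if candidate not in used:
            return candidate
    raise ValueError("no unused placeholder residue name available")


def rename_residue_in_pdb(pdb_text, old_name, new_name):
    """Hypothetical helper: swap a residue name in ATOM/HETATM records."""
    if not 1 <= len(new_name) <= 3:
        raise ValueError("PDB residue names are 1 to 3 characters long")
    renamed = []
    for line in pdb_text.splitlines():
        # Columns 18-20 (slice 17:20) hold the residue name.
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == old_name:
            line = line[:17] + new_name.rjust(3) + line[20:]
        renamed.append(line)
    trailing = "\n" if pdb_text.endswith("\n") else ""
    return "\n".join(renamed) + trailing
```

The placeholder would be used while generating the mol2 file, and `rename_residue_in_pdb(text, placeholder, original_name)` would restore the original name in complex.pdb and ligand.pdb afterwards.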
> The residue name in the prmtop file doesn't matter, since we don't use that for anything
We do actually, because at the moment we use that topology for all the DSL expressions we have in the YAML file (e.g., ligand_DSL, receptor/ligand atoms for restraints). We would have to switch to a system where we read the system from the prmtop, the initial positions from the inpcrd file, and the topology from the pdb. This would work only if the system has been set up through the automatic pipeline, though; otherwise we'll have to use only the two files that the users provide.
I see. The problem is that we create the Topography from the Topology read in from the AMBER prmtop file in these lines, so we would have to
This is very likely a niche issue (my very sloppy work found this error), but it may be worth sanitizing .mol2 files during setup.
If the substructure id for an atom in a mol2 file matches the filename, e.g.
Then setup will fail with the following:
This occurs with the following: