Closed rafwiewiora closed 8 years ago
Wow. AMBER and prmtop-ffxml certainly shouldn't be giving such large discrepancies.
Is this sufficiently automated such that you could go through and zero out one improper) or all impropers but one) at a time to figure out if there is a single term responsible for much of this discrepancy?
Is this sufficiently automated such that you could go through and zero out one improper) or all impropers but one) at a time to figure out if there is a single term responsible for much of this discrepancy?
Now it finally can! I'm turning off all propers and only turning on one improper at a time - in AMBER and in OpenMM-conversion. All pass the tests...
So it's either some mix up (maybe some overriding conflict) in the ffxml or there's no problems one-by-one in impropers, but they depend on each other and problem shows up in the whole system only? Impropers are simply additive, right?
The impropers should be additive.
I wonder if this is a combination of (1) tleap making a different choice from OpenMM for most of these impropers and (2) slight deviations from planarity causing the energies to add up to a larger error than usual.
It was actually a bug in the debugging script... Got it now!
So what I'm doing is:
Total improper energies are (kJ/mol): Amber - 75.0081460522 ffxml- 89.2323361429
Now we have two culprits here:
A
AMBER dihedral: ('H4', 'X', 'CS', 'X')
ffxml translation:
<Improper type1="CS" type2="" type3="" type4="H4" periodicity1="2" phase1="3.14159265359" k1="4.6024"/>
Energies (AMBER, ffxml):
('PeriodicTorsionForce', 15.537213444709778)
('PeriodicTorsionForce', 23.092045068740845)
That's pretty bad energy difference - but this is only in 6 impropers in the whole system that fall under these params, all in U and here comes the problem - the order of atoms in the OpenMM system:
<Dihedral [imp]; <Atom C5 [75]; In U 2>--<Atom N1 [72]; In U 2>--<Atom C6 [73]; In U 2>--<Atom H6 [74]; In U 2>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000, scee=1.000, scnb=1.000>>
Types: CS - N* - CS - H4
AMBER:
<Dihedral [imp]; <Atom N1 [72]; In U 2>--<Atom C5 [75]; In U 2>--<Atom C6 [73]; In U 2>--<Atom H6 [74]; In U 2>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000, scee=0.000, scnb=0.000>>
Types: N* - CS - CS - H4
In this case, this carbon rule is giving us a result opposite to AMBER.
B
Same story, just different carbon atom type. 6 of these in C's.
AMBER dihedral: ('H4', 'X', 'C4', 'X')
ffxml translation:
<Improper type1="C4" type2="" type3="" type4="H4" periodicity1="2" phase1="3.14159265359" k1="4.6024"/>
Energies (AMBER, ffxml): (kJ/mol)
('PeriodicTorsionForce', 19.267142295837402)
('PeriodicTorsionForce', 25.95484161376953)
OpenMM order:
<Dihedral [imp]; <Atom C5 [13]; In C5 0>--<Atom N1 [10]; In C5 0>--<Atom C6 [11]; In C5 0>--<Atom H6 [12]; In C5 0>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000, scee=1.000, scnb=1.000>>
AMBER:
<Dihedral [imp]; <Atom N1 [10]; In C5 0>--<Atom C5 [13]; In C5 0>--<Atom C6 [11]; In C5 0>--<Atom H6 [12]; In C5 0>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000, scee=0.000, scnb=0.000>>
What can we do about this? The energy differences are big, but do they matter as long as the improper is doing it's job of keeping things in the right planes? How can I look into the rules AMBER uses here and learn why it puts those nitrogens, not carbons, first in these cases?
Pinging @swails and @peastman too.
These are N - C(sp2) - C(sp2) - H in the C/U rings, so I guess the energy changes on making a mistake here do not surprise.
Interestingly, I'm seeing two >1 kJ/mol misfits for the DNA system too, but they're in both directions and cancel themselves out just enough to make the improper energies fall into 1% tolerance!
We have: A
('H', 'X', 'N2', 'X')
Energies (AMBER / OpenMM):
('PeriodicTorsionForce', 155.19318262359593)
('PeriodicTorsionForce', 153.32055167027283)
Is this due to a (partial) swap in two hydrogens in the order: AMBER (these are atom names not types):
{('C2', 'H21', 'N2', 'H22'),
('C4', 'H41', 'N4', 'H42'),
('C6', 'H61', 'N6', 'H62')}
OpenMM (second one is extra)
{('C2', 'H21', 'N2', 'H22'),
('C2', 'H22', 'N2', 'H21'),
('C4', 'H41', 'N4', 'H42'),
('C6', 'H61', 'N6', 'H62')}
B Same story as with the RNA 2 comments above - C first in OpenMM / N first in AMBER, in DC / DT base rings.
('H4', 'X', 'CM', 'X')
Energies (AMBER - OpenMM):
('PeriodicTorsionForce', 10.416644178898423)
('PeriodicTorsionForce', 13.4050743040425)
Here's what @swails said about it at https://github.com/pandegroup/openmm/issues/220#issuecomment-115458262:
Amber improper torsions are sadly done in a way that is not well-defined. The central atom comes third, and the other three are in alphabetical order by atom type. When two atoms involved in the improper torsion have the same type, the ordering becomes undefined, and indeed there are instances where leap will assign two different copies of the same ligand with different orders of impropers that are still consistent according to their sorting rules.
I have plans to go into tleap and order by atom type, then order degenerate atoms by atom name, but I haven't done this yet. atom name should have been done from the beginning, since that choice does not suffer from undefined behavior (all atom names in a residue must be unique), and if you change the atom types as part of refactoring a force field, you might unknowingly change the ordering of the impropers.
At the end of the day, these torsions are not rigorously derived, anyway -- they just exist to keep things planar. And they'll do that even if the order isn't exactly the same between programs. So I don't think it's a big deal.
And they'll do that even if the order isn't exactly the same between programs. So I don't think it's a big deal.
Well, 14 kJ/mol is almost 6 kT, which is a factor of a 400 in population of those structures.
I think we should probably put a big warning in the manual at this point about the fact that LEaP does not unambiguously define an improper ordering, requiring users who want exact agreement with LEaP to go through the LEaP route and load in via AmberPrmtopFile/AmberInpcrdFile
.
Thanks for tracking this down, @rafwiewiora!
Notice that the behavior I described isn't really the same as what @rafwiewiora reported here. The ordering of CS, N*, and H4 should not be ambiguous. There's something else happening here, but I don't know what it is. I also admit that 14 kJ/mol is a lot more than I would expect. Of course this difference can be very misleading, as it could be (and likely is, to a large extent) basically a shift in the potential energy surface. But still worth further investigation.
Can we track this down further into the tleap code?
I see the sorting here, but this is only in the parameter set, before assignment for the system, right? (not sure if I'm using the right nomenclature for LeAP here)
So what decides on the order the two wildcards get assigned to actual atoms? I've been looking through the code, but I don't know C well enough to work it out. Can someone help out?
I suppose that will be the place where these rules: https://github.com/pandegroup/openmm/blob/master/wrappers/python/simtk/openmm/app/forcefield.py#L1378-L1388 came from.
Also, I want to clarify what 'central atom' means. That makes sense for an atom bound to 3 other atoms, but e.g. the problematic impropers in RNA are a linear chain of N-C-C-H - what defines the central atom there? And why is that not just defined as a proper?
Ack, I don't know anymore :(
the problematic impropers in RNA are a linear chain of N-C-C-H - what defines the central atom there?
That's exactly the definition of a proper torsion! So if it's being called an improper somewhere, that seems like a bug.
I think that impropers can actually be assigned to propers in some places... :( I haven't looked at the logic for how LEaP assigns impropers in a ton of detail, but from what I've heard, it's weird.
That's exactly the definition of a proper torsion! So if it's being called an improper somewhere, that seems like a bug.
Sorry, my bad! I've looked in Pymol wrongly - it's fine, it's a central C with the three atoms bound to it. I realized there was no way OpenMM would assign this as an improper if it wasn't an improper, as it looks at bonding when constructing system data.
Looking for solutions to the impropers issue here now.
I confused myself a bit before with this one improper in DNA & RNA, so to re-summarize:
('H4', 'X', 'CM', 'X')
(3rd atom in AMBER is the central atom, there's a comment with half a line of exclamation marks in the leap code about the 3rd atom having to be the central one) YET
('N*', 'CM', 'CM', 'H4')
for that improper - this is now in ParmEd / OpenMM notation with 1st atom == central. This breaks the 3rd atom in AMBER == central rule, the angle as listed can neither be defined as a proper or as an improper (the true central atom is a CM type). I assume here that I am not getting tricked somewhere by a weird bug in ParmEd, which mixes up this 3rd-atom-central --> 1st-atom-central translation. @swails is this a fair assumption? I can't see any bugs in ParmEd as far as I've looked.So this is looking like a bug in LeAP, I can't work out where it's coming from just yet - but this would potentially mean the converted forcefields might be unusable for nucleic acids until this is resolved. @swails could you point me to someone in the AMBER team I could speak to about this?
pinging @swails and @jchodera with a reminder here, see above
I'm confused... It looks like there's a CM
in the third position of both the improper with wild-cards and in the one that LEaP assigns. What am I missing? That looks consistent with what LEaP should be doing, assuming that N*
is in the residue before CM
, CM
, and H4
.
As for whom to contact, the Amber list, or perhaps the Amber developer list, is the best bet.
OK, I think I've confused myself really badly and you only helped me rethink that with your comment. I think I thought too much about how the 3rd atom == central becomes 1st atom == central in ffxml, but then again 3rd atom == central on assigning the torsions on createSystem
. And because OpenMMParameterSet has central atom == 1st atom I assumed somewhere along the way that every improper I get out of ParmEd would use that convention. But now I finally understand that if I make a Structure
the order is as in the original input, only if I make a ParameterSet
- this https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 filtering applies.
Ok, so no bug in LeAP, this is just the order of assignment of the other 3 atoms issue. I have an idea to address this.
Btw, this function https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 has misleading docs. You say in the docstring:
The default is to *always* put the central atom first, and assume it
comes first already if no central atom is detected.
but the default is actually to assume the third atom is central https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L119
Btw, this function https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 has misleading docs. You say in the docstring:
The default is to *always* put the central atom first, and assume it comes first already if no central atom is detected.
but the default is actually to assume the third atom is central https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L119
I clarified the docs here a little bit. (See ParmEd/ParmEd#650). Note that that assumption (of third atom being central) is only hit if the structure has no bonds defined (which is only ever the case when starting from a PDB or mmCIF file). In every other case, the central atom is identified as the one forming bonds with the other 3. I added a note about having that section of the code use the center
parameter to assign the central atom in the case where it's ambiguous.
But to make things even more confusing, CHARMM always puts the central atom first, AMBER always puts it third (except in the rare case when it needs to be "flipped" because the first atom in the system is a central atom, then the central atom becomes second). But ParmEd, like OpenMM, needs to handle both cases...
EDIT: fixed PR reference to ParmEd repository.
Looks good! (your links link to this repo instead of ParmEd though :) ). It looks like the central atom in there will not be defined in any case where you don't have one atom forming bonds with the other 3 right? So because I was suspecting a bug out of LeAP I made sure I knew that function in case the LeAP bug was giving some impropers with not the right bonds.
I think what would've been helpful to avoid my confusion with this would be an attribute on the Dihedral
that gives the index of the central atom if improper = True.
@jchodera @swails
I swear nightmares about dihedrals will keep me awake at night forever after this...
RNA test system - cleaned up 5C5W:https://gist.github.com/rafwiewiora/686a17b8004414fe5562
Fails by a lot on the impropers. (I'm considering ff14SB specifically, but fail is for every single FF - didn't check by how much for others) I always bear in mind the problems with the order of atoms not being well-defined in AMBER, so we already allow a higher tolerance of 1% for impropers. Perhaps this would need to be higher for RNA due to the particular atom types being used.
But it's also failing another 'test' that never failed me before and I wonder what this means. The play is to create an ffxml from the prmtop, paste in all atom types and residue templates from the full converted ffxml and compare energies of systems built on the two ffxmls. The 'prmtop-ffxml' has always before reproduced the AMBER energies and hence pointed to problems with the
ForceField
. The prmtop-ffxml is generated this way:This time I have 3 different energies - AMBER, prmtop-ffxml and full converted ffxml. You're looking at the second PeriodicTorsionForce in each. AMBER:
prmtop-ffxml:
full ffxml:
Nothing that immediately pops out to be a problem just playing around with these systems and attributes. I was hoping you might have some suggestions / ideas? The fact that the prmtop-ffxml is neither with aggrement with AMBER or full ffxml tells me there's something more at play than just an AMBER / OpenMM atom order mismatch.