choderalab / openmm

OpenMM is a toolkit for molecular simulation using high performance GPU code.
1 stars 0 forks source link

Impropers fail for RNA test #9

Closed rafwiewiora closed 8 years ago

rafwiewiora commented 8 years ago

@jchodera @swails

I swear nightmares about dihedrals will keep me awake at night forever after this...

RNA test system - cleaned up 5C5W:https://gist.github.com/rafwiewiora/686a17b8004414fe5562

Fails by a lot on the impropers. (I'm considering ff14SB specifically, but fail is for every single FF - didn't check by how much for others) I always bear in mind the problems with the order of atoms not being well-defined in AMBER, so we already allow a higher tolerance of 1% for impropers. Perhaps this would need to be higher for RNA due to the particular atom types being used.

But it's also failing another 'test' that never failed me before and I wonder what this means. The play is to create an ffxml from the prmtop, paste in all atom types and residue templates from the full converted ffxml and compare energies of systems built on the two ffxmls. The 'prmtop-ffxml' has always before reproduced the AMBER energies and hence pointed to problems with the ForceField. The prmtop-ffxml is generated this way:

parm = parmed.load_file('rna.top', 'rna.crd')
parm = parmed.amber.AmberParameterSet.from_structure(parm)
parm = parmed.openmm.OpenMMParameterSet.from_parameterset(parm)
parm.write('rna.xml')

This time I have 3 different energies - AMBER, prmtop-ffxml and full converted ffxml. You're looking at the second PeriodicTorsionForce in each. AMBER:

('HarmonicBondForce', 1922.0866981837933),
 ('HarmonicAngleForce', 7738.2501663624735),
 ('PeriodicTorsionForce', 2753.918177413638),
 ('PeriodicTorsionForce', 75.00816251174547),
 ('NonbondedForce', 890.8151448667049),
 ('CMMotionRemover', 0.0)]

prmtop-ffxml:

[('HarmonicBondForce', 1922.0866981837933),
 ('HarmonicAngleForce', 7738.2501663624735),
 ('PeriodicTorsionForce', 2753.918177413638),
 ('PeriodicTorsionForce', 85.69763856896316),
 ('NonbondedForce', 890.8153693526983),
 ('CMMotionRemover', 0.0)]

full ffxml:

[('HarmonicBondForce', 1922.0866981837933),
 ('HarmonicAngleForce', 7738.2569055468175),
 ('PeriodicTorsionForce', 2753.9186167853522),
 ('PeriodicTorsionForce', 89.23233614287892),
 ('NonbondedForce', 890.8153691142797),
 ('CMMotionRemover', 0.0)]

Nothing that immediately pops out to be a problem just playing around with these systems and attributes. I was hoping you might have some suggestions / ideas? The fact that the prmtop-ffxml is neither with aggrement with AMBER or full ffxml tells me there's something more at play than just an AMBER / OpenMM atom order mismatch.

jchodera commented 8 years ago

Wow. AMBER and prmtop-ffxml certainly shouldn't be giving such large discrepancies.

Is this sufficiently automated such that you could go through and zero out one improper) or all impropers but one) at a time to figure out if there is a single term responsible for much of this discrepancy?

rafwiewiora commented 8 years ago

Is this sufficiently automated such that you could go through and zero out one improper) or all impropers but one) at a time to figure out if there is a single term responsible for much of this discrepancy?

Now it finally can! I'm turning off all propers and only turning on one improper at a time - in AMBER and in OpenMM-conversion. All pass the tests...

So it's either some mix up (maybe some overriding conflict) in the ffxml or there's no problems one-by-one in impropers, but they depend on each other and problem shows up in the whole system only? Impropers are simply additive, right?

jchodera commented 8 years ago

The impropers should be additive.

I wonder if this is a combination of (1) tleap making a different choice from OpenMM for most of these impropers and (2) slight deviations from planarity causing the energies to add up to a larger error than usual.

rafwiewiora commented 8 years ago

It was actually a bug in the debugging script... Got it now!

So what I'm doing is:

Total improper energies are (kJ/mol): Amber - 75.0081460522 ffxml- 89.2323361429

Now we have two culprits here:

A

AMBER dihedral: ('H4', 'X', 'CS', 'X') ffxml translation:

<Improper type1="CS" type2="" type3="" type4="H4" periodicity1="2" phase1="3.14159265359" k1="4.6024"/>

Energies (AMBER, ffxml):

('PeriodicTorsionForce', 15.537213444709778)
('PeriodicTorsionForce', 23.092045068740845)

That's pretty bad energy difference - but this is only in 6 impropers in the whole system that fall under these params, all in U and here comes the problem - the order of atoms in the OpenMM system:

<Dihedral [imp]; <Atom C5 [75]; In U 2>--<Atom N1 [72]; In U 2>--<Atom C6 [73]; In U 2>--<Atom H6 [74]; In U 2>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000,  scee=1.000, scnb=1.000>>

Types: CS - N* - CS - H4

AMBER:

<Dihedral [imp]; <Atom N1 [72]; In U 2>--<Atom C5 [75]; In U 2>--<Atom C6 [73]; In U 2>--<Atom H6 [74]; In U 2>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000,  scee=0.000, scnb=0.000>>

Types: N* - CS - CS - H4

In this case, this carbon rule is giving us a result opposite to AMBER.

B Same story, just different carbon atom type. 6 of these in C's. AMBER dihedral: ('H4', 'X', 'C4', 'X') ffxml translation:

  <Improper type1="C4" type2="" type3="" type4="H4" periodicity1="2" phase1="3.14159265359" k1="4.6024"/>

Energies (AMBER, ffxml): (kJ/mol)

('PeriodicTorsionForce', 19.267142295837402)
('PeriodicTorsionForce', 25.95484161376953)

OpenMM order:

<Dihedral [imp]; <Atom C5 [13]; In C5 0>--<Atom N1 [10]; In C5 0>--<Atom C6 [11]; In C5 0>--<Atom H6 [12]; In C5 0>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000,  scee=1.000, scnb=1.000>>

AMBER:

<Dihedral [imp]; <Atom N1 [10]; In C5 0>--<Atom C5 [13]; In C5 0>--<Atom C6 [11]; In C5 0>--<Atom H6 [12]; In C5 0>; type=<DihedralType; phi_k=1.100, per=2, phase=180.000,  scee=0.000, scnb=0.000>>

What can we do about this? The energy differences are big, but do they matter as long as the improper is doing it's job of keeping things in the right planes? How can I look into the rules AMBER uses here and learn why it puts those nitrogens, not carbons, first in these cases?

Pinging @swails and @peastman too.

rafwiewiora commented 8 years ago

These are N - C(sp2) - C(sp2) - H in the C/U rings, so I guess the energy changes on making a mistake here do not surprise.

rafwiewiora commented 8 years ago

Interestingly, I'm seeing two >1 kJ/mol misfits for the DNA system too, but they're in both directions and cancel themselves out just enough to make the improper energies fall into 1% tolerance!

We have: A

('H', 'X', 'N2', 'X')

Energies (AMBER / OpenMM):

('PeriodicTorsionForce', 155.19318262359593)
('PeriodicTorsionForce', 153.32055167027283)

Is this due to a (partial) swap in two hydrogens in the order: AMBER (these are atom names not types):

{('C2', 'H21', 'N2', 'H22'),
 ('C4', 'H41', 'N4', 'H42'),
 ('C6', 'H61', 'N6', 'H62')}

OpenMM (second one is extra)

{('C2', 'H21', 'N2', 'H22'),
 ('C2', 'H22', 'N2', 'H21'),
 ('C4', 'H41', 'N4', 'H42'),
 ('C6', 'H61', 'N6', 'H62')}

B Same story as with the RNA 2 comments above - C first in OpenMM / N first in AMBER, in DC / DT base rings.

('H4', 'X', 'CM', 'X')

Energies (AMBER - OpenMM):

('PeriodicTorsionForce', 10.416644178898423)
('PeriodicTorsionForce', 13.4050743040425)
peastman commented 8 years ago

Here's what @swails said about it at https://github.com/pandegroup/openmm/issues/220#issuecomment-115458262:

Amber improper torsions are sadly done in a way that is not well-defined. The central atom comes third, and the other three are in alphabetical order by atom type. When two atoms involved in the improper torsion have the same type, the ordering becomes undefined, and indeed there are instances where leap will assign two different copies of the same ligand with different orders of impropers that are still consistent according to their sorting rules.

I have plans to go into tleap and order by atom type, then order degenerate atoms by atom name, but I haven't done this yet. atom name should have been done from the beginning, since that choice does not suffer from undefined behavior (all atom names in a residue must be unique), and if you change the atom types as part of refactoring a force field, you might unknowingly change the ordering of the impropers.

At the end of the day, these torsions are not rigorously derived, anyway -- they just exist to keep things planar. And they'll do that even if the order isn't exactly the same between programs. So I don't think it's a big deal.

jchodera commented 8 years ago

And they'll do that even if the order isn't exactly the same between programs. So I don't think it's a big deal.

Well, 14 kJ/mol is almost 6 kT, which is a factor of a 400 in population of those structures.

I think we should probably put a big warning in the manual at this point about the fact that LEaP does not unambiguously define an improper ordering, requiring users who want exact agreement with LEaP to go through the LEaP route and load in via AmberPrmtopFile/AmberInpcrdFile.

Thanks for tracking this down, @rafwiewiora!

swails commented 8 years ago

Notice that the behavior I described isn't really the same as what @rafwiewiora reported here. The ordering of CS, N*, and H4 should not be ambiguous. There's something else happening here, but I don't know what it is. I also admit that 14 kJ/mol is a lot more than I would expect. Of course this difference can be very misleading, as it could be (and likely is, to a large extent) basically a shift in the potential energy surface. But still worth further investigation.

rafwiewiora commented 8 years ago

Can we track this down further into the tleap code?

I see the sorting here, but this is only in the parameter set, before assignment for the system, right? (not sure if I'm using the right nomenclature for LeAP here)

So what decides on the order the two wildcards get assigned to actual atoms? I've been looking through the code, but I don't know C well enough to work it out. Can someone help out?

I suppose that will be the place where these rules: https://github.com/pandegroup/openmm/blob/master/wrappers/python/simtk/openmm/app/forcefield.py#L1378-L1388 came from.

rafwiewiora commented 8 years ago

Also, I want to clarify what 'central atom' means. That makes sense for an atom bound to 3 other atoms, but e.g. the problematic impropers in RNA are a linear chain of N-C-C-H - what defines the central atom there? And why is that not just defined as a proper?

swails commented 8 years ago

Ack, I don't know anymore :(

peastman commented 8 years ago

the problematic impropers in RNA are a linear chain of N-C-C-H - what defines the central atom there?

That's exactly the definition of a proper torsion! So if it's being called an improper somewhere, that seems like a bug.

swails commented 8 years ago

I think that impropers can actually be assigned to propers in some places... :( I haven't looked at the logic for how LEaP assigns impropers in a ton of detail, but from what I've heard, it's weird.

rafwiewiora commented 8 years ago

That's exactly the definition of a proper torsion! So if it's being called an improper somewhere, that seems like a bug.

Sorry, my bad! I've looked in Pymol wrongly - it's fine, it's a central C with the three atoms bound to it. I realized there was no way OpenMM would assign this as an improper if it wasn't an improper, as it looks at bonding when constructing system data.

rafwiewiora commented 8 years ago

Looking for solutions to the impropers issue here now.

I confused myself a bit before with this one improper in DNA & RNA, so to re-summarize:

YET

So this is looking like a bug in LeAP, I can't work out where it's coming from just yet - but this would potentially mean the converted forcefields might be unusable for nucleic acids until this is resolved. @swails could you point me to someone in the AMBER team I could speak to about this?

rafwiewiora commented 8 years ago

pinging @swails and @jchodera with a reminder here, see above

swails commented 8 years ago

I'm confused... It looks like there's a CM in the third position of both the improper with wild-cards and in the one that LEaP assigns. What am I missing? That looks consistent with what LEaP should be doing, assuming that N* is in the residue before CM, CM, and H4.

As for whom to contact, the Amber list, or perhaps the Amber developer list, is the best bet.

rafwiewiora commented 8 years ago

OK, I think I've confused myself really badly and you only helped me rethink that with your comment. I think I thought too much about how the 3rd atom == central becomes 1st atom == central in ffxml, but then again 3rd atom == central on assigning the torsions on createSystem. And because OpenMMParameterSet has central atom == 1st atom I assumed somewhere along the way that every improper I get out of ParmEd would use that convention. But now I finally understand that if I make a Structure the order is as in the original input, only if I make a ParameterSet - this https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 filtering applies.

Ok, so no bug in LeAP, this is just the order of assignment of the other 3 atoms issue. I have an idea to address this.

Btw, this function https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 has misleading docs. You say in the docstring:

 The default is to *always* put the central atom first, and assume it
    comes first already if no central atom is detected. 

but the default is actually to assume the third atom is central https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L119

swails commented 8 years ago

Btw, this function https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L75 has misleading docs. You say in the docstring:

 The default is to *always* put the central atom first, and assume it
    comes first already if no central atom is detected. 

but the default is actually to assume the third atom is central https://github.com/ParmEd/ParmEd/blob/master/parmed/utils/__init__.py#L119

I clarified the docs here a little bit. (See ParmEd/ParmEd#650). Note that that assumption (of third atom being central) is only hit if the structure has no bonds defined (which is only ever the case when starting from a PDB or mmCIF file). In every other case, the central atom is identified as the one forming bonds with the other 3. I added a note about having that section of the code use the center parameter to assign the central atom in the case where it's ambiguous.

But to make things even more confusing, CHARMM always puts the central atom first, AMBER always puts it third (except in the rare case when it needs to be "flipped" because the first atom in the system is a central atom, then the central atom becomes second). But ParmEd, like OpenMM, needs to handle both cases...

EDIT: fixed PR reference to ParmEd repository.

rafwiewiora commented 8 years ago

Looks good! (your links link to this repo instead of ParmEd though :) ). It looks like the central atom in there will not be defined in any case where you don't have one atom forming bonds with the other 3 right? So because I was suspecting a bug out of LeAP I made sure I knew that function in case the LeAP bug was giving some impropers with not the right bonds.

I think what would've been helpful to avoid my confusion with this would be an attribute on the Dihedral that gives the index of the central atom if improper = True.