choderalab / openmm

OpenMM is a toolkit for molecular simulation using high performance GPU code.
1 stars 0 forks source link

Update // ions & solvents decision #13

Closed rafwiewiora closed 8 years ago

rafwiewiora commented 8 years ago

@peastman @jchodera @swails

tl:dr: All protein/nucleic converted, some issues outstanding that could bring in a few more FFs, some FFs never to be converted - all explained below. See ions & solvents section for decisions on how to distribute those.

Happy to report that all forcefields that are convertable right now, except for ions and solvents, have been converted and validated - all files here (files/ keeps files for tests, ffxml/ is the converted xmls). Perhaps you could have a first quick reviewing look?

In summary the script uses the master.yaml for list of files to convert / provenance info / tests to run, converts all of the files and validates through specified tests:

The relative (to AMBER energy) energy differences allowed are 1e-5 for all forces, except for impropers (dihedrals are split) and 1e-2 for impropers - except for one protein forcefield that needed to go up to 2e-2 on impropers and nucleic acids for which the improper testing is turned off for now, as I'm thinking what to do with the impropers problems there.

I've converted all the FFs in AmberTools 15 that I can convert within existing code and problems, and these are the ones that can't be converted:

EDIT: * also oldff/leaprc.ff94.nmr - leaprc does not load n- and c-terminal protein residues.

TO DO: clean up the script, a lot of repeating code now

Ions & solvents

One question here - do we want water-ions bundles where possible or completely separate waters and ions?

In AMBER I see:

So I think we can do either:

A Waters & ions together in ffxmls - perhaps there should be a default one (I don't even know which set params for Na+ and Cl- in the currently distributed ones come from? Would that be Joung-Chetham?).

B Separate

I don't see particularly big differences to be fair, it's looking like either 3x5=15 files for option A and 15+3 for option B. Perhaps B is a bit more dangerous in that people could load in wrong combinations and burdensome in that you have to think about which ions to load in even though you're just doing simple ions for charge neutralization - this is why I think we should also have 'default' sets if option A.

I will do all with the 'new naming scheme' of atomic_ions.lib (i.e. we're leaving ions94.lib and especially ions91.lib behind) + parm10.dat for a few 'general' parameters that sit there.

As for solvents I'll need to work out how to deal with virtual sites etc.

peastman commented 8 years ago

does OpenMM have a sense of directionality for residues (head/tail) and can we just have one external bond for the acyl chains?

I'm not sure what you mean by that. Could you explain?

rafwiewiora commented 8 years ago

I'm not sure what you mean by that. Could you explain?

@swails perhaps can explain in more detail, but my understanding from what he said is that LeAP cares about directionality in external bonds in residues, i.e. it will specifically mark an atom with an external bond a 'head' or a 'tail' depending on the directionality of the polymer. And this becomes a problem in the Lipid11 / Lipid14 forcefields, where the acyl chains are connected to the central residue just by one external bond - the atom that does a connection can be either a head or a tail in LeAP nomenclature - this is actually what I don't fully understand why @swails, I'm guessing the definition of directionality in a lipid can change also depending on something else and hence we need both head and tail?

Now, as the ParmEd OpenMM writing script was, it treated this both-head-and-tail info as two external bonds in the same atom. I just corrected it here: https://github.com/ParmEd/ParmEd/pull/623 which has been merged already, but we just wanted to make sure that OpenMM won't choke somewhere on those lipids if it also had some directionality recognition in the ForceField - but I don't think it does, all it cares about is the residue template signature and which atoms form external bonds right? Doesn't matter what comes before it, what after and in what direction?

Can we decide on the water/ions too please?

peastman commented 8 years ago

all it cares about is the residue template signature and which atoms form external bonds right?

Correct. There's no directionality to an <ExternalBond>. It doesn't care which other residue the bond connects to.

rafwiewiora commented 8 years ago

Reposting the ions and solvents question because there was some confusion - @peastman @jchodera

Ions & solvents

One question here - do we want water-ions bundles where possible or completely separate waters and ions?

In AMBER I see:

So I think we can do either:

A Waters & ions together in ffxmls - perhaps there should be a default one (I don't even know which set params for Na+ and Cl- in the currently distributed ones come from? Would that be Joung-Chetham?).

B Separate

I don't see particularly big differences to be fair, it's looking like either 3x5=15 files for option A and 15+3 for option B. Perhaps B is a bit more dangerous in that people could load in wrong combinations and burdensome in that you have to think about which ions to load in even though you're just doing simple ions for charge neutralization - this is why I think we should also have 'default' sets if option A.

I will do all with the 'new naming scheme' of atomic_ions.lib (i.e. we're leaving ions94.lib and especially ions91.lib behind) + parm10.dat for a few 'general' parameters that sit there.

As for solvents I'll need to work out how to deal with virtual sites etc.

peastman commented 8 years ago

I was waiting to see what @swails would say. He knows a lot more about those files than I do.

rafwiewiora commented 8 years ago

Alright!

swails commented 8 years ago

I'm not sure what you were waiting on me for, but I'll assume it was in relation to the ions and water models (and not the lipid. My vote is to bundle joint water/ion parameters together. It's unclear to me what the best ion model is for divalent ions (monovalent ions I think the Joung-Cheatham parameters are the way to go). But with the divalent ions, you can either get HFE right, or you can get ion-oxygen distance right; but not both. Or you can get them both wrong, but neither horrible (a compromise between the two). And then of course is the 12-6-4 model which is supposed to fix all the problems.

I would vote for doing one of two things:

  1. Pick the best one (I think the compromise model is the suggested one) and bundle that one with the water model, so you have an XML file for each water model + monovalent ion parameters + divalent ion parameters (giving 3 total XML files for SPCE, TIP4Pew, and TIP3P).
  2. Provide separate XML files for each of the available divalent ion parameter sets that also both include the relevant water model and monovalent ion parameters.

Since there's basically no ambiguity as to which monovalent ion parameter should be chosen for each water model, I see no reason not to bundle them all together. At least that's my take on the whole thing.

jchodera commented 8 years ago

I'd vote for a scheme that enables us to do some sensible sensitivity analysis comparisons of which good combinations might be best for binding free energy calculations.

peastman commented 8 years ago

Since we're planning to allow an overriding mechanism, what about this? Bundle the recommended ion parameters with each water model, so you get those automatically. Then provide separate files with only the divalent ions, which can be added in to override the default parameters for them?

jchodera commented 8 years ago

Since we're planning to allow an overriding mechanism, what about this? Bundle the recommended ion parameters with each water model, so you get those automatically. Then provide separate files with only the divalent ions, which can be added in to override the default parameters for them?

Works for me!

rafwiewiora commented 8 years ago

Sounds good!

Out of curiosity / want to understand fully: how do you justify only providing one monovalent model in all this / why is Joung-Cheatham the best one?

Also what about trivalent and tetravalent parameters - have these separately, or add to those divalent separate files?

rafwiewiora commented 8 years ago

Another question is: for the divalent ions that will override the standard divalent ions - we need to change the atom type names there - what kind of change should we introduce?

I would suggest including the 'priority' number that the overriding template will have - so standard zinc would have atom type ZN (or ZN1 but I'm thinking about making the overriding so that unspecified priority will mean 1) and the overriding will be ZN2, then perhaps the dummy atom model would have ZN3 or ZN2. What do you think?

peastman commented 8 years ago

What about something more descriptive, that indicates what parameter set it comes from? Similar to how different water models define types like "tip3p-O", "tip4pew-O", etc.

jchodera commented 8 years ago

How fancy do we get? charmm-tip3p-O and amber-tip3p-O?

rafwiewiora commented 8 years ago

Yeah, I feel like this can get too complicated depending on the amount of information you put into those names. At the same time, all that information sits already in the file name, so why repeat?

I'm +1 for something as simple as possible, while giving different names. That would mean for the new files also changing the type names for waters to just OH and HW - I don't see any point for overriding there.

rafwiewiora commented 8 years ago

On a separate note: Do we have capability in OpenMM for the 12-6-4 potential? Does that have to go as a custom force?

peastman commented 8 years ago

I don't think there's any reason to go that far, unless the CHARMM and Amber files actually define TIP3P differently. As long as they really are identical types, we can use the same name in both places.

There's definitely value to having the more verbose names, though, in cases where they really are different. I wouldn't want to remove that. And that's especially true in cases like the ions, where you expect to be loading two different versions in a single ForceField object.

jchodera commented 8 years ago

I don't think there's any reason to go that far, unless the CHARMM and Amber files actually define TIP3P differently.

They do.

peastman commented 8 years ago

We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?

peastman commented 8 years ago

They do.

Then the names should be different. But what's the difference between them? TIP3P is supposed to be a completely standardized model.

jchodera commented 8 years ago

There's definitely value to having the more verbose names, though, in cases where they really are different. I wouldn't want to remove that. And that's especially true in cases like the ions, where you expect to be loading two different versions in a single ForceField object.

You wouldn't want to be loading two different versions of the same kind of water model or solvated monovalent/divalent ions into the same ForceField object. The case we talked about previously where this would be desired is pretty much limited to mixing multisite ion models with non-multisite models.

peastman commented 8 years ago

You wouldn't want to be loading two different versions of the same kind of water model or solvated monovalent/divalent ions into the same ForceField object.

That's exactly what we're planning to do. The water models will include default parameters for all ions, and then you'll optionally be able to load additional files that override the parameters for limited sets of ions. In that case, it will be helpful to be able to check the atom type assignments and verify that it's using the ones you intended.

jchodera commented 8 years ago

In that case, it will be helpful to be able to check the atom type assignments and verify that it's using the ones you intended.

How? There's currently no facility for retaining information about what atom types are matched, nor is there any way for someone to examine this through an API call.

peastman commented 8 years ago

Not currently, no, because the whole concept of overriding doesn't exist yet.

swails commented 8 years ago

Then the names should be different. But what's the difference between them? TIP3P is supposed to be a completely standardized model.

CHARMM doesn't claim to use TIP3P -- they call it TIP3PF or something. My understanding is that they put small LJ terms on the Hs. But yes, I've heard they're different.

We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?

This is an ion model more than a FF. The only terms that have a r^-4 term are interactions between atom types and ions. So in the absence of "12-6-4 ions", the FF is a canonical 12-6 L-J potential. Currently this is only supported through the Amber prmtop route (and OpenMM is the only GPU implementation of 12-6-4 to date -- pmemd.cuda does not currently support it). It's also the only implementation of the 12-6-4 model that correctly includes the dispersion correction :).

But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding.

jchodera commented 8 years ago

But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding.

I believe the residue matching behavior is still undefined if we have multiple residues with the same element-bond graphs. @peastman can clarify.

rafwiewiora commented 8 years ago

My understanding is that they put small LJ terms on the Hs

Exactly - they have LJ on the H's, all other params are the same as AMBER tip3p.

This is an ion model more than a FF. The only terms that have a r^-4 term are interactions between atom types and ions. So in the absence of "12-6-4 ions", the FF is a canonical 12-6 L-J potential.

We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?

Yeah, I was only asking because of the 12-6-4 ions. Not sure how much work that would take to implement, but I'd say let's finish this all off without those first and think about the implementation in the future?

But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding

I believe the residue matching behavior is still undefined if we have multiple residues with the same element-bond graphs. @peastman can clarify

I think we have this all worked out ok already - yes, the residue overriding is undefined yet, will be defined with the overriding. And exactly - we're going to have an overriding residue template that uses atom types with new names that sit in the same ffxml.

Sounds like @peastman likes the more verbose type names for checking whether the correct ones were loaded - fair enough, but that also made me think it is easier to do that by checking the name of the residue template that's the one in use after all the overriding - do we have separate residue names too? Looking at the current waters all of them have the HOH name. So to summarize and make the final decision:

tip3p_standard.xml
tip4pew_standard.xml
spce_standard.xml
tip3p_HFE_multivalent.xml
tip3p_IOD_multivalent.xml
tip4pew_HFE_multivalent.xml
tip4pew_IOD_multivalent.xml
spce_HFE_multivalent.xml
spce_IOD_multivalent.xml

The 'standard' has the solvent + JC monovalent ions. The multivalent HFE/IOD have divalent that will override + 3/4-valent ions from the appropriate sets.

Are these good names? Then we can have for atom types name_as_above-atom_(type)_name. Same questions here - atom type name or atom name in the residue to go into those type names? And residues keep the same names or add those differentiating names to them too?

rafwiewiora commented 8 years ago

Actually, thinking about the residue names more - we need something to mean 'this is the residue to override with that one' in the overriding, so it seems like we should keep the residue names the same and override based on the same residue name? Otherwise we would need a function that does signature and bonding comparisons on two templates rather than template - residue and that might get a bit more messy.

rafwiewiora commented 8 years ago

Another question, sorry these things multiply fast as I work through:

HW-HW-OW      0.      127.74    (found in crystallographic water with 3 bonds)

I don't understand what this is for - a hydrogen bonding angle? The k is zero anyway. @swails should we code not-writing to ffxml of terms with zero k's into ParmEd?

peastman commented 8 years ago

CHARMM doesn't claim to use TIP3P -- they call it TIP3PF or something.

Ok. I'd say that anything that is defining standard TIP3P can continue to use the existing names, like "tip3p-O". But if the CHARMM version isn't standard TIP3P, it should use different names, like "tip3pf-O" or "charmm-tip3p-O".

Actually, thinking about the residue names more - we need something to mean 'this is the residue to override with that one' in the overriding, so it seems like we should keep the residue names the same and override based on the same residue name?

That makes sense to me. Especially because the overriding template might have a different bond graph than the original one. That's certainly true for multi-site ion models. The standard Zn template will be just a single atom. The overriding one will add in some extra particles and bonds. And it needs to replace the standard one, such that when you call addExtraParticles() it will add the particles.

AMBER water models come with the hydrogen - hydrogen harmonic bond restraint. Clearly this is replaced by putting on constraints when creating system in OpenMM as we don't use those in the existing models, correct?

The current water models actually include this too. Even though most water models are officially defined to be rigid, it's sometimes useful to represent them as flexible. For example, energy minimization is a lot easier without the constraints. When you call createSystem(), it automatically replaces those bonds with constraints unless you specify rigidWater=False.

rafwiewiora commented 8 years ago

The current water models actually include this too. Even though most water models are officially defined to be rigid, it's sometimes useful to represent them as flexible. For example, energy minimization is a lot easier without the constraints. When you call createSystem(), it automatically replaces those bonds with constraints unless you specify rigidWater=False.

I'm confused here. Are you saying the current water models have H - H bonds? They don't in the XMLs, doesn't look like they are hardcoded anywhere - topology doesn't have those bonds.

I see how createSystem() replaces H - O bonds and the angle with constraints for residues named HOH, but nothing on the H - H bonding.

peastman commented 8 years ago

No, I misunderstood what you were saying. It includes O-H bonds, and an H-O-H angle.

rafwiewiora commented 8 years ago

Right, I see.

Just to make sure I understand right - you should not run a simulation with unconstrained waters using a model that is defined as rigid. So the only usefulness of keeping these models flexible would be to first createSystem() with rigidWater=False, create simulation, minimize energy - copy the positions over to a new simulation where rigidWater = True was used when creating system?

Hence the bond and angle parameters are only useful for energy minimization. The same must be the case in AMBER. Why then does AMBER chose to impose this additional H-H bond for flexible use of these models and OpenMM does not? Does that introduce any differences in the minimization and final geometries? Tagging @swails here too.

peastman commented 8 years ago

I wouldn't go so far as to say that's the only use, but it's certainly one of the most common. It's certainly true, though, that most standard water models are defined to be rigid. So if you ask for TIP3P but set rigidWater=False, you're getting something different from "standard" TIP3P. Because standard TIP3P is rigid. I think the only non-rigid water models we currently include are for AMOEBA.

rafwiewiora commented 8 years ago

Ok, thanks!

(Deleted some comment here, need to think about it more).

rafwiewiora commented 8 years ago

Ok, so what I was thinking was - what algorithm does OpenMM implement constraints with? And what exactly would be the difference between having a water molecule with atoms in non-equilibrium positions and a) minimizing it as flexible b) minimizing it as rigid? Would b) be calculated using SHAKE / whatever else OpenMM uses for constraints or would it exchange constraints for harmonic bonds during minimization?

I'm wondering this because it seems like AMBER would use those H-H bonds for faster minimization perhaps? @ChayaSt told me CHARMM used these bonds too and says in its files it's 'for SHAKE'. But I don't understand what it changes for SHAKE whether you add another bond - isn't the only input into SHAKE the constrained positions of the 3 particles?

peastman commented 8 years ago

OpenMM uses a mix of three different constraint algorithms: SETTLE for water, SHAKE for clusters of hydrogens bound to a heavy atom, and CCMA for everything else.

And what exactly would be the difference between having a water molecule with atoms in non-equilibrium positions and a) minimizing it as flexible b) minimizing it as rigid?

In case b, all bonds and angles would end up having precisely the values they're constrained to. That's true regardless of what forces are acting on the molecule. That isn't true in case a. For an isolated water molecule with no outside forces acting on it, the values would come out the same (because they also happen to be the minimum energy configuration), but if there's anything else in the system interacting with the molecule, that will alter its geometry.

@ChayaSt told me CHARMM used these bonds too and says in its files it's 'for SHAKE'.

I'd guess what that means is that CHARMM only knows how to replace bonds by constraints, not angles. So if you want the H-H distance to be constrained, you have to include a bond there. OpenMM can add constraints for angles, so that's how it knows to add a third constraint.

swails commented 8 years ago

One thing I'll point out is look how angle constraints in OpenMM are implemented -- they are converted into distances via standard geometry, right? So how is that different from just using a bond instead that defines the same geometry for water?

For others it might differ because the constraint algo is not exact, but SETTLE is so this should make no difference. Right?

Last comment, given that Amber and charmm are by far the oldest packages here I suspect that the original definition of the rigid waters had three bonds rather than two bonds and one angle. But maybe not :)

rafwiewiora commented 8 years ago

Thanks @peastman!

Ok, so going onto the final shape of this - another problem is the virtual sites. I'm currently trying to understand the translation from the EP with bonds to the oxygen that AMBER uses to the OpenMM virtual site - particularly how the weights for the virtual sites are calculated?

When you made the existing water ffxmls - was this a conversion from AMBER? Is there any info on the questions above sitting in an issue/PR somewhere?

I'm not sure if this is a trivial thing to do with the virtual sites? Should I just manually copy the existing ffxml's into the newly converted ion xmls instead?

There's also the question of oxygen and hydrogen masses - these are exact in the existing xmls, but Amber parm10.dat has them less exact. This was the same thing with the protein/nucleic forcefields, where we decided to go with the inexact Amber masses because that's what we're reproducing. With waters, should we change to using the less exact masses too? Is this an AMBER reproduction too or are waters more of a general case? I'm not sure what the original implementation software for say tip3p was.

peastman commented 8 years ago

particularly how the weights for the virtual sites are calculated?

I just looked up the definitions of the water models and calculated the weights by hand. Depending on how AMBER represents them (I don't know), it might or might not be straightforward to convert them. If it isn't, just copying the existing definitions might be the safest thing.

There's also the question of oxygen and hydrogen masses - these are exact in the existing xmls, but Amber parm10.dat has them less exact.

This probably makes very little difference. How much difference is there in the definitions?

rafwiewiora commented 8 years ago

I just looked up the definitions of the water models and calculated the weights by hand. Depending on how AMBER represents them (I don't know), it might or might not be straightforward to convert them. If it isn't, just copying the existing definitions might be the safest thing.

Ok!

This probably makes very little difference. How much difference is there in the definitions? Yeah not a big difference at all.

The existing files have:

<Type name="tip3p-O" class="OW" element="O" mass="15.99943"/>
 <Type name="tip3p-H" class="HW" element="H" mass="1.007947"/>

AMBER params are:

<Type name="HW" class="HW" element="H" mass="1.008"/>
<Type name="OW" class="OW" element="O" mass="16.0"/>
rafwiewiora commented 8 years ago

@swails can you help out on the virtual sites weights calculations (i.e. how do I translate the AMBER models that use EP's with EP-O bonds to the OpenMM virtual sites)? I mentioned this briefly during the Skype we had, I think you said that would be sitting somewhere in ParmEd?

peastman commented 8 years ago

I don't think there are many cases where anyone would care about a 0.01% difference in the mass.

rafwiewiora commented 8 years ago

I have decided to not worry about coding up the virtual sites conversion - we will convert solvents by hand and the conversion script accepts ready solvent ffxml's, which it appends to the ion ffxmls during conversion. Closing this.