Closed rafwiewiora closed 8 years ago
does OpenMM have a sense of directionality for residues (head/tail) and can we just have one external bond for the acyl chains?
I'm not sure what you mean by that. Could you explain?
I'm not sure what you mean by that. Could you explain?
@swails perhaps can explain in more detail, but my understanding from what he said is that LeAP cares about directionality in external bonds in residues, i.e. it will specifically mark an atom with an external bond a 'head' or a 'tail' depending on the directionality of the polymer. And this becomes a problem in the Lipid11 / Lipid14 forcefields, where the acyl chains are connected to the central residue just by one external bond - the atom that does a connection can be either a head or a tail in LeAP nomenclature - this is actually what I don't fully understand why @swails, I'm guessing the definition of directionality in a lipid can change also depending on something else and hence we need both head and tail?
Now, as the ParmEd OpenMM writing script was, it treated this both-head-and-tail info as two external bonds in the same atom. I just corrected it here: https://github.com/ParmEd/ParmEd/pull/623 which has been merged already, but we just wanted to make sure that OpenMM won't choke somewhere on those lipids if it also had some directionality recognition in the ForceField - but I don't think it does, all it cares about is the residue template signature and which atoms form external bonds right? Doesn't matter what comes before it, what after and in what direction?
Can we decide on the water/ions too please?
all it cares about is the residue template signature and which atoms form external bonds right?
Correct. There's no directionality to an <ExternalBond>
. It doesn't care which other residue the bond connects to.
Reposting the ions and solvents question because there was some confusion - @peastman @jchodera
One question here - do we want water-ions bundles where possible or completely separate waters and ions?
In AMBER I see:
then we have frcmod's for all solvents separately and all ions separately - please see p. 45, 46 of the AMBER manual.
They have: 1) Joung/Cheatham, 2) Li/Merz et al. HFE set 3) Li/Merz et al. IOD set 4) Li/Merz et al. CM set 5) Li/Merz et al. 12-6-4 set - each for tip3p/tip4pew/spce
So I think we can do either:
A Waters & ions together in ffxmls - perhaps there should be a default one (I don't even know which set params for Na+ and Cl- in the currently distributed ones come from? Would that be Joung-Chetham?).
B Separate
I don't see particularly big differences to be fair, it's looking like either 3x5=15 files for option A and 15+3 for option B. Perhaps B is a bit more dangerous in that people could load in wrong combinations and burdensome in that you have to think about which ions to load in even though you're just doing simple ions for charge neutralization - this is why I think we should also have 'default' sets if option A.
I will do all with the 'new naming scheme' of atomic_ions.lib (i.e. we're leaving ions94.lib and especially ions91.lib behind) + parm10.dat for a few 'general' parameters that sit there.
As for solvents I'll need to work out how to deal with virtual sites etc.
I was waiting to see what @swails would say. He knows a lot more about those files than I do.
Alright!
I'm not sure what you were waiting on me for, but I'll assume it was in relation to the ions and water models (and not the lipid. My vote is to bundle joint water/ion parameters together. It's unclear to me what the best ion model is for divalent ions (monovalent ions I think the Joung-Cheatham parameters are the way to go). But with the divalent ions, you can either get HFE right, or you can get ion-oxygen distance right; but not both. Or you can get them both wrong, but neither horrible (a compromise between the two). And then of course is the 12-6-4 model which is supposed to fix all the problems.
I would vote for doing one of two things:
Since there's basically no ambiguity as to which monovalent ion parameter should be chosen for each water model, I see no reason not to bundle them all together. At least that's my take on the whole thing.
I'd vote for a scheme that enables us to do some sensible sensitivity analysis comparisons of which good combinations might be best for binding free energy calculations.
Since we're planning to allow an overriding mechanism, what about this? Bundle the recommended ion parameters with each water model, so you get those automatically. Then provide separate files with only the divalent ions, which can be added in to override the default parameters for them?
Since we're planning to allow an overriding mechanism, what about this? Bundle the recommended ion parameters with each water model, so you get those automatically. Then provide separate files with only the divalent ions, which can be added in to override the default parameters for them?
Works for me!
Sounds good!
Out of curiosity / want to understand fully: how do you justify only providing one monovalent model in all this / why is Joung-Cheatham the best one?
Also what about trivalent and tetravalent parameters - have these separately, or add to those divalent separate files?
Another question is: for the divalent ions that will override the standard divalent ions - we need to change the atom type names there - what kind of change should we introduce?
I would suggest including the 'priority' number that the overriding template will have - so standard zinc would have atom type ZN
(or ZN1
but I'm thinking about making the overriding so that unspecified priority will mean 1) and the overriding will be ZN2
, then perhaps the dummy atom model would have ZN3
or ZN2
. What do you think?
What about something more descriptive, that indicates what parameter set it comes from? Similar to how different water models define types like "tip3p-O", "tip4pew-O", etc.
How fancy do we get? charmm-tip3p-O
and amber-tip3p-O
?
Yeah, I feel like this can get too complicated depending on the amount of information you put into those names. At the same time, all that information sits already in the file name, so why repeat?
I'm +1 for something as simple as possible, while giving different names. That would mean for the new files also changing the type names for waters to just OH
and HW
- I don't see any point for overriding there.
On a separate note: Do we have capability in OpenMM for the 12-6-4 potential? Does that have to go as a custom force?
I don't think there's any reason to go that far, unless the CHARMM and Amber files actually define TIP3P differently. As long as they really are identical types, we can use the same name in both places.
There's definitely value to having the more verbose names, though, in cases where they really are different. I wouldn't want to remove that. And that's especially true in cases like the ions, where you expect to be loading two different versions in a single ForceField object.
I don't think there's any reason to go that far, unless the CHARMM and Amber files actually define TIP3P differently.
They do.
We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?
They do.
Then the names should be different. But what's the difference between them? TIP3P is supposed to be a completely standardized model.
There's definitely value to having the more verbose names, though, in cases where they really are different. I wouldn't want to remove that. And that's especially true in cases like the ions, where you expect to be loading two different versions in a single ForceField object.
You wouldn't want to be loading two different versions of the same kind of water model or solvated monovalent/divalent ions into the same ForceField
object. The case we talked about previously where this would be desired is pretty much limited to mixing multisite ion models with non-multisite models.
You wouldn't want to be loading two different versions of the same kind of water model or solvated monovalent/divalent ions into the same ForceField object.
That's exactly what we're planning to do. The water models will include default parameters for all ions, and then you'll optionally be able to load additional files that override the parameters for limited sets of ions. In that case, it will be helpful to be able to check the atom type assignments and verify that it's using the ones you intended.
In that case, it will be helpful to be able to check the atom type assignments and verify that it's using the ones you intended.
How? There's currently no facility for retaining information about what atom types are matched, nor is there any way for someone to examine this through an API call.
Not currently, no, because the whole concept of overriding doesn't exist yet.
Then the names should be different. But what's the difference between them? TIP3P is supposed to be a completely standardized model.
CHARMM doesn't claim to use TIP3P -- they call it TIP3PF or something. My understanding is that they put small LJ terms on the Hs. But yes, I've heard they're different.
We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?
This is an ion model more than a FF. The only terms that have a r^-4 term are interactions between atom types and ions. So in the absence of "12-6-4 ions", the FF is a canonical 12-6 L-J potential. Currently this is only supported through the Amber prmtop route (and OpenMM is the only GPU implementation of 12-6-4 to date -- pmemd.cuda does not currently support it). It's also the only implementation of the 12-6-4 model that correctly includes the dispersion correction :).
But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding.
But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding.
I believe the residue matching behavior is still undefined if we have multiple residues with the same element-bond graphs. @peastman can clarify.
My understanding is that they put small LJ terms on the Hs
Exactly - they have LJ on the H's, all other params are the same as AMBER tip3p.
This is an ion model more than a FF. The only terms that have a r^-4 term are interactions between atom types and ions. So in the absence of "12-6-4 ions", the FF is a canonical 12-6 L-J potential.
We could certainly implement a 12-6-4 potential with CustomNonbondedForce. But are we considering any force fields for this release that use that?
Yeah, I was only asking because of the 12-6-4 ions. Not sure how much work that would take to implement, but I'd say let's finish this all off without those first and think about the implementation in the future?
But I think @peastman is right here -- we don't need to have overrides and priorities for parameters. Each XML file defines residue templates of ions with different atom type names, and those templates will override the previous ones (as is already the case, I think?) and simply make ForceField look for the corresponding atom type in that XML file. But maybe I'm misunderstanding
I believe the residue matching behavior is still undefined if we have multiple residues with the same element-bond graphs. @peastman can clarify
I think we have this all worked out ok already - yes, the residue overriding is undefined yet, will be defined with the overriding. And exactly - we're going to have an overriding residue template that uses atom types with new names that sit in the same ffxml.
Sounds like @peastman likes the more verbose type names for checking whether the correct ones were loaded - fair enough, but that also made me think it is easier to do that by checking the name of the residue template that's the one in use after all the overriding - do we have separate residue names too? Looking at the current waters all of them have the HOH
name. So to summarize and make the final decision:
atom type: solvent_model_name-atom_name
+ all residues HOH
. Do we want to change the atom_name part to atom type name in that naming scheme - i.e instead of tip3p-H
and tip3p-O
have tip3p-HW
and tip3p-OW
to be somewhat more consistent with AMBER + it might be a bit less confusing from the point of view of my script? And do we want to keep the residue names the same or introduce different ones for better debugging of what template is loaded? tip3p_standard.xml
tip4pew_standard.xml
spce_standard.xml
tip3p_HFE_multivalent.xml
tip3p_IOD_multivalent.xml
tip4pew_HFE_multivalent.xml
tip4pew_IOD_multivalent.xml
spce_HFE_multivalent.xml
spce_IOD_multivalent.xml
The 'standard' has the solvent + JC monovalent ions. The multivalent HFE/IOD have divalent that will override + 3/4-valent ions from the appropriate sets.
Are these good names? Then we can have for atom types name_as_above-atom_(type)_name
. Same questions here - atom type name or atom name in the residue to go into those type names? And residues keep the same names or add those differentiating names to them too?
Actually, thinking about the residue names more - we need something to mean 'this is the residue to override with that one' in the overriding, so it seems like we should keep the residue names the same and override based on the same residue name? Otherwise we would need a function that does signature and bonding comparisons on two templates rather than template - residue and that might get a bit more messy.
Another question, sorry these things multiply fast as I work through:
HW-HW-OW 0. 127.74 (found in crystallographic water with 3 bonds)
I don't understand what this is for - a hydrogen bonding angle? The k is zero anyway. @swails should we code not-writing to ffxml of terms with zero k's into ParmEd?
CHARMM doesn't claim to use TIP3P -- they call it TIP3PF or something.
Ok. I'd say that anything that is defining standard TIP3P can continue to use the existing names, like "tip3p-O". But if the CHARMM version isn't standard TIP3P, it should use different names, like "tip3pf-O" or "charmm-tip3p-O".
Actually, thinking about the residue names more - we need something to mean 'this is the residue to override with that one' in the overriding, so it seems like we should keep the residue names the same and override based on the same residue name?
That makes sense to me. Especially because the overriding template might have a different bond graph than the original one. That's certainly true for multi-site ion models. The standard Zn template will be just a single atom. The overriding one will add in some extra particles and bonds. And it needs to replace the standard one, such that when you call addExtraParticles()
it will add the particles.
AMBER water models come with the hydrogen - hydrogen harmonic bond restraint. Clearly this is replaced by putting on constraints when creating system in OpenMM as we don't use those in the existing models, correct?
The current water models actually include this too. Even though most water models are officially defined to be rigid, it's sometimes useful to represent them as flexible. For example, energy minimization is a lot easier without the constraints. When you call createSystem()
, it automatically replaces those bonds with constraints unless you specify rigidWater=False
.
The current water models actually include this too. Even though most water models are officially defined to be rigid, it's sometimes useful to represent them as flexible. For example, energy minimization is a lot easier without the constraints. When you call createSystem(), it automatically replaces those bonds with constraints unless you specify rigidWater=False.
I'm confused here. Are you saying the current water models have H - H bonds? They don't in the XMLs, doesn't look like they are hardcoded anywhere - topology doesn't have those bonds.
I see how createSystem()
replaces H - O bonds and the angle with constraints for residues named HOH
, but nothing on the H - H bonding.
No, I misunderstood what you were saying. It includes O-H bonds, and an H-O-H angle.
Right, I see.
Just to make sure I understand right - you should not run a simulation with unconstrained waters using a model that is defined as rigid. So the only usefulness of keeping these models flexible would be to first createSystem()
with rigidWater=False
, create simulation, minimize energy - copy the positions over to a new simulation where rigidWater = True
was used when creating system?
Hence the bond and angle parameters are only useful for energy minimization. The same must be the case in AMBER. Why then does AMBER chose to impose this additional H-H bond for flexible use of these models and OpenMM does not? Does that introduce any differences in the minimization and final geometries? Tagging @swails here too.
I wouldn't go so far as to say that's the only use, but it's certainly one of the most common. It's certainly true, though, that most standard water models are defined to be rigid. So if you ask for TIP3P but set rigidWater=False
, you're getting something different from "standard" TIP3P. Because standard TIP3P is rigid. I think the only non-rigid water models we currently include are for AMOEBA.
Ok, thanks!
(Deleted some comment here, need to think about it more).
Ok, so what I was thinking was - what algorithm does OpenMM implement constraints with? And what exactly would be the difference between having a water molecule with atoms in non-equilibrium positions and a) minimizing it as flexible b) minimizing it as rigid? Would b) be calculated using SHAKE / whatever else OpenMM uses for constraints or would it exchange constraints for harmonic bonds during minimization?
I'm wondering this because it seems like AMBER would use those H-H bonds for faster minimization perhaps? @ChayaSt told me CHARMM used these bonds too and says in its files it's 'for SHAKE'. But I don't understand what it changes for SHAKE whether you add another bond - isn't the only input into SHAKE the constrained positions of the 3 particles?
OpenMM uses a mix of three different constraint algorithms: SETTLE for water, SHAKE for clusters of hydrogens bound to a heavy atom, and CCMA for everything else.
And what exactly would be the difference between having a water molecule with atoms in non-equilibrium positions and a) minimizing it as flexible b) minimizing it as rigid?
In case b, all bonds and angles would end up having precisely the values they're constrained to. That's true regardless of what forces are acting on the molecule. That isn't true in case a. For an isolated water molecule with no outside forces acting on it, the values would come out the same (because they also happen to be the minimum energy configuration), but if there's anything else in the system interacting with the molecule, that will alter its geometry.
@ChayaSt told me CHARMM used these bonds too and says in its files it's 'for SHAKE'.
I'd guess what that means is that CHARMM only knows how to replace bonds by constraints, not angles. So if you want the H-H distance to be constrained, you have to include a bond there. OpenMM can add constraints for angles, so that's how it knows to add a third constraint.
One thing I'll point out is look how angle constraints in OpenMM are implemented -- they are converted into distances via standard geometry, right? So how is that different from just using a bond instead that defines the same geometry for water?
For others it might differ because the constraint algo is not exact, but SETTLE is so this should make no difference. Right?
Last comment, given that Amber and charmm are by far the oldest packages here I suspect that the original definition of the rigid waters had three bonds rather than two bonds and one angle. But maybe not :)
Thanks @peastman!
Ok, so going onto the final shape of this - another problem is the virtual sites. I'm currently trying to understand the translation from the EP with bonds to the oxygen that AMBER uses to the OpenMM virtual site - particularly how the weights for the virtual sites are calculated?
When you made the existing water ffxmls - was this a conversion from AMBER? Is there any info on the questions above sitting in an issue/PR somewhere?
I'm not sure if this is a trivial thing to do with the virtual sites? Should I just manually copy the existing ffxml's into the newly converted ion xmls instead?
There's also the question of oxygen and hydrogen masses - these are exact in the existing xmls, but Amber parm10.dat has them less exact. This was the same thing with the protein/nucleic forcefields, where we decided to go with the inexact Amber masses because that's what we're reproducing. With waters, should we change to using the less exact masses too? Is this an AMBER reproduction too or are waters more of a general case? I'm not sure what the original implementation software for say tip3p was.
particularly how the weights for the virtual sites are calculated?
I just looked up the definitions of the water models and calculated the weights by hand. Depending on how AMBER represents them (I don't know), it might or might not be straightforward to convert them. If it isn't, just copying the existing definitions might be the safest thing.
There's also the question of oxygen and hydrogen masses - these are exact in the existing xmls, but Amber parm10.dat has them less exact.
This probably makes very little difference. How much difference is there in the definitions?
I just looked up the definitions of the water models and calculated the weights by hand. Depending on how AMBER represents them (I don't know), it might or might not be straightforward to convert them. If it isn't, just copying the existing definitions might be the safest thing.
Ok!
This probably makes very little difference. How much difference is there in the definitions? Yeah not a big difference at all.
The existing files have:
<Type name="tip3p-O" class="OW" element="O" mass="15.99943"/>
<Type name="tip3p-H" class="HW" element="H" mass="1.007947"/>
AMBER params are:
<Type name="HW" class="HW" element="H" mass="1.008"/>
<Type name="OW" class="OW" element="O" mass="16.0"/>
@swails can you help out on the virtual sites weights calculations (i.e. how do I translate the AMBER models that use EP's with EP-O bonds to the OpenMM virtual sites)? I mentioned this briefly during the Skype we had, I think you said that would be sitting somewhere in ParmEd?
I don't think there are many cases where anyone would care about a 0.01% difference in the mass.
I have decided to not worry about coding up the virtual sites conversion - we will convert solvents by hand and the conversion script accepts ready solvent ffxml's, which it appends to the ion ffxmls during conversion. Closing this.
@peastman @jchodera @swails
tl:dr: All protein/nucleic converted, some issues outstanding that could bring in a few more FFs, some FFs never to be converted - all explained below. See ions & solvents section for decisions on how to distribute those.
Happy to report that all forcefields that are convertable right now, except for ions and solvents, have been converted and validated - all files here (files/ keeps files for tests, ffxml/ is the converted xmls). Perhaps you could have a first quick reviewing look?
In summary the script uses the master.yaml for list of files to convert / provenance info / tests to run, converts all of the files and validates through specified tests:
The relative (to AMBER energy) energy differences allowed are 1e-5 for all forces, except for impropers (dihedrals are split) and 1e-2 for impropers - except for one protein forcefield that needed to go up to 2e-2 on impropers and nucleic acids for which the improper testing is turned off for now, as I'm thinking what to do with the impropers problems there.
I've converted all the FFs in AmberTools 15 that I can convert within existing code and problems, and these are the ones that can't be converted:
lipid11 and lipid14 - lipid11 uses mixed 1-4 LJ scaling, lipid14 is waiting on the head/tail issue (https://github.com/choderalab/openmm/issues/10 and https://github.com/ParmEd/ParmEd/pull/623 - @peastman does OpenMM have a sense of directionality for residues (head/tail) and can we just have one external bond for the acyl chains? lipid14 also has a dihedrals issue https://github.com/choderalab/openmm/issues/11
In fact, I question the utility of lipid14 in OpenMM because of the route I had to take to prepare a PDB - use the CHARMM GUI membrane builder, then use a charmm2amber python conversion script in AMBER that converts the PDB to the lipid14 residue names + the specific acyl chain TER head TER acyl chain structure of the file. Then you run that in LeAP, save off a prmtop, load into OpenMM and save as a PDB. Might as well use the AMBER prmtop and inpcrd. What do you think? Maybe we can include a charmm2OpenMM conversion script in the future if there's interest.
EDIT: * also oldff/leaprc.ff94.nmr - leaprc does not load n- and c-terminal protein residues.
TO DO: clean up the script, a lot of repeating code now
Ions & solvents
One question here - do we want water-ions bundles where possible or completely separate waters and ions?
In AMBER I see:
then we have frcmod's for all solvents separately and all ions separately - please see p. 45, 46 of the AMBER manual.
They have: 1) Joung/Cheatham, 2) Li/Merz et al. HFE set 3) Li/Merz et al. IOD set 4) Li/Merz et al. CM set 5) Li/Merz et al. 12-6-4 set - each for tip3p/tip4pew/spce
So I think we can do either:
A Waters & ions together in ffxmls - perhaps there should be a default one (I don't even know which set params for Na+ and Cl- in the currently distributed ones come from? Would that be Joung-Chetham?).
B Separate
I don't see particularly big differences to be fair, it's looking like either 3x5=15 files for option A and 15+3 for option B. Perhaps B is a bit more dangerous in that people could load in wrong combinations and burdensome in that you have to think about which ions to load in even though you're just doing simple ions for charge neutralization - this is why I think we should also have 'default' sets if option A.
I will do all with the 'new naming scheme' of atomic_ions.lib (i.e. we're leaving ions94.lib and especially ions91.lib behind) + parm10.dat for a few 'general' parameters that sit there.
As for solvents I'll need to work out how to deal with virtual sites etc.