Atom type name issues for GB

jchodera commented 10 years ago

In parameterizing a molecule for simulation with GB, I am running into this problem:

Traceback (most recent call last):
  File "tests/test_alchemy.py", line 578, in <module>
    test_systembuilder_lysozyme_pdb_mol2()
  File "tests/test_alchemy.py", line 533, in test_systembuilder_lysozyme_pdb_mol2
    print "%s has %d particles" % (name, thing.system.getNumParticles())
  File "/Users/choderaj/code/yank/install/lib/python2.7/site-packages/yank-0.9.0-py2.7.egg/yank/systembuilder.py", line 176, in system
    self._create_system()
  File "/Users/choderaj/code/yank/install/lib/python2.7/site-packages/yank-0.9.0-py2.7.egg/yank/systembuilder.py", line 164, in _create_system
    self._system = forcefield.createSystem(self._topology, **self.system_creation_parameters)
  File "/Users/choderaj/code/yank/install/lib/python2.7/site-packages/simtk/openmm/app/forcefield.py", line 507, in createSystem
    force.createForce(sys, data, nonbondedMethod, nonbondedCutoff, args)
  File "/Users/choderaj/code/yank/install/lib/python2.7/site-packages/simtk/openmm/app/forcefield.py", line 1245, in createForce
    raise ValueError('No GBSAOBC parameters defined for atom type '+t)
ValueError: No GBSAOBC parameters defined for atom type MOL-C1

jchodera commented 10 years ago

This is using the new YANK systembuilder framework, which also currently loads a protein forcefield and GB model.

jchodera commented 10 years ago

Not sure quite how we can handle this. We effectively need to make up GB parameters for gaff atom types by analogy with the various protein forcefields. I think LEaP has GB parameters hardcoded into it. @swails may know more about how this operates.

swails commented 10 years ago

GB parameters are now determined by atomic number and connectivity. Have a look at https://github.com/swails/ParmEd/blob/master/ParmedTools/changeradii.py#L108-L138 which highlights the logic for setting the mbondi2 parameters (suitable for OBC1 and OBC2 GB models). The mbondi3 set, suitable for the GBn2 model, is also in that file.

Hopefully the logic there is easy enough to follow using just the variable names without knowing the API inside out. The strange treatment for the C1, C2, and C3 atom types is hacked-in support for UA FFs in tleap (which is why those radii are bigger).

jchodera commented 10 years ago

We'll have to think about how those parameters can even be expressed. Is there an input file that Amber distributes that describes these parameters, or do they only exist as hardcoded parameters inside of code?

swails commented 10 years ago

They're hard-coded in tleap.

swails commented 10 years ago

See https://github.com/choderalab/ambermini/blob/master/leap/src/leap/unitio.c#L6050-L6204

jchodera commented 10 years ago

Thanks, @swails!

OK, we may have a few options here:

Have gaff2xml generate the parameters in its ffxml file if requested, basing this implementation off @swails ParmEd
Write a converter to generate standard gaff_obc.xml files that could be distributed with OpenMM, except that the ffxml format uses different atom types for each residue because charges are tied to specific atom types, meaning that we would never be possible to create a general forcefield file this way.
Write an AmberGBSAGenerator plugin for ForceField that encodes this logic and assigns parameters, eliminating the need for all of the *_obc.xml files.

I think it may be simplest to consider the AmberGBSAGenerator route, since this would be a simple fix that would go into OpenMM and drastically simplify the assignment of Amber GB parameters in general.

kyleabeauchamp commented 9 years ago

For the near term, I believe we have decided that users must go through AmberTools for setting up GB simulations.

jchodera commented 9 years ago

Correct. Is it all right if I create an issue to remind me that we need to add an AmberGBSAGenerator in the future?

kyleabeauchamp commented 9 years ago

We already have an issue on the OpenMM tracker, so why not leave it at that.

I think the issue tracker should be reserved for things that we have plans to fix on some finite timescale, as otherwise it becomes impossible to triage tasks effectively.

jchodera commented 9 years ago

I believe labels and target releases can be used for this purpose.

jchodera commented 9 years ago

Also, we already have code for this---we just have to move it from our gbff project. So I don't think it's unreasonable to say this can be done on a finite timescale.

kyleabeauchamp commented 9 years ago

It's not clear to me that the code even belongs in this repo.

jchodera commented 9 years ago

Eventually, there are a number of things (like this) that belong in the OpenMM repo, but we can't put unstable code in that repo since it is a stable codebase that many people use. Putting it here is logical, since without it, you can't use small molecules with GB.

kyleabeauchamp commented 9 years ago

We can use the AmberTools pipeline, though. I personally think we should deprecate the ffXML pipeline for small molecules because we do not have enough software engineers to maintain more our own forcefield codebase.

Again, you are free to do what you like here--I'm just giving my advice on how we can get science done given the limited number of developer-hours that we have.

jchodera commented 9 years ago

This sounds like a larger discussion best had in person.

choderalab / openmoltools

Atom type name issues for GB #40