choderalab / protons

OpenMM testbed for constant-pH methodologies.
http://protons.readthedocs.io/
MIT License
21 stars 13 forks source link

Brainstorm simulation API proposal #19

Open bas-rustenburg opened 8 years ago

bas-rustenburg commented 8 years ago

So we can discuss it here:


from __future__ import print_function
from simtk.openmm import app
# TODO  from constph import ConstpHForceField ?
import simtk.openmm as mm
from simtk import unit
from sys import stdout

pdb = app.PDBFile('input.pdb') # preprocessed with right residue names to indicate constph?

forcefield = app.ConstpHForceField('amber99cph.xml', 'tip3p.xml', 'ligandcph.xml') # subclass of forcefield that supports custom format residues

system = forcefield.createSystem(pdb.topology, 
 nonbondedMethod=app.PME,nonbondedCutoff=1.0*unit.nanometers, constraints=app.HBonds, rigidWater=True, ewaldErrorTolerance=0.0005, cph_indices=None, ph=7.4)
# cph_indices, if None, all that can be matched set up as constant ph, else, list of indices
integrator = mm.LangevinIntegrator(300*unit.kelvin, 1.0/unit.picoseconds, 2.0*unit.femtoseconds)
integrator.setConstraintTolerance(0.00001)

# TODO define compound integrator here too, or leave for simulation?
ncmcintegrator =  VelocityVerletIntegrator(ncmc_timestep)
system.addForce(mm.MonteCarloBarostat(1*unit.atmospheres, 300*unit.kelvin, 25))

platform = mm.Platform.getPlatformByName('CUDA')
properties = {'CudaPrecision': 'mixed'}
# compound integrator under the hood, hide system update under the hood
simulation = app.Simulation(pdb.topology, system, {'md': integrator, 'ph': ncmcintegrator}, platform, properties)
simulation.context.setPositions(pdb.positions)

print('Minimizing...')
simulation.minimizeEnergy()
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

simulation.reporters.append(app.DCDReporter('trajectory.dcd', 1000)) # modify this to only write out active protons?
# report protonation states
simulation.reporters.append(app.StateDataReporter(stdout, 1000, step=True,
    potentialEnergy=True, temperature=True, progress=True, remainingTime=True,
    speed=True, totalSteps=1000, protonationStates=True, separator='\t'))

print('Running Production...')
simulation.calibrate(10000, ph_every=(100,1)) # 10000 cph calibration steps, updating ref energies
simulation.step(1000000, ph_every=(6000, 1)) # run mc step every 6000 md steps, 1 attempt
print('Done!')
jchodera commented 8 years ago

Pinging @peastman, who may have some useful suggestions.

Some initial thoughts:

pH = 7.4 # pH to run simulations at
simulation = app.Simulation(pdb.topology, system, platform, properties, pH=pH)

I'm not sure how I feel about adding the ph_every optional argument to simulation.step(). How are other update intervals specified? Most of the other things like barostats are applied via Force objects, which is why I thought this would make the most sense to implement at the C++ level as a Force object---to make the interface fully consistent.

jchodera commented 8 years ago

Tagging @peastman again. Would love to get your input here now, rather than when it's too late to change the design.

peastman commented 8 years ago

Why would you want to subclass ForceField? I'm not sure what that would be intended to solve.

For the integration and attempting moves, what about an API similar to the simulated tempering script: https://simtk.org/svn/pyopenmm/trunk/simtk/pyopenmm/extras/simulatedtempering.py. It wraps a Simulation object, then provides its own step() method. In this case, you could have a class that takes the same arguments as the Simulation constructor (system, integrator, etc.). It would then automatically generate the NCMC integrator and combine it with the user provided one in a CustomCompoundIntegrator. And when you called step() on this object, it would automatically switch back and forth between the two integrators to perform the MC moves as needed.

jchodera commented 8 years ago

Why would you want to subclass ForceField? I'm not sure what that would be intended to solve.

The other option is to make changes to ForceField directly to support the specification of titratible residues in the ffxml files.

For the integration and attempting moves, what about an API similar to the simulated tempering script: https://simtk.org/svn/pyopenmm/trunk/simtk/pyopenmm/extras/simulatedtempering.py. It wraps a Simulation object, then provides its own step() method. In this case, you could have a class that takes the same arguments as the Simulation constructor (system, integrator, etc.). It would then automatically generate the NCMC integrator and combine it with the user provided one in a CustomCompoundIntegrator. And when you called step() on this object, it would automatically switch back and forth between the two integrators to perform the MC moves as needed.

We could do this, but is this the best approach? We could do all of that within Simulation itself just as easily. If the goal is to make it very easy to run constant-pH, adding this into Simulation would make this most accessible. How many people run that simulated tempering script? Is it even tested in travis? It seems dusty...

jchodera commented 8 years ago

Oh I see---it's not even in OpenMM proper.

peastman commented 8 years ago

Right. I'm questioning whether constant pH belongs in a core class like Simulation, or whether it should be more self contained. It's still a very new technique, and in fact there are lots of different techniques for doing constant pH, and the techniques we use five years from now may be very different from the ones we use today. So instead of assuming one particular method, and deeply integrating it into Simulation, I'd rather keep it self contained. In the same way that different Integrators are self contained. That way if we later implement a different method that requires a very different API, we won't be stuck with an obsolete interface.

jchodera commented 8 years ago

Right. I'm questioning whether constant pH belongs in a core class like Simulation, or whether it should be more self contained. It's still a very new technique, and in fact there are lots of different techniques for doing constant pH, and the techniques we use five years from now may be very different from the ones we use today. So instead of assuming one particular method, and deeply integrating it into Simulation, I'd rather keep it self contained. In the same way that different Integrators are self contained. That way if we later implement a different method that requires a very different API, we won't be stuck with an obsolete interface.

That would argue for making it a Force then, like we do with integrators or MonteCarloBarostat?

jchodera commented 8 years ago

Both Vijay and I are very enthusiastic to make it as easy as possible for people to use this technique generally. I definitely agree that we want some flexibility in the choice of options, but there could be multiple ways to accommodate that.

peastman commented 8 years ago

That would argue for making it a Force then, like we do with integrators or MonteCarloBarostat?

That would also be an option, but it might require some really ugly hacks. It would mean one force would implement updateContextState() to modify parameters of a different force. But we can certainly explore that possibility too.

jchodera commented 8 years ago

That would also be an option, but it might require some really ugly hacks. It would mean one force would implement updateContextState() to modify parameters of a different force. But we can certainly explore that possibility too.

There could be other possible designs that don't relegate constant pH to a third-class scheme. What about adding a plugin architecture to Simulation that would support constant-pH capabilities? That way, the constant-pH stuff could be separately configured, later upgraded, etc.

peastman commented 8 years ago

What about adding a plugin architecture to Simulation

Could you elaborate? What sort of design did you have in mind?

jchodera commented 8 years ago

I don't have a concrete design in mind, but from the user end, this could be something like

pdb = app.PDBFile('input.pdb') # preprocessed with right residue names to indicate constph?
forcefield = app.ForceField('amber99cph.xml', 'tip3p.xml', 'ligandcph.xml') 
system = forcefield.createSystem(pdb.topology, nonbondedMethod=app.PME,nonbondedCutoff=1.0*unit.nanometers, constraints=app.HBonds, rigidWater=True, ewaldErrorTolerance=0.0005)
integrator = mm.LangevinIntegrator(300*unit.kelvin, 1.0/unit.picoseconds, 2.0*unit.femtoseconds)
constph = app.ConstantPHPlugin(pH=7.4, interval=50)
simulation = app.Simulation(pdb.topology, system, platform, properties, plugins=[constph])
simulation.context.setPositions(pdb.positions)
peastman commented 8 years ago

For that to work, I see two points at which the plugin would have to be called. First, it would get the opportunity to preprocess the arguments passed to Simulation, so it could replace the integrator. And second, it would have the ability to interrupt integration periodically and run its own code. The latter function would be very similar to a reporter, and in many cases would literally be a reporter.

This also sounds like a possibility worth pursuing.

cxhernandez commented 8 years ago

This might be a naive question but, if this were to go on FAH, how would we handle the constantly changing topologies? Would we have to input a topology with virtual sites at every possible protonation site so that we can recover our trajectories at the end?

bas-rustenburg commented 8 years ago

@cxhernandez in the current setup, protons aren't removed from the topology, and the initial topology should have all possible protons included. During simulation, they're switched on and off by changing to and from dummy atoms. We change their non-bonded parameters, although bonded interactions are maintained so they don't drift away.

cxhernandez commented 8 years ago

:+1:

thanks for the clarification @bas-rustenburg!

peastman commented 8 years ago

An important question is whether this should be implemented in Python or C++. If it's in Python, we wouldn't be able to use it in FAH, so if we wanted constant pH there, we would have to implement it separately as part of the core (but that could be a simplified version with fewer options than the more general OpenMM version). If it's in C++, it's impossible to handle parameter updating in any sort of general way. It would need hardcoded support for every Force subclass it could work with.

If we decide to go with C++, what about implementing it as an Integrator rather than a Force? This would be less brittle and require fewer hacks. It could be similar to a CompoundIntegrator, in that you would create two other Integrators, one for it to use in ordinary integration, and one for it to use when attempting MC steps. Then you would just call step() on it as usual, and periodically it would throw in an MC step.

jchodera commented 8 years ago

@bas-rustenburg is working on adding code to ForceField and extending the ffxml format for these titratible forms of residues, as well as setting up a workflow for both titratible residues and small molecules. They would be fully protonated, as he suggests, and the parameters would change dynamically. The Topology would remain constant.

@peastman: Is there a way to circumvent the ugly hacks through a clever choice of API? There wouldn't be an easy way to get around the need to update other forces, unfortunately. In some respects, this is similar to how MonteCarloBarostat can make changes that modify the potential, but I realize it is also quite different if we are modifying force parameters directly.

I wonder if we could cook up a simpler scheme based on modifying context parameters. We may also have to come up with extensions to CustomNonbondedForce that allow us to swap in different parameters for residues based on context parameters.

peastman commented 8 years ago

I don't think it's practical to do this with global parameters. First, you'd be restricted to only Forces that use global parameters, so no NonbondedForce. That's ok for implicit solvent, since you could do that with only custom forces, but PME would be out.

But even assuming you wanted to use only CustomNonbondedForce and CustomGBForce, what would you do? There would be possibly dozens of parameters, each controlling the state of one site. And then you would have to write a huge expression for the energy, involving all of those dozens of global parameters. It would be insanely inefficient.

Is there a way to circumvent the ugly hacks through a clever choice of API?

What about my suggestion of doing it through an integrator? That would be a lot less hacky, since it could work entirely by calling public methods on the Forces at times when they expected those methods to be called.

jchodera commented 8 years ago

Apologies for not responding to this one earlier---I hadn't seen it when I posted my last comment.

An important question is whether this should be implemented in Python or C++. If it's in Python, we wouldn't be able to use it in FAH, so if we wanted constant pH there, we would have to implement it separately as part of the core (but that could be a simplified version with fewer options than the more general OpenMM version).

I think we need to find a way to get the fundamental capabilities into the C++ layer in order for this to be easily supported on FAH.

If it's in C++, it's impossible to handle parameter updating in any sort of general way. It would need hardcoded support for every Force subclass it could work with.

I'm still trying to think of whether there is some way to make this general while not requiring extensions to all Force APIs.

If we decide to go with C++, what about implementing it as an Integrator rather than a Force? This would be less brittle and require fewer hacks. It could be similar to a CompoundIntegrator, in that you would create two other Integrators, one for it to use in ordinary integration, and one for it to use when attempting MC steps. Then you would just call step() on it as usual, and periodically it would throw in an MC step.

This is a really interesting idea. Were you thinking that this integrator would be specific to constant pH, or that it would be somehow more general/programmable? Could it be combined with CompoundIntegrator? Why not just use CompoundIntegrator to combine this integrator (which could just do the constant-pH steps) with some other integrator?

The algorithm would roughly go something like this:

But even assuming you wanted to use only CustomNonbondedForce and CustomGBForce, what would you do? There would be possibly dozens of parameters, each controlling the state of one site. And then you would have to write a huge expression for the energy, involving all of those dozens of global parameters. It would be insanely inefficient.

There are currently more efficient ways to do this that involve duplicating the titratable residues and using a small set of global parameters to select the titration state of each one. In the Custom*Force terms, there is simply a check to see if the current titration state is active or not, and the interactions are switched on and off according to a continuous parameter. It's not incredibly elegant, but it doesn't require energy expressions with dozens of parameters. But I wonder if there would be a way to set (or interpolate) parameters from a table using a global variable instead.

jchodera commented 8 years ago

Out of curiosity, @peastman, why would this be much cleaner as an Integrator than a Force? MonteCarloBarostat also modifies positions and System parameters that govern energy computation (in this case box vectors) and uses energy calculations to do its thing. This is essentially the same set of operations.

peastman commented 8 years ago

I'm still trying to think of whether there is some way to make this general while not requiring extensions to all Force APIs.

It doesn't require changes to the Forces, but it does require the integrator to have hardcoded support for every type of force it can modify. It needs to have one block of code for updating a NonbondedForce, a different block for a GBSAOBCForce, and so on.

Were you thinking that this integrator would be specific to constant pH

It would be specific to constant pH. All of the logic you outlined above would be built into it.

MonteCarloBarostat also modifies positions and System parameters that govern energy computation (in this case box vectors)

Modifying box vectors is one of the things Forces are explicitly allowed to do inside updateContextState(). And the code to support that is entirely contained within Context. Modifying other Force objects is not something they're allowed to do, and there are lots of subtle ways in which Force classes might implicitly assume they wouldn't get modified then. That's not to say it couldn't be made it work, but we'd have to go through every force one by one (in fact, every platform's implementation of every force) and verify that it wasn't doing anything this would break.

Even worse, we're not just modifying Forces inside updateContextState(). We're running integration, which will of course involve its own calls to updateContextState(). So that will be getting called recursively, which could be really painful to deal with.

Implementing it as an Integrator doesn't have any of these problems. It just switches between different Integrators in exactly the way that CompoundIntegrator already does.

peastman commented 8 years ago

In the Custom*Force terms, there is simply a check to see if the current titration state is active or not, and the interactions are switched on and off according to a continuous parameter.

I don't see how that can be simpler than what I described. You're going to need one such parameter for every site, right?

jchodera commented 8 years ago

I don't see how that can be simpler than what I described. You're going to need one such parameter for every site, right?

Yes, but you do only need two global parameters per energy expression. The energy expressions don't need to involve all the global parameters as you suggested ("And then you would have to write a huge expression for the energy, involving all of those dozens of global parameters.").

I think the integrator approach is better for trying to make this a first-class feature, though. I'm thinking about what the API should look like.

jchodera commented 8 years ago

In thinking about an API, how about this?

The user would interact with a Python class MonteCarloTitration that would handle the configuration of the constant-pH integrator:

modeller = Modeller(pdb.topology)
forcefield = ForceField('amber99sbildn.xml', 'amber99sbildn-constph.xml', 'tip3p.xml')
modeller.addHydrogens(forcefield=forcefield, pH=7) # needs to be aware that fully-protonated titratable forms of residues are used
system = forcefield.createSystem(topology, nonbondedMethod=PME)
my_integrator = LangevinIntegrator(temperature, collision_rate, timestep)
ph_integrator = MonteCarloTitration(topology, system, forcefield, pH=7, switching_steps=50)
# Calibrate constant-pH reference energies (takes a while, creates/destroys Contexts)
ph_integrator.calibrate(tol=0.01)
# Create a CompoundIntegrator where we run 50 steps of Langevin followed by a constant-pH titration update
integrator = CompoundIntegrator()
integrator.addIntegrator(my_integrator)
integrator.addIntegrator(ph_integrator)
integrator.setSchedule([50, 1]) # new method to allow automated integrator step sequence to be selected
# Create context.
context = Context(system, integrator)
# Run simulation
integrator.step(5000)

By adding a method such as CompoundIntegrator.setSchedule(), we could allow the user to run a simulation that automatically inserts a constant-pH titration step at the desired interval.

Before diving into the details of what a MonteCarloTitrationIntegrator API would look like, a few basic questions about how things would work if this is an Integrator rather than a Force of drive:

(Edit: Added topology to MonteCarloTitration initializer.)

peastman commented 8 years ago

I think that design tries to put too much of the logic into Python. That won't work for FAH. And there's a lot more to constant pH than just switching back and forth between two integrators on a regular schedule. I'm thinking more along these lines:

integrator1 = LangevinIntegrator(300.0, 1.0, 0.002) # the main integrator
integrator2 = VerletIntegrator(0.002) # used during MC steps
integrator = ConstantPHIntegrator(integrator1, integrator2, 100, 50) # attempt MC move every 100 steps, and each one involves 50 steps with integrator2

When you call step() on integrator, you are strictly specifying a number of steps it will take with integrator1. There might happen to be some MC steps in the middle. If there are, they'll involve taking steps with integrator2, which might then be accepted or rejected. But as far as the main simulation is concerned, those are instantaneous changes that take place in between time steps.

ConstantPHIntegrator will of course need a lot more configuration, to tell it about all the different groups and the parameters to use for them. We can have Python code to set that up. Once it's done, the integrator can be serialized to XML and sent off to FAH like any other integrator.

jchodera commented 8 years ago

I think that design tries to put too much of the logic into Python. That won't work for FAH.

It will work just fine for FAH. I was just describing what this would look like from the user perspective who would be setting things up in Python. We haven't reached the point of discussing the C++ level integrator yet. The C++ level integrator would look almost exactly like the proposal here. If the user makes use of the high-level Python code to set things up, it hides much of the complexity and automates much of what is needed to set up the C++ level integrator with the appropriate titration states defined in the ffxml files. An FAH user would then serialize the System and CompoundIntegrator to XML, which would capture all of the detailed information needed to run in C++.

jchodera commented 8 years ago

integrator = ConstantPHIntegrator(integrator1, integrator2, 100, 50) # attempt MC move every 100 steps, and each one involves 50 steps with integrator2

Why would we do that if the alternation between integrators is already the job of CompoundIntegrator? We should simply extend the API of CompoundIntegrator to handle this alternation automatically if that is what is desired by the user.

jchodera commented 8 years ago

I'm presuming here that serializing CompoundIntegrator will currently serialize any of its component integrators out to XML alongside itself. If that's not true, we'll need to revisit this idea.

peastman commented 8 years ago

Why would we do that if the alternation between integrators is already the job of CompoundIntegrator?

Like I said, there's way more to this than just alternating between integrators. Its job includes:

  1. Take steps with the main integrator.
  2. Periodically decide it's time to try an MC move.
  3. Record the current energy.
  4. Modify parameters on various force objects.
  5. Do we need to do NCMC? If so, take a series of steps with the secondary integrator, gradually continuing to change the parameters of force objects, and presumably keeping most atoms fixed while doing it.
  6. Evaluate the energy again.
  7. Decide whether to accept the MC step.
  8. If it isn't accepted, reset all forces back to how they were before, and also reset all state information.
  9. If it is accepted, make sure the time and step count keep their original values, so this appears as an instantaneous change in between two time steps.

That's why we need a special Integrator class, not just a trivial CompoundIntegrator.

jchodera commented 8 years ago

Yes, it has to do all of those things. It should do all of those things inside a MonteCarloTitrationIntegrator. It may need to run NCMC using velocity Verlet internally. But the user will still have to alternate between the MonteCarloTitrationIntegrator and another integrator that does ergodic sampling (such as LangevinIntegrator) outside of MonteCarloTitrationIntegrator, so why not allow the user to compose these however they want (and even mix in a third or fourth integrator as desired) by letting them compose MonteCarloTitrationIntegrator with other integrators via CompoundIntegrator?

I am specifically saying we should not use your proposed API for

integrator = ConstantPHIntegrator(integrator1, integrator2, 100, 50)

because

peastman commented 8 years ago

We will only support velocity Verlet for NCMC (because other integrators lead to unworkably complex acceptance probabilities)

We still need to let them supply their own integrator. What if the system involves Drude particles? That needs a special integrator. What if they've constructed a CustomIntegrator that does things in a nonstandard way (such as freezing certain degrees of freedom)? They need that. What if they want to use an MTSIntegrator to do it more efficiently? We should allow that. There are many ways of doing constant energy integration, and good reasons people might use a different class than VerletIntegrator to do it.

And a very important point I've said a few times: MC steps should appear as instantaneous changes between time steps. Performing an MC step should not consume a call to step() on the top level integrator. The scheme you described would handle that incorrectly.

so why not allow the user to compose these however they want

Nothing stopping them from doing it. This integrator can easily be nested inside a CompoundIntegrator, or the "main integrator" passed as the first argument could be a CompoundIntegrator.

jchodera commented 8 years ago

Here's a proposed API for the C++ layer:

# Create the integrator
integrator = simtk.openmm.MonteCarloTitrationIntegrator(temperature)
# Set options
integrator.setSwitchingSteps(100) # use NCMC with 100 switching steps (default: 0)
integrator.updateNonbondedForce(True) # integrator will update NonbondedForce during trials
integrator.updateCustomGBForce(True) # integrator will update CustomGBForce during trials
# Add definition of first titratable group of atoms 
titratration_group_index = integrator.addTitratableGroup(atom_indices)
titration_state_index = integrator.addTitrationState(titration_group_index, log_solvent_population1)
integrator.setNonbondedForceParameters(titration_group_index, titration_state_index, particle_index, charge, sigma, epsilon)
integrator.setCustomGBForceParameters(titration_group_index, titration_state_index, params)
...
jchodera commented 8 years ago

We still need to let them supply their own integrator.

We can't support other classes than velocity Verlet for NCMC because we don't have a generalizable way to compute the nonequilibrium work built into the other integrators. Unless we build in an automated way to compute the differential path action for any integrator allowed in OpenMM, we can't let people specify their own integrators.

What if the system involves Drude particles? That needs a special integrator. What if they've constructed a CustomIntegrator that does things in a nonstandard way (such as freezing certain degrees of freedom)? They need that. What if they want to use an MTSIntegrator to do it more efficiently? We should allow that. There are many ways of doing constant energy integration, and good reasons people might use a different class than VerletIntegrator to do it.

They can't. These will not be supported. VerletIntegrator is not supported either. It's only velocity Verlet, and only a limited form with tight constraints that will be be run "under the hood" that the user will not be able to tinker with in any way.

This is not just an issue of doing constant energy integration. This is an issue of computing the appropriate nonequilibrium work for arbitrary integrators, which is hard. We generalized this as much as possible in this paper, which you're welcome to read if you want to understand what we're up against here. It's possible there's a simple, more general route than we've realized to compute differential path actions for arbitrary integrators---which would be totally awesome---but it's generally difficult to do this for all but the simplest of integrators. We could work this out and build it into some of the popular integrators in OpenMM---and this would be useful for all sorts of things---but this would be a major undertaking.

And a very important point I've said a few times: MC steps should appear as instantaneous changes between time steps. Performing an MC step should not consume a call to step() on the top level integrator. The scheme you described would handle that incorrectly.

Why? Can't MonteCarloTitrationIntegrator define integrator.timestep = 0 so that the clock isn't advanced when MonteCarloTitrationIntegrator.step() is called?

Nothing stopping them from doing it. This integrator can easily be nested inside a CompoundIntegrator, or the "main integrator" passed as the first argument could be a CompoundIntegrator.

It is weird to think of nesting nested integrators. That could lead to all sorts of non-obvious problems.

peastman commented 8 years ago

Can't MonteCarloTitrationIntegrator define integrator.timestep = 0 so that the clock isn't advanced when MonteCarloTitrationIntegrator.step() is called?

It would still consume a call to step(). The step count would still be incremented. The elapsed time would not be equal to steps*stepSize.

It is weird to think of nesting nested integrators. That could lead to all sorts of non-obvious problems.

Like what? I don't see any problems with it. It's not even as if the user was having to switch the constant pH integrator between different child integrators. We would simply take the "main integrator" and wrap it in a layer that adds constant pH functionality to it. Then they could use that integrator directly, or stick it inside a CompoundIntegrator exactly like any other integrator.

jchodera commented 8 years ago

Apologies for the delayed reply---spent all day yesterday on a plane.

It would still consume a call to step(). The step count would still be incremented. The elapsed time would not be equal to steps*stepSize.

Could we extend the Integrator API to allow integrators to not increment the step counter?

Like what? I don't see any problems with it. It's not even as if the user was having to switch the constant pH integrator between different child integrators. We would simply take the "main integrator" and wrap it in a layer that adds constant pH functionality to it. Then they could use that integrator directly, or stick it inside a CompoundIntegrator exactly like any other integrator.

I'm still trying to think of odd corner cases here, but the biggest reason to not do this is that---at least in the current design---it just isn't necessary to add that ind of complexity. However:

I've been thinking about your desire to support more diversity in integrators, and wonder if we could do this by requiring that the NCMC integrator be a CustomIntegrator that provides a work global variable. We would then provide a velocity Verlet integrator that accumulates the nonequilibrium work in this variable, but still allow future experimentation by users, provided they implement the appropriate work computation inside the integrator. This certainly isn't something I would expect most people to do, but it is flexibility we could provide for future research, and we could mask the complexity this introduces by providing that simple-to-use Python approach to creating and configuring the MonteCarloTitrationIntegrator I was discussing earlier.

I'll modify my API proposals to account for this:

At the Python app level, the user would interact this way:

modeller = Modeller(pdb.topology)
forcefield = ForceField('amber99sbildn.xml', 'amber99sbildn-constph.xml', 'tip3p.xml')
modeller.addHydrogens(forcefield=forcefield, pH=7) # needs to be aware that fully-protonated titratable forms of residues are used
system = forcefield.createSystem(topology, nonbondedMethod=PME)
my_integrator = LangevinIntegrator(temperature, collision_rate, timestep)
ph_integrator = MonteCarloTitration(topology, system, forcefield, pH=7, switching_steps=50)
# Calibrate constant-pH reference energies (takes a while, creates/destroys Contexts)
ph_integrator.calibrate(tol=0.01)
# Create a CompoundIntegrator where we run 50 steps of Langevin followed by a constant-pH titration update
integrator = CompoundIntegrator()
integrator.addIntegrator(my_integrator)
integrator.addIntegrator(ph_integrator)
integrator.setSchedule([50, 1]) # new method to allow automated integrator step sequence to be selected
# Create context.
context = Context(system, integrator)
# Run simulation
integrator.step(5000)

Under the hood, this would configure the C++ integrator using this API (written here using the Python wrappers for the API just for convenience):

# Create the integrator (which by default uses instantaneous Monte Carlo)
integrator = simtk.openmm.MonteCarloTitrationIntegrator(temperature)
# Tell integrator to use NCMC.
# Setting nsteps = 0 switches back to using instantaneous MC
# The provided integrator must be a `CustomIntegrator` that provides a `work` global variable
integrator.setSwitching(nsteps, ncmc_integrator)
# Configure which Force terms should be updated
integrator.updateNonbondedForce(True) # integrator will update NonbondedForce during trials
integrator.updateCustomGBForce(True) # integrator will update CustomGBForce during trials
# Add definition of first titratable group of atoms 
titratration_group_index = integrator.addTitratableGroup(atom_indices)
titration_state_index = integrator.addTitrationState(titration_group_index, log_solvent_population1)
integrator.setNonbondedForceParameters(titration_group_index, titration_state_index, particle_index, charge, sigma, epsilon)
integrator.setCustomGBForceParameters(titration_group_index, titration_state_index, params)
...
# There are also equivalent `get` and `set` methods for all of these options.
# lots and lots of additional parameters and titration states are defined in real cases
# Deal with MC statistics
[nattempted, naccepted] = integrator.getStatistics() # get list of number of attempted and accepted titration steps for each site
integrator.resetStatistics() # reset statistics
# Other getters and setters are needed to update temperature in the integrator and in the simulation
peastman commented 8 years ago

Sorry for not responding sooner! Anyway, I still don't understand your objection to having the constant pH integrator be a top level one rather than something that needs to go into a CompoundIntegrator. All of this code:

ph_integrator = MonteCarloTitration(topology, system, forcefield, pH=7, switching_steps=50)
integrator = CompoundIntegrator()
integrator.addIntegrator(my_integrator)
integrator.addIntegrator(ph_integrator)
integrator.setSchedule([50, 1]) # new method to allow automated integrator step sequence to be selected

can be simplified to just this:

integrator = MonteCarloTitrationIntegrator(topology, system, forcefield, pH=7, switching_steps=50, integrator=my_integrator)

For the C++ API, I was thinking of something more generic that will be easier to extend in the future. Something like this:

integrator.setStateParameter(force, titration_group_index, titration_state_index, particle_index, "charge", charge)

The implementation would need hardcoded support for NonbondedForce, CustomNonbondedForce, etc., but that doesn't need to be in the API. Also, it's important to explicitly tell it which Force object to modify, since there might be multiple CustomNonbondedForces.

jchodera commented 8 years ago

Anyway, I still don't understand your objection to having the constant pH integrator be a top level one rather than something that needs to go into a CompoundIntegrator.

I'm just worried this will be limiting by requiring that I wrap another integrator in case I want to do something more complex. For example, if I am trying to do some other form of NCMC as well, I might get into trouble if the constant-pH moves can't be temporarily disabled. What about allowing a None integrator if you don't want to wrap another integrator, so we can have the best of both worlds?

For the C++ API, I was thinking of something more generic that will be easier to extend in the future. Something like this:

integrator.setStateParameter(force, titration_group_index, titration_state_index, particle_index, "charge", charge)

I like it! It can just raise an Exception if something isn't implemented yet.

peastman commented 8 years ago

What about allowing a None integrator if you don't want to wrap another integrator, so we can have the best of both worlds?

That would be for the case where you aren't doing NCMC, so the moves should just be attempted as a direct jump in parameters? Yes, that's about what I was figuring on.