Allow running calculations with multiple restraints

choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.

http://getyank.org

MIT License

175 stars 70 forks source link

Allow running calculations with multiple restraints #1015

Open andrrizzi opened 6 years ago

andrrizzi commented 6 years ago

Since we don't have an analytical correction for the RMSD restraint, we may need to introduce the ability to run with multiple restraints (e.g. RMSD + harmonic) so that we can turn off the rigidifying restraint at the end of the protocol while still reducing the space accessible by the ligand in the decoupled state and being able to bring the ligand to standard concentration.

The only idea I had so far that would make the design very extensible is to have a CompoundRestraint class that can be passed as a single restraint.

Controlling multiple restraint at the same time is essentially the same problem of controlling multiple alchemical regions at the same time, so we should consider these two together. For now, the solution that I and @adw62 adopted for alchemical regions is to have each global variable controlled by a different AlchemicalState, which is assigned to a particular region so something similar may work also for RestraintState.

jchodera commented 6 years ago

I think we want to allow multiple restraints for the same alchemical region, rather than requiring these to be in separate regions. Could we simply allow each restraint for each region to have a separate label or name?

We could tackle this at the same time as reworking the restraints to use and store CustomCVForces.

andrrizzi commented 6 years ago

I agree. I meant that the implementative solutions would be similar, not that we should assign one restraint for each region. In both cases, we'll have separate variables controlling different parts of the Alchemica/RestraintState. Sorry if it was unclear.

Lnaden commented 6 years ago

Here are the required code changes I see for the multi-restraints:

Change the restraints.py classes to support different names for the restraint variable, defaulting to lambda_restraints
Change AlchemicalPhase.create(restraint=...) keyword to support None, a ReceptorLigandRestraint or a list of ReceptorLigandRestraint (we can expand this to a LigandRestraint-ish thing later)
Apply all the restraints to the alchemical phase

I want to propose the following YAML layout for the multi-restraints class:

Create a new YAML block called restraint which supports arbitrary names (like systems have names). Keys are just like the current experiment:restraint:.... block is now. Add new option lambda_name: X which takes a string to be cast to lambda_X, allows each restraint to be tied to either unique or existing values (including sterics and electrostatics)
Change experiment block to now have a restraints: R option where R is either a single or list of restraint(s) defined in the restraints block.
- This would break backwards compatibility and all of our examples, but I don't know of a clean way to continue supporting the old and the new way without having both side-by-side and being confusing.
No changes needed to the protocols block, but people can now just define lambda_X as many times as required where X are the names in the lambda_name keys for the used restraints
- Add a check which cycles through all the lambda_X in the protocols and throw a warning if there is a defined lambda_X which does not tie to any force which references it. But OpenMM may actually error if there is no valid target for setParameter.
- I don't think we should check if there is a lambda_X in the protocol for every lambda_name since there are use cases where we want the restraint to be on the whole time.

Known limitations with this design:

Restraint will still only apply to the complex phases, no way to apply to solvent
Not backwards compatible
Breaks the auto-protocol. I have no idea how to fix this unless we also define the order on which the restraints are suppose to apply.
- Not sure if we would apply the order on the YAML or the restraints.py module per restraint.
- We may have to sit down and enumerate all the useful combinations of restraints to give users options.

Thoughts on this, @jchodera, @andrrizzi?

jchodera commented 6 years ago

This sounds great.

Regarding breaking backward-compatibility: I'm fine with bumping the major version number to 1 at this point since this is the home stretch. We can then start incrementing the major version number every time we break the API, though it might be good to get a few API breaking changes in all at once.

This might also be a good time to rework the restraints to all use CustomCVForce, which would allows us to generalize the restraint classes to all (1) define or reuse CVs (some of which may be shared), (2) store them all each iteration, and (3) compute the restraint via an algebraic energy expression for CustomCVForce.

To fix "auto" protocols, we could include a protocol block that defines waypoints along which auto is supposed to interpolate. That way, users can define a very crude path and have auto come up with the spacing for them.

Have we thought about how multiple alchemical regions might play nicely with this? Maybe there's a chance to support multiple regions in a Topography object for now (if that feature doesn't already exist).

andrrizzi commented 6 years ago

Sounds great, and I'm ok having a list of restraints instead of using the CompoundRestraint solution. A couple of suggestions:

Add new option lambda_name: X which takes a string to be cast to lambda_X, allows each restraint to be tied to either unique or existing values (including sterics and electrostatics)

In the implementation of the multiple AlchemicalRegions, we assigned a name to the region object and the controlling variables assigned to the factory became, for example, lambda_sterics_name, lambda_electrostatics_name. We then have multiple AlchemicalState assigned to control one region each. The protocol section is already flexible enough to allow any possible protocol with multiple lambda names, but if we want shortcuts, the restraint could be tied to other lambda values (e.g., sterics and electrostatics) through a mechanism similar to that implemented by AlchemicalFunctions.

Is there a way we could expose these features within this framework instead of the lambda_name keyword? I think we would benefit from making this consistent (if not YAML, at least at the API level) with the alchemical regions implementation.

I have no idea how to fix this unless we also define the order on which the restraints are suppose to apply.

I'm probably missing something, but if we pass a list of restraints, doesn't the list automatically define an order of restraints? Could we use that?

The main problem I see now with the current algorithm is that it's only possible to modify one lambda_X_name at a time so we won't be able to turn on/off multiple restraints at the same time. This could be also worked around by using something similar to AlchemicalFunction to enslave multiple lambda_restraint_names variables to a single lambda_X and trailblaze over the collective lambda_X variable.

Lnaden commented 6 years ago

Is there a way we could expose these features within this framework instead of the lambda_name keyword?

I'm not sure I am following. The multiple regions was so we could select different sets of atoms, we still have no way to control the alchemical process though the alchemical regions block? Could you provide a psuedo-example maybe?

'm probably missing something, but if we pass a list of restraints, doesn't the list automatically define an order of restraints? Could we use that?

Kind of. The larger problem is that not all transformations are monotonic, or independent. We may want to have a restraint off, turn it on, then back off again after the fact. (e.g. the RMSD + harmonic restraint. RMSD starts and ends off so we can compute the standard state correction through the harmonic alone).

The main problem I see now with the current algorithm is that it's only possible to modify one lambda_X_name at a time so we won't be able to turn on/off multiple restraints at the same time.

You could change multiple at once. Since each lambda_X would be its own entry in the protocol, then you could have multiple lines.

This could be also worked around by using something similar to AlchemicalFunction to enslave multiple...

I have used similar things to this in the past and thought applying it here. However, I would avoid this if at all possible. This adds a layer of obfuscation which is only clear when the lambda path is not step-wise, otherwise it can be very confusing to the end user to have to map "sub lambda I care about Y = fn(Master Lambda)", especially if we have multiple lambda. Since we are exposing all the lambda values, there is no need to obfuscate their values or slave them to another variable.

andrrizzi commented 6 years ago

I'm not sure I am following. The multiple regions was so we could select different sets of atoms, we still have no way to control the alchemical process though the alchemical regions block? Could you provide a psuedo-example maybe?

Yes, sorry, I meant more about the API structure more than semantics. It's not in YANK, but @adw62 and I have been working on it in openmmtools (in his fork) so there's already a prototype API for this mechanism. This is more or less what the multiple regions API looks like right now

region1 = AlchemicalRegion(name='name1')
region2 = AlchemicalRegion(name='name2', lambda_sterics=0.8)

factory = AlchemicalFactory()
# The factory sets the forces global variables to lambda_sterics_name1 and lambda_sterics_name2.
system = factory.create_alchemical_system(reference_system, alchemical_regions=[region1, region2]

alchemical_state1 = AlchemicalState(region_name='name1', lambda_sterics=0.9)
alchemical_state2 = AlchemicalState(region_name='name2', lambda_sterics=0.8)
compound_state = CompoundState(..., composable_states=[alchemical_state1, alchemical_state2])
compound_state.lambda_sterics_name1 = 0.5

what I imagined for the restraints is something like

restraint1 = Harmonic(name='name1')
restraint2 = Boresch(name='name2')

# The restraint add forces controlled by lambda_restraints_name1 and lambda_restraints_name2
restraint1.restrain_state(thermodynamic_state)
restraint2.restrain_state(thermodynamic_state)

restraint_state1 = RestraintState(name='name1', lambda_restraints=0.9)
restraint_state2 = RestraintState(name='name2', lambda_restraints=0.8)
compound_state = CompoundState(..., composable_states=[restraint_state1, restraint_state2])
compound_state.lambda_restraints_name1 = 0.5

and to enslave the lambda_restraints variables to a single one to use something like what already exist in alchemy

compound_state.lambda_restraints_name1 = LambdaFunction('lambda_X')
compound_state.lambda_restraints_name2 = LambdaFunction('1 - lambda_X')
compound_state.labmda_X = 0.6  # This changes both variables.

I already had in mind to have a single parent class for both AlchemicalState and RestraintState so this may be a good opportunity to tackle that.

I think keeping the two APIs as similar as possible will minimize the time the user will have to spend getting familiar with them.

Lnaden commented 6 years ago

Thanks for the example, that helped. I like this plan, but I still have questions which may be better resolved in person, and after I think about this some more from a design perspective. Let me think on it and get back to you.

andrrizzi commented 6 years ago

Sounds good! I'll be back in the lab in a couple of days anyway.

andrrizzi commented 5 years ago

I did some brainstorming on the states.py framework, and I think we can accommodate the system we talked about with minimal modifications in CompoundThermodynamicState provided we expose the dynamic lambda_restraints_arbitraryname in __getattr__.

I'll probably end up implementing an initial version of the common parent class of AlchemicalState and RestraintState too for testing, so we might as well take advantage of that.

andrrizzi commented 5 years ago

I'll also have to modify the radially symmetric restraint force classes in openmmtools/forces.py to be able to use global variable names other than lambda_restraints.