Closed andrrizzi closed 7 years ago
I like it!
However, to have yank.create_phases()
be able to create all the phases, all of these things have to be turned into factories that can create, for example, a new set of restraints or a new sampler for each phase. So it would probably need to look more like
sampler_factory = ModifiedHamiltonianExchangeFactory(**sampler_options)
alchemical_factory = AlchemicalFactory(**alchemical_options)
restraint_factory = HarmonicFactory(**harmonic_options)
yank = Yank(store_directory, sampler_factory, alchemical_factory, **yank_options)
yank.create_phases(thermodynamic_state, restraint_ractory, *alchemical_phases)
Maybe there's a way to avoid factory overload?
Question: do we want to attach restraints to phases and then simply perform checks to determine if the restraint is appropriate for that type of phase (i.e. raise an exception if it's not a complex), or pass only a single restraint argument for all phases and adding the restraint automatically (as we currently do)?
Does Yank
know that certain kinds of phases should be different from others? If so, it can probably determine whether or not a restraint needs to be created and added.
all of these things have to be turned into factories
I might not see all the implementative hurdles at this point, but actually I think we won't need it. For example, phases are simulated sequentially, which means that once the first phase is done we can just sampler.resume('phase2.nc')
and sampler.run()
without the need to create two separate objects. If we need multiple restraints for each phase, we'll just have the user pass 2 Restraint
objects and attach them to the AlchemicalPhase
s.
Does Yank know that certain kinds of phases should be different from others?
Yes, currently we pass the restraint type as a single option, and Yank
adds the restraints only if it's a complex (i.e. if it receptor_atoms
is not empty). I think the question we should focus is: are there cases in which we want two phases to employ two different restraints (relative free energies maybe)?
all of these things have to be turned into factories
Worst case scenario, I guess we can deepcopy
the objects that we need before using them.
If there are no other objections on this design, I'll start implementing the restraints part today.
+1
Could we finalize the proposed API as it would be used first? I'm just unclear on what the final proposal was.
The API for the restraints would become (I would discuss changes in how we pass sampler and alchemical parameters later)
yank = Yank(store_directory, **yank_alchemy_repex_options)
# Configure complex and solvent phases
complex_phase = AlchemicalPhase(..., restraint=Harmonic(..., **restraint_parameters))
solvent_phase = AlchemicalPhase(..., restraint=None)
yank.create_phases(thermodynamic_state, complex_phase, solvent_phase)
in yank.create_phases()
we raise an exception if we apply a restraint to a system where there is no receptor.
I'm not sure I follow. Would you use yank.restraints.create_restraints()
to create the restraint?
Also, what happens if we have more than two phases? Shouldn't yank.createPhases()
take a list of phases? Or will two phases be hardcoded?
I think it's worth trying to nail down a workable version of what you originally proposed
sampler = ModifiedHamiltonianExchange(**sampler_options)
factory = AlchemicalFactory(**alchemical_options)
restraint = Harmonic(**harmonic_options)
yank = Yank(store_directory, sampler, factory, **yank_options)
yank.create_phases(thermodynamic_state, restraint, *alchemical_phases)
since this could considerably simplify everything.
Did you get a chance to look at how the perses
sampler stacks are defined?
Would you use
yank.restraints.create_restraints()
to create the restraint?
The user/driver class can use that method to create the restraint (and YamlBuilder
will definitely use it), but there's no need to call that function if someone wants to run Yank from the API using a simple script.
Also, what happens if we have more than two phases? Shouldn't yank.createPhases() take a list of phases? Or will two phases be hardcoded?
The signature is def Yank.create_phases(self, thermodynamic_state, *alchemical_phases):
, so Python allows to pass the phases both as a list or as comma separated arguments. It won't be hardcoded.
I think it's worth trying to nail down a workable version of what you originally proposed. Did you get a chance to look at how the perses sampler stacks are defined?
Will take a look now, thanks! I'll be happy to make an entire API proposal before implementing the restraint part. That will take me some more thinking though to be sure everything is feasible.
The signature is def Yank.create_phases(self, thermodynamic_state, *alchemical_phases):, so Python allows to pass the phases both as a list or as comma separated arguments. It won't be hardcoded.
What if we need to add optional arguments to Yank.create_phases
. Will that be a problem?
Will take a look now, thanks! I'll be happy to make an entire API proposal before implementing the restraint part. That will take me some more thinking though to be sure everything is feasible.
I think we could probably save a whole lot of pain if we can figure out a way to restructure the setup of YANK calculations to be simpler. It's worth spending a day or so to evaluate this.
Another example of a simpler setup API is the SAMS sampler stack.
I bet you could think of ways to drastically improve on these idioms, however!
What if we need to add optional arguments to
Yank.create_phases
. Will that be a problem?
Nope! Previous code will still be compatible.
Another example of a simpler setup API is the SAMS sampler stack.
Thanks! I'll take a look at it, and work out something clearer than what I proposed.
That won't work. Samplers are bound to a specific storage file, and take a specific system as input. Restraints are added to specific atoms. Even if we don't call them factories, we will have to turn them into factories that take very general configuration arguments in their constructors and then generate specific instances adapted to the system and storage file.
It isn't a bad way to do things, though. The perses and sams projects use lots of factories, but not everything is a factory. I actually like the designs for perses and sams much better, but not sure if it scales to a full fledged system like yank. Maybe you can poke around perses/tests/testsystems.py and sams/tests/testsystems.py to get an idea of that model and see what you think?
In the design above, we pass to Yank
a separate restraint object per phase, so we wouldn't need to create new specific instances with factories, the user/driver class is supposed to do that.
take very general configuration arguments in their constructors
That's exactly the idea! We should use samplers/alchemical factory constructors only for general configuration, then Yank
is responsible to use them for a specific system. Note that we don't necessarily need to use a factory design, although we might anyway. It's just a matter to add the support for a Sampler
object to switch storage file and system (and in general anything that is not the general configuration passed through the constructor). I think the two approaches are in practice equivalent, but I personally think the second one is more elegant.
I still have to check the details of the feasibility though, so maybe there's still some obstacle I don't see. I think this will be clearer when I'll post the full proposal.
Here is the proposal, sorry for the wall of text. I've made bold important points. Let me know what you think and if you find cases that are not covered by these abstractions.
Similarly to OpenMM, I think we should make a clear distinction between a library layer, and an application layer. The library layer should be general and very extensible if we want people to use Yank to experiment with different methods (us included), in the application layer we can hide as many details as we want to encapsulate current best practices. Currently the Yank
class attempts to do both, but I don't think this is sustainable. So I think we should have something like
AlchemicalPhase
AlchemicalPhase
class.
BindingFreeEnergy
HydrationFreeEnergy
SolvationFreeEnergy
TransferFreeEnergy
Main differences with what we have:
Sampler
s implementations should be how to sample, and it should be independent on what to sample (handled by State
s), and how to store samples (handled by Reporter
s). This should decouple the sampler implementation from the particular application. The Sampler
abstraction needs to be general enough to accommodate replica exchange, SAMS and any user-defined sampler!Sampler
s since they are not integrated, but to AlchemicalPhaseReporter
which is responsible to compute the energies for states that are not sampled.Topography
abstractions that can be used to represent interesting areas of our system (binding site, receptor atoms, alchemical atoms etc.)An example of how using the library layer would look like
# Configure phase
storage = Storage('path/to/file')
analyzer = AlchemicalPhaseAnalyzer(store_volumes=False, sanity_check_every=500, storage=storage)
sampler = MySampler(mc_moves=True, displacement_sigma=1.0*unit.nanometers,
time_step=1.0*unit.femtoseconds, n_steps_per_iteration=500)
factory = AlchemicalFactory(softcore_beta=0.0, annihilate_electrostatics=False)
phase = AlchemicalPhase(factory, sampler, analyzer)
# Prepare initialization
prmtop_file = openmm.app.AmberPrmtopFile('topology.prmtop')
system = prmtop_file.createSystem()
topology = prmtop_file.topology
positions = openmm.app.AmberInpcrdFile('positions.inpcrd').positions
state = ThermodynamicState(system=system, temperature=300*unit.kelvin, pressure=1.0*unit.atmosphere)
sampler_state = SamplerState(positions=positions)
topography = ComplexTopography(topology=topology, ligand_dsl='resname MOL')
restraint = Harmonic(k=1.0)
protocol = AlchemicalState.from_lambda_values(lambda_electrostatics=[1.0, 0.5, 0.0, 0.0, 0.0],
lambda_sterics=[1.0, 1.0, 1.0, 0.5, 0.0])
# Initialize phase and run
phase.initialize(state, restraint, protocol, topography, storage)
phase.run(1000)
analyzer.analyze()
An example of a simple implementation for for a generic MultiStateSampler
exploiting the State
abstraction (see next section)
class MySampler(MultiStateSampler):
def run_iterations(self, n_iterations):
for n_iteration in range(n_iterations):
for state_id in range(len(self._thermodynamic_states)):
thermodynamic_state = self._thermodynamic_states[state_id]
sampler_state = self._sampler_states[state_id]
if thermodynamic_state.is_context_compatible(self._context):
thermodynamic_state.apply_to_context(self._context)
else:
del self._context
self._context = thermodynamic_state.create_context(self._integrator, self._platform)
sampler_state.apply_to_context(self._context)
self._integrator.step(self._n_steps_per_iterations)
sampler_state.update(self._context)
# Do sampler stuff (e.g. exchange thermodynamic states, propose new thermodynamic states
# using a proposal engine, report mixing statistics to user)
self._reporter.report(self._thermodynamic_states, self._sampler_states)
I wrote a set of abstractions that I think we should use as dependency for our objects. I didn't have time to think of a general API for analysis and storage but
Analyzer
can be just an extension of a Reporter
since it knows how the info is storedStorage
there are nice examples in perses
and openpathsampling
.# =============================================================================
# Storage
# =============================================================================
class Storage:
"""TODO: High-level storage interface that decouple representation from format.
See netcdfplus package in openpathsampling, or storage module in perses for
inspirational examples.
"""
class Storable:
"""Represent any object that can be saved into a storage."""
def storage(self):
"""The storage associated to this object."""
def is_stored(self):
"""True if the associated storage contains a complete representation of
the object."""
def from_storage(self, storage):
"""Constructor. Restore a previously saved object into memory."""
def save(self):
"""Save the object into its associated storage."""
# =============================================================================
# States
# =============================================================================
class State:
"""Generally represent the state, or a portion of the state, of an OpenMM
Context.
Logically, a State is any object that can be applied to a Context.
"""
def apply_to_context(self, context):
"""Set the parameters of the given context to be consistent with this
state.
Raises
------
StateException
If is_context_compatible(context) is False.
"""
def is_context_compatible(self, context):
"""Return True if the given context can be set to this state.
False, generally means that the thermodynamic state of the context is
incompatible (e.g. different ensemble, different number of particles).
"""
class SamplerState(State):
"""Represent the portion of the state of an OpenMM Context that changes
with integration.
SamplerStates provide an update_from_context method that allow the
application to keep this state up-to-date after integration. It guarantees
to consistently set box vectors, and particles positions and velocities
of the context.
"""
def positions(self):
def velocities(self):
def box_vectors(self):
def update_from_context(self, context):
"""Update this state from the given context."""
class ThermodynamicState(State):
"""Completely define the thermodynamic state of an OpenMM Context.
ThermodynamicStates is responsible to store quantities that do not change
with integration. They are also associated to a particular System object,
and they are responsible to create new OpenMM Context objects when provided
with an integrator.
TODO: will we need/able to extend this formalism to micro/grand canonical
ensembles?
"""
def system(self):
def temperature(self):
def volume(self):
"""The volume of the system. None, if the volume fluctuates."""
def pressure(self):
"""The pressure of the system. None, if the pressure fluctuates."""
def create_context(self, integrator, platform):
"""Return a new context in this thermodynamic state."""
def is_state_compatible(self, thermodynamic_state):
"""Return True if contexts created by the given ThermodynamicState
can be converted to this state.
False, implies that a new Context will have to be created to sample
this particular state. This should be functionally equivalent to the
method is_context_compatible with the exception that you don't have
to create a Context object to test compatibility.
"""
def reduced_potential(self, context):
"""Return the reduced potential for the context."""
def reduced_potential_from_state(self, sampler_state):
"""Return the reduced potential for the sampler state."""
# =============================================================================
# Reporters
# =============================================================================
class MultiStateReporter:
"""A reporter for multiple thermodynamic state simulations."""
def report(self, thermodynamic_states, sampler_states):
"""Save the interesting information extracted from the given states.
This method is called at every iteration by a MultiStateSampler, and it
is responsible to sparsify the reports. It could also show sanity check
reports for the user at regular intervals or at particular events of the
simulation.
"""
class AlchemicalPhaseReporter(MultiStateReporter):
"""Report also information about unsampled thermodynamic states."""
def unsampled_states(self):
"""A list of ThermodynamicStates that are not sampled."""
class AlchemicalPhaseAnalyzer(AlchemicalPhaseReporter):
"""TODO: A reporter capable of performing analysis on the stored data."""
# =============================================================================
# Sampler
# =============================================================================
class MultiStateSampler:
"""Represent an object capable of sampling multiple thermodynamic states."""
def is_initialized(self):
pass
def initialize(self, thermodynamic_states, sampler_states, reporter):
"""Set the initial ThermodynamicStates and SamplerStates to sample.
Parameters
----------
thermodynamic_states : iterable of ThermodynamicStates
Thermodynamic states of the systems to sample (or initial
thermodynamic states if they are sampled too).
sampler_states : iterable of SamplerStates
Initial sampler states for the given thermodynamic states. Must
have the same dimension of thermodynamic_states. These states
are updated with the new Context state at each iteration.
reporter : MultiStateReporter
The reporter used to store the samples. Its method report is
called at every iteration. It is the reporter's responsibility
to sparsify the reports.
"""
def run_iterations(self, n_iterations):
"""Run a fixed number of iterations."""
def run(self, equilibration_detector):
"""IN THE FUTURE: Run until the equilibration detector determines
that enough samples have been collected."""
# =============================================================================
# System regions
# =============================================================================
class SystemRegion:
"""Represent a subset of atoms, bonds, angles and dihedral of a system."""
def topology(self):
"""The topology for this SystemRegion."""
def atoms(self):
"""The indices of the system particles included in this region."""
def bonds(self):
"""The indices of the system bonds included in this region."""
def angles(self):
"""The indices of the system angles included in this region."""
def torsions(self):
"""The indices of the system angles included in this region."""
def add_group(self, dsl_expression):
"""Add atoms, bonds, angles and torsions associated to the atoms
selected by the DSL expression."""
def is_topology_compatible(self, topology):
"""True if this region refers to the same topology."""
class Topography(SystemRegion):
"""An annotated SystemRegion."""
def solvent_region(self):
"""Return the portion of the region that belongs to the solvent."""
def alchemical_region(self):
"""Return the portion of the region that needs to be alchemically modified."""
class ComplexTopography(Topography):
"""Topography of a ligand-receptor complex."""
def receptor_region(self):
def ligand_region(self):
def bindind_site(self):
class SoluteTopography(Topography):
"""Topography of a solute."""
def solute_region(self):
class RelativeTopography(Topography):
"""Topography for systems for relative free energy calculations."""
def molecule1_region(self):
"""Return the portion of the region that belongs to the first molecule."""
def molecule2_region(self):
"""Return the portion of the region that belongs to the second molecule."""
# =============================================================================
# Restraint
# =============================================================================
class Restraint:
def standard_state_correction(self, thermodynamic_state):
"""Get the standard state correction for the given thermodynamic state."""
def restrain_state(self, thermodynamic_state, complex_topography=None, sampler_state=None):
"""Add the restraint Force (or Forces) to the thermodynamic state's system.
Notes
-----
Parameters that are internally undefined can be determined automatically
if sampler_state is not None. Same for restrained atoms if they are
internally undefined and complex_topography is not None.
"""
# =============================================================================
# Alchemical factory
# =============================================================================
class AlchemicalFactory:
def create_alchemical_system(self, reference_state, system_region):
"""Return an alchemically perturbed OpenMM System."""
# =============================================================================
# Alchemical phase
# =============================================================================
class AlchemicalPhase:
"""General alchemical free energy calculation of a single thermodynamic leg."""
def __init__(self, alchemical_factory, sampler, analyzer):
def initialize(self, reference_state, restraint, alchemical_states, topography, storage):
"""Create the alchemical system and initialize the sampler"""
def run_iterations(self, n_iterations):
@andrrizzi, this looks great! Can we find some time to discuss Wed or Thu?
Just elaborating on the high-level API with a few minor changes.
perses
, with an MCMCSampler
at its base? If so, we could separate our current sampling scheme into a GHMC
sampler block followed by ligand Monte Carlo rotation and translation blocks. We could clean up the already written mcmc
module to do this. The downside is that this approach incurs overhead of creating/destroying a Context
every iteration, and I'm not sure what the current overhead on this is. On the other hand, it makes a lot of things very simple, including the use of scaled charges to experiment with alternative forms of alchemical intermediates (that don't use the current approach to PME).Analyzer(Reporter)
for a single AlchemicalPhase
for the sake of flexibility. Then, for example, our application classes like BindingFreeEnergy
can expose a BindingFreeEnergy.analyze()
method that knows how to mix correctly the results from the two Analyzer
s.Yank
does it right now, but with less freedom. If the user wants more freedom, I thought he could setup the simulation with two AlchemicalPhase
. Something similar to (details to be defined)# Configure complex phase
storage = Storage('path/to/complex_file.nc')
analyzer = AlchemicalPhaseAnalyzer(store_volumes=False, sanity_check_every=500, storage=storage)
sampler = MySampler(integrator=Langevin(temperature, ...), mc_moves=True, displacement_sigma=1.0*unit.nanometers,
time_step=1.0*unit.femtoseconds, n_steps_per_iteration=500)
factory = AlchemicalFactory(softcore_beta=0.0, annihilate_electrostatics=False)
alchemical_phase = AlchemicalPhase(factory, sampler, analyzer)
# Initialize phase and run complex phase
state = ThermodynamicState(system=system_complex, temperature=300*unit.kelvin, pressure=1.0*unit.atmosphere)
sampler_state = SamplerState(positions=complex_inpcrd.positions)
topography = Topography(topology=complex_prmtop.topology, ligand_dsl='resname MOL')
restraint = Harmonic(k=1.0)
protocol = AlchemicalState.from_lambda_values(lambda_electrostatics=[1.0, 0.5, 0.0, 0.0, 0.0],
lambda_sterics=[1.0, 1.0, 1.0, 0.5, 0.0])
phase.initialize(state, restraint, protocol, topography, storage)
phase.run(1000)
free_energy_complex = analyzer.analyze()
# Update configuration for phase 2, we can recycle old configuration since we are going to reinitialize the AlchemicalPhase, or we can instantiate a whole new AlchemicalPhase object.
analyzer.storage = Storage('path/to/solvent_file.nc')
# Initialize phase and run complex phase
state.system = ThermodynamicState(system=system_solvent, temperature=300*unit.kelvin, pressure=1.0*unit.atmosphere)
sampler_state = SamplerState(positions=solvent_inpcrd.positions)
topography = Topography(topology=complex_prmtop.topology, ligand_dsl='resname MOL')
restraint = None
phase.initialize(state, restraint, protocol, topography, storage)
phase.run(1000)
free_energy_solvent = analyzer.analyze()
print(free_energy_solvent - free_energy_complex)
AlchemicalPhase
doesn't care about the internal implementation, so as long as the Sampler
object that we are passing has a method that takes two lists of ThermodynamicState
s and SamplerState
s, we are good. The sampler can take a list of different MCMCMove
s or simply a list of different Integrator
s in the constructor. I wouldn't constrain this if we can avoid it.Massive API overhaul that will be part of YANK 1.0 release, but not the preview
The expanded state(s) are not passed to Samplers since they are not integrated,
But in the case of SAMS, aren't we integrating both? The MCMC sampler targets p(x, lambda)
.
But in the case of SAMS, aren't we integrating both?
Sorry for the confusion, by "expanded state" here I mean the system with an enlarged cutoff that we use to reweight our samples to correct for anisotropic long-range dispersion interactions.
Oh, sorry, totally confused that with the expanded ensemble concept. Nevermind then...
This was implemented in several different PRs here and in openmmtools
that were released with YANK 0.16.
Since all the relevant objects are initialized inside
Yank
(i.e. restraint, alchemical factory, sampling class), we need to pass all the parameters for all classes in a bigkwargs
dictionary. ThenYank
has to use introspection to separate them by class to initialize all the objects correctly. I propose to make an effort to move the initialization of these objects outsideYank
.From a user perspective, the final interface would become something like
This is of course only to give an idea, not an actual proposal, but I would implement the restraint part this way now, if we agree that the API should move towards this direction before 1.0 release. I realize that we may need to change a lot of things for this to happen, but I really believe this will turn out to be much easier to maintain and extend in the future.
Question: do we want to attach restraints to phases and then simply perform checks to determine if the restraint is appropriate for that type of phase (i.e. raise an exception if it's not a complex), or pass only a single
restraint
argument for all phases and adding the restraint automatically (as we currently do)?