choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License
181 stars 51 forks source link

A->B and B->A Transforms are Both Positive #958

Closed casebeerVRTX closed 2 years ago

casebeerVRTX commented 2 years ago

We are trying to troubleshoot some issues using perses on some ligand transformations. We're testing a single forward and backward transformation ( A-> B and B -> A) which is the most trivial closed cycle. Our intuition tells us that the magnitudes of these two transformations should be equal and opposite to each other, and this is what we see with the experimental data. Right now, the calculated ddGs are not adding up to zero, but instead are both very positive. Can you help shed some light on this? Could this be a units problem? This is what the mm_graph looks like:

OutEdgeDataView([(9, 10, {'calc_DDG': 5.18712727908637, 'calc_dDDG': 0.4801874381154462, 'exp_DDG': -0.9193053246474641, 'exp_dDDG': 0.0}), (10, 9, {'calc_DDG': 11.124917041483098, 'calc_dDDG': 0.4120190004511572, 'exp_DDG': 0.9193053246474641, 'exp_dDDG': 0.0})])

Unknown-19

jchodera commented 2 years ago

There are a number of things that could be going wrong here, including apparently good convergence that is actually poor.

We're not yet to the advanced state with perses as we were with YANK in being able to automatically generate "simulation health reports" that make it easy to diagnose issues without disclosing chemical information about the transformation, so we'll need to ask you to find a public example that you can share your input files for. Can you upload a ZIP file and also provide a conda env export dump of your conda environment?

Our funding for this project is from other partners that we have to prioritize, but we try to give a "best effort" in helping other folks out when we can.

casebeerVRTX commented 2 years ago

Thank you! We are working on putting something together.

jchodera commented 2 years ago

Awesome---that will help immensely.

jchodera commented 2 years ago

@mikemhenry @ijpulidos : We should add some A->B and B->A consistency tests (likely just ligand in solvent, not complex) to our GPU CI as well. We can start with examples we already have, but can add a public example from @casebeerVRTX once provided.

casebeerVRTX commented 2 years ago

We appreciate the help and we have been trying on some public data so it will be easier to debug. However, we've made a few modifications to perses.

To run perses on membrane proteins, we worked with @dominicrufa last summer. We prepared the protein + membrane system outside of perses and then used modeller to add waters and ions. After this, we saved a pickle of the topology and numpy array of the positions.

The examples provided are from the following paper, a Nav 1.7 protein with Arylsulfonamide inhibitors.

from simtk.openmm.app import PDBFile
from simtk.openmm import app
from simtk import unit
from simtk.openmm import Platform
from simtk.openmm import LangevinIntegrator
from simtk.openmm.app import ForceField, PDBFile, PME, Modeller
# This is receptor + membrane
pdb = PDBFile("tmp2.pdb")
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField(*['amber/ff14SB.xml', 'amber/tip3p_standard.xml', 'amber/lipid17.xml'])
# This works with the included POPC
system = forcefield.createSystem(pdb.getTopology(), nonbondedMethod=PME)
modeller.addSolvent(forcefield, model='tip3p', numAdded=65000,
ionicStrength=0.150*unit.molar, positiveIon='Na+', negativeIon='Cl-', neutralize=True)
def write_pickle(object, pickle_filename):
"""
write a pickle
arguments
object : object
picklable object
pickle_filename : str
name of pickle
"""
import pickle
with open(pickle_filename, 'wb') as f:
pickle.dump(object, f)
write_pickle(modeller.topology, "openmm.pickle")
_positions = modeller.positions / unit.nanometers
import numpy as np
np.savez("positions.z", positions=_positions)

We then modified relative_setup to omit solvation and added the class attributes for receptor_topology and receptor_positions to the RelativeFEPSetup class. If those are defined, we depickle the files we prepared outside of perses and set self._receptor_positions_old, self._receptor_topology_old, and self._receptor_md_topology_old.

def depickle(pickle_filename):
"""
load a pickle
arguments
pickle_filename : str
name of pickle
returns
pickle : loaded pickle object
"""

import pickle
with open(pickle_filename, 'rb') as f:
pickle = pickle.load(f)
return pickle

Inside of setup_relative_calculation.py we added an option to use the Monte Carlo barostat. We also added options to directly pass the receptor topology and receptor positions to RelativeFEPSetup. (We have not merged in new commits to perses so we are out of date ~6 months.) In run.yaml we're loading the Lipid17 FF to handle POPC. Other than that, I think things are fairly standard. Attached are the conda environment, receptor PDB, ligand sdf, pickle, numpy positions, yaml file, and run.py.

2edge_test.zip env_2edge.txt

slochower commented 2 years ago

To make things a little more concrete and fix some copy/paste indentation issues, here is a diff between perses commit ac5cd56e138383b557ecfd5d0fc780bef193ec0d and our modified version.

```diff diff --git a/perses/app/relative_setup.py b/perses/app/relative_setup.py index bb594799..a0232675 100644 --- a/perses/app/relative_setup.py +++ b/perses/app/relative_setup.py @@ -49,6 +49,8 @@ class RelativeFEPSetup(object): forcefield_files, phases, protein_pdb_filename=None, + receptor_topology=None, + receptor_positions=None, receptor_mol2_filename=None, pressure=1.0 * unit.atmosphere, temperature=300.0 * unit.kelvin, @@ -376,10 +378,11 @@ class RelativeFEPSetup(object): _logger.info(f"setting up complex phase...") self._setup_complex_phase(protein_pdb_filename,receptor_mol2_filename,mol_list) self._complex_topology_old_solvated, self._complex_positions_old_solvated, self._complex_system_old_solvated = self._solvate_system( - self._complex_topology_old, self._complex_positions_old,phase='complex',box_dimensions=self._complex_box_dimensions, ionic_strength=self._ionic_strength) + self._complex_topology_old, self._complex_positions_old,phase='complex',box_dimensions=self._complex_box_dimensions, ionic_strength=self._ionic_strength, run_solvate=False) _logger.info(f"successfully generated complex topology, positions, system") self._complex_md_topology_old_solvated = md.Topology.from_openmm(self._complex_topology_old_solvated) + print(f"dumbwaiter found {self._complex_md_topology_old_solvated.n_atoms} atoms!") _logger.info(f"creating TopologyProposal...") self._complex_topology_proposal = self._proposal_engine.propose(self._complex_system_old_solvated, @@ -557,8 +560,36 @@ class RelativeFEPSetup(object): protein_pdbfile.close() self._receptor_positions_old = pdb_file.positions self._receptor_topology_old = pdb_file.topology + print(f"from app.pdbfile, topology got pbcs like: {pdb_file.topology.getPeriodicBoxVectors()}") self._receptor_md_topology_old = md.Topology.from_openmm(self._receptor_topology_old) + if self.receptor_topology: + + def depickle(pickle_filename): + """ + load a pickle + arguments + pickle_filename : str + name of pickle + returns + pickle : loaded pickle object + """ + import pickle + + + with open(pickle_filename, 'rb') as f: + pickle = pickle.load(f) + return pickle + + depickled = depickle(self.receptor_topology) + f = np.load(self.receptor_positions) + + self._receptor_positions_old = f["positions"] * unit.nanometers + self._receptor_topology_old = depickled + print(f"from app.pdbfile, topology got pbcs like: {depickled.getPeriodicBoxVectors()}") + self._receptor_md_topology_old = md.Topology.from_openmm(self._receptor_topology_old) + + elif receptor_mol2_filename: self._receptor_mol2_filename = receptor_mol2_filename self._receptor_mol = createOEMolFromSDF(self._receptor_mol2_filename) @@ -578,6 +609,7 @@ class RelativeFEPSetup(object): self._complex_md_topology_old = self._complex_md_topology_old.join(spectator_topology) n_atoms_spectators += spectator_topology.n_atoms self._complex_topology_old = self._complex_md_topology_old.to_openmm() + self._complex_topology_old.setPeriodicBoxVectors(depickled.topology.getPeriodicBoxVectors()) n_atoms_total_old = self._complex_topology_old.getNumAtoms() n_atoms_protein_old = self._receptor_topology_old.getNumAtoms() @@ -778,7 +810,7 @@ class RelativeFEPSetup(object): pass return new_positions * unit.nanometers - def _solvate_system(self, topology, positions, model='tip3p',phase='complex', box_dimensions=None,ionic_strength=0.15 * unit.molar): + def _solvate_system(self, topology, positions, model='tip3p',phase='complex', box_dimensions=None,ionic_strength=0.15 * unit.molar, run_solvate=True): """ Generate a solvated topology, positions, and system for a given input topology and positions. For generating the system, the forcefield files provided in the constructor will be used. @@ -795,6 +827,8 @@ class RelativeFEPSetup(object): solvent model to use for solvation box_dimensions : tuple of Vec3, default None if not None, padding distance will be omitted in favor of a pre-specified set of box dimensions + run_solvate : bool, default True + whether to solvate the existing modeller.model or not. Returns ------- @@ -808,13 +842,13 @@ class RelativeFEPSetup(object): # DEBUG: Write PDB file being fed into Modeller to check why MOL isn't being matched from simtk.openmm.app import PDBFile modeller = app.Modeller(topology, positions) + print(f"input topology of phase {phase} has box vectors: {topology.getPeriodicBoxVectors()}") # retaining protein protonation from input files #hs = [atom for atom in modeller.topology.atoms() if atom.element.symbol in ['H'] and atom.residue.name not in ['MOL','OLD','NEW']] #modeller.delete(hs) #modeller.addHydrogens(forcefield=self._system_generator.forcefield) _logger.info(f'box_dimensions: {box_dimensions}') _logger.info(f'solvent padding: {self._padding._value}') - run_solvate = True if phase == 'solvent': self._padding = 9. * unit.angstrom if phase == 'vacuum': @@ -829,16 +863,26 @@ class RelativeFEPSetup(object): modeller.addSolvent(self._system_generator.forcefield, model=model, padding=self._padding, ionicStrength=ionic_strength) else: modeller.addSolvent(self._system_generator.forcefield, model=model, ionicStrength=ionic_strength, boxSize=box_dimensions) + solvated_topology = modeller.getTopology() - if phase == 'complex' and self._padding._value == 0. and box_dimensions is not None: - _logger.info(f'Complex phase, where padding is set to 0. and box dimensions are provided so setting unit cell dimensions') - solvated_topology.setUnitCellDimensions(box_dimensions) + if not run_solvate: + solvated_topology.setPeriodicBoxVectors(topology.getPeriodicBoxVectors()) + print(f"pulled solvated topology with phase {phase} and got pbcs like: {solvated_topology.getPeriodicBoxVectors()}") + + if run_solvate: + if phase == 'complex' and self._padding._value == 0. and box_dimensions is not None: + _logger.info(f'Complex phase, where padding is set to 0. and box dimensions are provided so setting unit cell dimensions') + solvated_topology.setUnitCellDimensions(box_dimensions) + else: + _logger.info(f"complex phase is pre-solvated: getting cell dimensions {solvated_topology.getUnitCellDimensions()}") + solvated_positions = modeller.getPositions() # canonicalize the solvated positions: turn tuples into np.array solvated_positions = unit.quantity.Quantity(value = np.array([list(atom_pos) for atom_pos in solvated_positions.value_in_unit_system(unit.md_unit_system)]), unit = unit.nanometers) _logger.info(f"\tparameterizing...") solvated_system = self._system_generator.create_system(solvated_topology) + print(f"solvated system uses pbcs: {solvated_system.usesPeriodicBoundaryConditions()}") _logger.info(f"\tSystem parameterized") if self._trajectory_directory is not None and self._trajectory_prefix is not None: @@ -847,6 +891,8 @@ class RelativeFEPSetup(object): PDBFile.writeFile(solvated_topology, solvated_positions, outfile) else: _logger.info('Both trajectory_directory and trajectory_prefix need to be provided to save .pdb') + + #raise Exception(f"terminating...") return solvated_topology, solvated_positions, solvated_system diff --git a/perses/app/setup_relative_calculation.py b/perses/app/setup_relative_calculation.py index 350c3b24..208ec060 100644 --- a/perses/app/setup_relative_calculation.py +++ b/perses/app/setup_relative_calculation.py @@ -335,8 +335,11 @@ def run_setup(setup_options, serialize_systems=True, build_samplers=True): _logger.info(f"\tPulling receptor (as pdb or mol2)...") # We'll need the protein PDB file (without missing atoms) try: - protein_pdb_filename = setup_options['protein_pdb'] - assert protein_pdb_filename is not None + receptor_topology = setup_options['receptor_topology'] + receptor_positions = setup_options["receptor_positions"] + assert receptor_topology is not None + assert receptor_positions is not None + protein_pdb_file = None receptor_mol2 = None except KeyError: try: @@ -438,6 +441,8 @@ def run_setup(setup_options, serialize_systems=True, build_samplers=True): _logger.info(f'Box dimensions: {setup_options["complex_box_dimensions"]} and {setup_options["solvent_box_dimensions"]}') fe_setup = RelativeFEPSetup(ligand_file, old_ligand_index, new_ligand_index, forcefield_files,phases=phases, protein_pdb_filename=protein_pdb_filename, + receptor_topology=recptor_topology, + receptor_positions=receptor_positions, receptor_mol2_filename=receptor_mol2, pressure=pressure, temperature=temperature, solvent_padding=solvent_padding_angstroms, spectator_filenames=setup_options['spectators'], map_strength=setup_options['map_strength'], @@ -565,6 +570,25 @@ def run_setup(setup_options, serialize_systems=True, build_samplers=True): interpolate_old_and_new_14s = setup_options['anneal_1,4s'], rmsd_restraint=setup_options['rmsd_restraint'] ) + + # replace the htf with a membrane barostat + if phase == 'complex': + from simtk import openmm + _logger.info("querying hybrid system forces to add barostat...") + hsystem = htf[phase].hybrid_system + _logger.info(f"hybrid system forces are {hsystem.getForces()}") + found_bar = False + for force_idx in range(hsystem.getNumForces()): + _logger.info(f"querying force index {force_idx} with force {hsystem.getForce(force_idx)} and name {hsystem.getForce(force_idx).__class__.__name__}") + if hsystem.getForce(force_idx).__class__.__name__ == 'MonteCarloBarostat': + hsystem.removeForce(force_idx) + found_bar=True + break + assert found_bar, f"no MonteCarloBarostat was found." + barostat = openmm.MonteCarloMembraneBarostat((setup_options['pressure'] * unit.atmosphere).value_in_unit(unit.bar), 0., temperature.value_in_unit(unit.kelvin), openmm.MonteCarloMembraneBarostat.XYAnisotropic, openmm.MonteCarloMembraneBarostat.ZFixed, 25) + htf[phase].hybrid_system.addForce(barostat) + + for phase in phases: # Define necessary vars to check energy bookkeeping diff --git a/perses/rjmc/geometry.py b/perses/rjmc/geometry.py index 5a4563eb..eea17ee0 100644 --- a/perses/rjmc/geometry.py +++ b/perses/rjmc/geometry.py @@ -460,6 +460,7 @@ class FFAllAngleGeometryEngine(GeometryEngine): integrator = openmm.VerletIntegrator(1*unit.femtoseconds) atoms_with_positions_system_integrator = openmm.VerletIntegrator(1*unit.femtoseconds) final_system_integrator = openmm.VerletIntegrator(1*unit.femtoseconds) + final_system_integrator2 = openmm.VerletIntegrator(1*unit.femtoseconds) context = openmm.Context(growth_system, integrator, platform) growth_system_generator.set_growth_parameter_index(len(atom_proposal_order)+1, context) @@ -490,7 +491,7 @@ class FFAllAngleGeometryEngine(GeometryEngine): _logger.debug(f'\t{f} : {e}') atoms_with_positions_methods_differences = abs(atoms_with_positions_reduced_potential - sum([i[1] for i in atoms_with_positions_reduced_potential_components])) _logger.debug(f'Diffence in energy on adding unique atoms: {atoms_with_positions_methods_differences}') - assert atoms_with_positions_methods_differences < ENERGY_THRESHOLD, f"the difference between the atoms_with_positions_reduced_potential and the sum of atoms_with_positions_reduced_potential_components is {atoms_with_positions_methods_differences}" + #assert atoms_with_positions_methods_differences < ENERGY_THRESHOLD, f"the difference between the atoms_with_positions_reduced_potential and the sum of atoms_with_positions_reduced_potential_components is {atoms_with_positions_methods_differences}" # Place each atom in predetermined order _logger.info("There are {} new atoms".format(len(atom_proposal_order))) @@ -639,6 +640,7 @@ class FFAllAngleGeometryEngine(GeometryEngine): final_system.getForce(force_names['NonbondedForce']).setUseDispersionCorrection(False) _logger.info(f"{direction} final system defined with nonbonded interactions.") final_context = openmm.Context(final_system, final_system_integrator, platform) + final_context2 = openmm.Context(final_system, final_system_integrator2, platform) final_context.setPositions(_positions) state = final_context.getState(getEnergy=True) diff --git a/perses/samplers/multistate.py b/perses/samplers/multistate.py index c5ef0800..c8eca3bd 100644 --- a/perses/samplers/multistate.py +++ b/perses/samplers/multistate.py @@ -48,7 +48,8 @@ class HybridCompatibilityMixin(object): thermodynamic_state_list = [] sampler_state_list = [] - context_cache = cache.ContextCache() + context_cache = cache.global_context_cache #why was this not specified like this before? + print("context cache is using platform: {context_cache.platform.getName()}") if n_replicas is None: _logger.info(f'n_replicas not defined, setting to match n_states, {n_states}') ```
casebeerVRTX commented 2 years ago

Hello, We have continued to try to debug this issue, and unfortunately have not had much luck. We have tried changing the number of REPEX cycles, the number of MD steps, the number of lambda windows, adding a backbone restraint, plotting replica mixing, doing a visual trajectory analysis, and examining the DDG as a function of time.

I am still working with the protein and ligands specified in the sdf file that I sent earlier. My expectation is that any two reciprocal transformations would sum to 0. I’m looking at a particular transformation (9->10 and 10->9), one of many that does not sum to zero.

Here is a summary of a bunch of things I tried:

Screen Shot 2022-03-21 at 3 53 24 PM

This is the 10 -> 9 transformation.

lig9to10

It appears as if none of these changes resulted in significant progress towards getting the two transformations to sum to zero. Below is a graph of all the DDG results along with the experimental values on the far right.

image

Looking more closely at simulation 9, which has 1000 REPEX cycles, 1000 MD steps, 4 fs timesteps, and 11 lambda windows, meaning 44ns of simulation time per transformation, I plotted the replica mixing and the DDG over time. Here is the replica mixing (complex on left and solvent on right) for 10 -> 9.

image image

Here is the replica mixing (complex on left and solvent on right) for 9 -> 10.

image image

Qualitatively, they seem to be mixing.

Here is a plot of the ddG over time of the 10 -> 9 and the 9 -> 10 transformation. Again, qualitatively, it seems to be close to equilibrium.

image image

We also tried to run this protein without the lipids (just in solvent) using the non-lipid modified perses. Even though we would not expect the predicted values to match the experimental values because there’s lipids near the binding site, we wanted to test if the two reciprocal transformations would sum to 0. At the moment, we are getting the following error:

AssertionError: the difference between the atoms_with_positions_reduced_potential and the sum of atoms_with_positions_reduced_potential_components is 38.51413189283903

I realize that debugging the lipid-modified perses may pose some challenges, but I was hoping you might be able to suggest what else to check as we’re working on this issue.

casebeerVRTX commented 2 years ago

I wanted to follow up with this issue since it is now resolved. We ended up rebasing the lipid-version of perses to the most recent version, and changed the atom mapping strategy because the same ring was getting mapped incorrectly. This doesn't solve the issue of both of these being positive, but we were able to find a work around.

Thank you!