SSAGESLabs / PySAGES

Python Suite for Advanced General Ensemble Simulations

Umbrella Sampling with OpenMM: Excessive Runtime? #100

Closed meschw04 closed 2 years ago

meschw04 commented 2 years ago

Hello! Thanks so much for the answers in issue #99, gonna go ahead and close that issue for the time being (I ended up using daiquiri with the OpenMM example, happy to share if that'd be helpful to you all). I also took @InnocentBug's suggestion to try umbrella sampling with the ADP example in OpenMM. I wrote the code shown below. My understanding is that this should run five umbrellas in OpenMM over 1e5 time steps (after an initial burn-in), then use WHAM to stitch these together to provide the A matrix. Looking at some other examples, the constants I set below in terms of the torsional angles and the umbrella k constant all seem reasonable. If I run just a single umbrella in OpenMM without using pysages (by adding, for eg, bias_torsion_phi = CustomTorsionForce("0.5*k_phi*dtheta^2; dtheta = min(tmp, 2*pi-tmp); tmp = abs(theta - phi)") ), and all the exact same code as below but without pysages, then the simulation completes in, like, 4 seconds.

I started the script below running on a single core yesterday morning, and it finished this morning. I'm confused about what is causing such high computational overhead. I have tried changing the k values, the start/end locations of (psi, phi), num_umbrellas, etc., and then run for an hour before killing the job. It really shouldn't take an hour, right? It should take, what, ~30 seconds?
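For scale, the nominal workload of the script is just the per-window step count times the number of windows, since the windows run sequentially here (this back-of-the-envelope count ignores whatever burn-in PySAGES adds, whose default length I have not verified):

```python
# Rough workload count for the umbrella-integration script below:
# each umbrella window runs the full number of production steps.
num_umbrellas = 5
steps_per_window = int(1e5)
total_steps = num_umbrellas * steps_per_window
print(total_steps)  # 500000 MD steps in total, on a single CPU core
```

So even before any per-step overhead from the biasing force, this is five times the work of the single-umbrella plain-OpenMM run.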

On the implementation side of things (and I'd be happy to help with this), I think it would be really nice to have one or some of the following:

Thanks so much! :smile:

import numpy as np

import pysages
from pysages.collective_variables import DihedralAngle
from pysages.methods import UmbrellaIntegration
from pysages.utils import try_import

# Import OpenMM via try_import so the script works with both the new
# (openmm) and legacy (simtk.openmm) package layouts.
openmm = try_import("openmm", "simtk.openmm")
unit = try_import("openmm.unit", "simtk.unit")
app = try_import("openmm.app", "simtk.openmm.app")

pi = np.pi


def generate_simulation(**kwargs):
    pdb_filename = "alanine-dipeptide-explicit.pdb"
    T = 298.15 * unit.kelvin
    dt = 2.0 * unit.femtoseconds
    pdb = app.PDBFile(pdb_filename)

    ff = app.ForceField("amber99sb.xml", "tip3p.xml")
    cutoff_distance = 1.0 * unit.nanometer
    topology = pdb.topology
    system = ff.createSystem(
        topology,
        constraints=app.HBonds,
        nonbondedMethod=app.NoCutoff,
        nonbondedCutoff=cutoff_distance,
    )

    positions = pdb.getPositions(asNumpy=True)

    integrator = openmm.LangevinIntegrator(T, 1 / unit.picosecond, dt)

    platform = openmm.Platform.getPlatformByName("CPU")
    simulation = app.Simulation(topology, system, integrator, platform)
    simulation.context.setPositions(positions)
    simulation.minimizeEnergy()

    return simulation


# The two backbone dihedrals (phi, psi) of alanine dipeptide.
cvs = (
    DihedralAngle((4, 6, 8, 14)),
    DihedralAngle((6, 8, 14, 16)),
)

num_umbrellas = 5
start_phi = -pi / 2
end_phi = -pi / 4
start_psi = -pi / 2
end_psi = -pi / 4

# One (phi, psi) center per umbrella window.
centers = np.array(
    list(
        zip(
            np.linspace(start_phi, end_phi, num_umbrellas),
            np.linspace(start_psi, end_psi, num_umbrellas),
        )
    )
)

method = UmbrellaIntegration(cvs)
result = method.run(
    generate_simulation,
    timesteps=int(1e5),
    centers=centers,
    ksprings=100,
    hist_periods=50,
)
InnocentBug commented 2 years ago

Thanks for submitting this example.

This is something that is probably best handled by the MD engines themselves. We may integrate something for these iterations at some point, but PySAGES hands control of longer simulations over to the MD engines, so long runs have to be terminated by the engine. HOOMD-blue offers the HOOMD_WALLTIME flag, which raises an exception if the runtime is exceeded. I am not sure whether OpenMM offers something similar.
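For what it's worth, even without an engine-level flag, a walltime limit can be approximated on the user side by running the simulation in chunks and checking the clock between them. A minimal sketch, with the hypothetical `step_fn` standing in for a real `simulation.step` so it runs anywhere:

```python
import time


def run_with_walltime(step_fn, total_steps, chunk_size, walltime_seconds):
    """Run step_fn(n) in chunks of up to chunk_size steps, stopping
    early once walltime_seconds of wall-clock time has elapsed.
    Returns the number of steps actually completed."""
    start = time.monotonic()
    done = 0
    while done < total_steps:
        if time.monotonic() - start >= walltime_seconds:
            break  # walltime exceeded: stop before starting the next chunk
        n = min(chunk_size, total_steps - done)
        step_fn(n)  # in real use: simulation.step(n)
        done += n
    return done


# Stand-in for simulation.step(n): just record the chunk sizes.
counted = []
completed = run_with_walltime(counted.append, total_steps=10_000,
                              chunk_size=1_000, walltime_seconds=60.0)
print(completed)  # 10000 (the stand-in finishes well within the walltime)
```

The granularity of the check is one chunk, so a chunk should be sized to take well under the walltime margin.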

The longer-term plan is to have the iteration flag you mentioned parallelized over multiple GPUs.

InnocentBug commented 2 years ago

Also, estimates of how long a given simulation will run are handled by the engines. HOOMD-blue prints this out by default, and OpenMM has a state reporter that can report the estimated remaining run time.
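For reference, the reporter in question is `openmm.app.StateDataReporter`: with `remainingTime=True` and `totalSteps` set, it prints an estimated time to completion alongside the step count. A sketch (the import is guarded so the snippet is inert without OpenMM installed; the 1000-step interval is just an example value):

```python
import sys

total_steps = 500_000  # e.g. 5 windows x 1e5 steps each

try:
    from openmm import app  # older installs use simtk.openmm.app instead
except ImportError:
    app = None

if app is not None:
    # Report the step count, simulation speed, and estimated remaining
    # wall time every 1000 steps; totalSteps is required for the estimate.
    reporter = app.StateDataReporter(
        sys.stdout, 1000, step=True, speed=True,
        remainingTime=True, totalSteps=total_steps,
    )
    # simulation.reporters.append(reporter)  # attach before simulation.step(...)
```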

InnocentBug commented 2 years ago

At some point we also want to implement functionality so that long runs can be checkpointed and restarted. I would say we postpone these great ideas until then.
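Until that lands in PySAGES, plain OpenMM already supports checkpoint/restart at the engine level via `CheckpointReporter` and `Simulation.loadCheckpoint`. A sketch of how a single window could be made restartable (the helper name and file name are illustrative, and the import is guarded so the snippet is inert without OpenMM):

```python
import os

try:
    from openmm import app  # older installs use simtk.openmm.app instead
except ImportError:
    app = None

CHECKPOINT = "umbrella_window.chk"  # example file name


def attach_checkpointing(simulation, interval=5000):
    """Resume from a previous checkpoint if one exists, then write a
    fresh checkpoint every `interval` steps from here on."""
    if os.path.exists(CHECKPOINT):
        simulation.loadCheckpoint(CHECKPOINT)
    simulation.reporters.append(app.CheckpointReporter(CHECKPOINT, interval))
```

Note that checkpoints only restore the OpenMM state; any biasing state PySAGES keeps on its side would need to be saved separately.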

InnocentBug commented 2 years ago

I just noticed that you explicitly request the CPU platform. Does that have an impact on your runtime experience?

InnocentBug commented 2 years ago

I took this issue as inspiration for a Google Colab notebook that runs a harmonic bias simulation with OpenMM. It will soon be part of the PySAGES tutorials. Note that if you run it in CPU mode the actual simulation cell takes about 12 minutes, but using the GPU as an accelerator the same cell takes less than 30 seconds. OpenMM seems to be very sensitive to running on the GPU (unlike HOOMD-blue, which is more tolerant of the CPU for small examples).
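One way to avoid hard-coding the CPU platform is to pick the best available one at runtime. A sketch using OpenMM's platform enumeration API (the helper name is illustrative; note that `app.Simulation` already picks the fastest platform by default when none is passed, so this is mainly useful when you want the choice to be explicit):

```python
try:
    import openmm
except ImportError:
    openmm = None  # snippet is inert without OpenMM installed


def fastest_platform():
    """Return CUDA if available, then OpenCL, falling back to CPU."""
    available = {
        openmm.Platform.getPlatform(i).getName()
        for i in range(openmm.Platform.getNumPlatforms())
    }
    for name in ("CUDA", "OpenCL", "CPU"):
        if name in available:
            return openmm.Platform.getPlatformByName(name)
    return None
```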

HOOMD-blue Harmonic Bias