Umbrella Sampling with OpenMM: Excessive Runtime?

meschw04 commented 2 years ago

Hello! Thanks so much for the answers in issue #99, gonna go ahead and close that issue for the time being (I ended up using daiquiri with the OpenMM example, happy to share if that'd be helpful to you all). I also took @InnocentBug's suggestion to try umbrella sampling with the ADP example in OpenMM. I wrote the code shown below. My understanding is that this should run five umbrellas in OpenMM over 1e5 time steps (after an initial burn-in), then use WHAM to stitch these together to provide the A matrix. Looking at some other examples, the constants I set below in terms of the torsional angles and the umbrella k constant all seem reasonable. If I run just a single umbrella in OpenMM without using pysages (by adding, for eg, bias_torsion_phi = CustomTorsionForce("0.5*k_phi*dtheta^2; dtheta = min(tmp, 2*pi-tmp); tmp = abs(theta - phi)") ), and all the exact same code as below but without pysages, then the simulation completes in, like, 4 seconds.

I started the script below running on a single core yesterday morning, and it finished this morning. I'm confused as to what is taking such a high computational overhead. I have tried changing the k values, start/end locations of (psi,phi), num_umbrellas, etc., then I run for an hour before killing it. It really shouldn't take an hour, right? It should take, what, ~30 sec?

On the implementation side of things (and I'd be happy to help with this), I think it would be really nice to have one or some of the following:

TQDM on the for loop for the set of umbrellas to give an estimate of how long it will run for (see https://github.com/SSAGESLabs/PySAGES/blob/192e7f9af6fdb50329d2e8fea095537b39a1fc12/pysages/methods/umbrella_integration.py#L97)
A cutoff flag for time running if not converged, or...
Based on the motion of the system, some kind of feedback to say "you should decrease your time step" or "you should increase/decrease K"
A warning when running WHAM along the lines of "umbrellas don't overlap much at all, so WHAM (or MBAR?) isn't going to give good results on the FES"

Thanks so much! :smile:

from pysages.collective_variables import DihedralAngle
from pysages.methods import ABF, UmbrellaIntegration
import numpy as np
from pysages.utils import try_import
from openmm import *
from openmm.app import *

import numpy
import pysages

openmm = try_import("openmm", "simtk.openmm")
unit = try_import("openmm.unit", "simtk.unit")
app = try_import("openmm.app", "simtk.openmm.app")

pi = numpy.pi

def generate_simulation(**kwargs):
    pdb_filename = "alanine-dipeptide-explicit.pdb"
    T = 298.15 * unit.kelvin
    dt = 2.0 * unit.femtoseconds
    pdb = app.PDBFile(pdb_filename)

    ff = app.ForceField("amber99sb.xml", "tip3p.xml")
    cutoff_distance = 1.0 * unit.nanometer
    topology = pdb.topology
    system = ff.createSystem(
        topology, constraints = app.HBonds, nonbondedMethod = app.NoCutoff,
        nonbondedCutoff = cutoff_distance
    )

    positions = pdb.getPositions(asNumpy = True)

    integrator = openmm.LangevinIntegrator(T, 1 / unit.picosecond, dt)

    platform = Platform.getPlatformByName('CPU')
    simulation = app.Simulation(topology, system, integrator, platform)
    simulation.context.setPositions(positions)
    simulation.minimizeEnergy()

    return simulation

cvs = (
    DihedralAngle((4, 6, 8, 14)),
    DihedralAngle((6, 8, 14, 16))
)

num_umbrellas = 5
start_phi = -pi/2
end_phi = -pi/4
start_psi = -pi/2
end_psi = -pi/4

centers = np.array(list(zip(np.linspace(start_phi,end_phi,num_umbrellas).tolist(),\
                   np.linspace(start_psi,end_psi,num_umbrellas).tolist())))

method = UmbrellaIntegration(cvs)
result = method.run(generate_simulation,timesteps=1e5,centers=centers,\
                    ksprings=100,hist_periods=50)