choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License

how to equilibrate before NCMC computation? #192

Closed - sstolzenberg closed this issue 8 years ago

sstolzenberg commented 8 years ago

Hi there,

How can I properly equilibrate a system prior to an NCMC computation?

Trying to make this work for a simple test system (annihilating a proton in a water box), I came across some odd results in the computed NCMC potential energy differences (see test.py.txt and test.log.txt, I can send you more input files if necessary):

When I perform a "delete" NCMC computation after equilibration (via openmm.LangevinIntegrator), I obtain a much larger potential energy difference (187 kT in test.log) than for the same "delete" NCMC computation without any prior openmm equilibration (-22 kT in test.log). The potential energies right after the equilibration and right before the NCMC computation are identical, and I have tried different equilibrated starting structures (in.pdb) as well as different equilibration/NCMC times.

Is there, e.g., a specific NCMC equilibration integrator I should be using instead of openmm.LangevinIntegrator?

Thank you for your help, Best, Sebastian

jchodera commented 8 years ago

Go ahead and post those input files and I'll try to hook you up with a working example!

sstolzenberg commented 8 years ago

Hi,

OK, the solution to the above problem is simply to equilibrate my H-system for simulation times much longer than the inverse of my heat-bath friction coefficient (1/ps) - I am able to do this now that I have access to CUDA 7.5.
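
In case it helps anyone else, here is a minimal sketch of what I mean by equilibration: plain Langevin dynamics of the fully interacting system, run for many multiples of the ~1 ps relaxation time set by the friction coefficient (the `system` and `positions` variables are placeholders for whatever test.py sets up):

from simtk import openmm, unit

temperature = 300.0 * unit.kelvin
collision_rate = 1.0 / unit.picosecond   # heat-bath friction coefficient (relaxation time ~1 ps)
timestep = 1.0 * unit.femtosecond

integrator = openmm.LangevinIntegrator(temperature, collision_rate, timestep)
context = openmm.Context(system, integrator)      # `system` and `positions` come from the test setup
context.setPositions(positions)
context.setVelocitiesToTemperature(temperature)

integrator.step(50000)   # 50 ps >> 1/collision_rate; then hand the positions to the NCMC stage
equilibrated_positions = context.getState(getPositions=True).getPositions()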

Here is my updated test.py script including the input files: test.zip

The reason I still post these files (and kindly ask for your help) is the following: in test.py, you will see that I run one equilibration and two non-equilibrium stages (alchemical transformations for NCMC) - "equ", "nonequ1", "nonequ2" - each 1000 integration steps long. Averaged over 100 runs, each of these stages has the following runtime (with standard deviations):

dt_equ:     0.07 ± 0.03 min
dt_nonequ1: 13.5 ± 4.2 min
dt_nonequ2: 13.8 ± 5.4 min

So for some reason, the non-equilibrium transformations are nowhere near as fast as the corresponding equilibration stage. Can you qualitatively reproduce these runtime differences? Is there a tweak I can use to speed up the non-equilibrium transformations, or is this a hardware issue?

For what it's worth, I am using:

Debian 3.16.7
perses 0.1 (py2.7)
OpenMM 6.3.1
openmmtools 0.7.5
Python 2.7.1

and a GeForce GTX 980 GPU.

Thanks a lot, Sebastian

jchodera commented 8 years ago

Thanks! Benchmarking this now to see if I can reproduce your timings.

I would expect NCMC to be slower, since it has to (1) recode most of the NonbondedForce interactions as CustomNonbondedForces, which slows things down, and (2) use a CustomIntegrator. We currently unroll the NCMC steps in the CustomIntegrator kernel, which may be inefficient---I'm exploring some ways to speed this up. I think we can probably optimize this considerably.
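
To give a sense of what (1) involves: the alchemically modified Lennard-Jones interactions end up in something like the CustomNonbondedForce below (a generic softcore form for illustration only; the exact expression perses generates may differ), and OpenMM evaluates this through its generic custom-force kernels rather than the hand-optimized NonbondedForce path:

from simtk import openmm

# Generic softcore Lennard-Jones switched by a global 'lambda_sterics' parameter.
# Illustrative only; not necessarily the expression perses builds.
energy_expression = (
    'lambda_sterics*4*epsilon*x*(x-1.0);'
    'x = (sigma/reff)^6;'
    'reff = sigma*(0.5*(1.0-lambda_sterics) + (r/sigma)^6)^(1/6);'
    'epsilon = sqrt(epsilon1*epsilon2);'
    'sigma = 0.5*(sigma1+sigma2);')

force = openmm.CustomNonbondedForce(energy_expression)
force.addGlobalParameter('lambda_sterics', 1.0)
force.addPerParticleParameter('sigma')
force.addPerParticleParameter('epsilon')
# the per-particle parameters would then be copied over from the original NonbondedForce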

jchodera commented 8 years ago

On a GTX-680, I get

[chodera@gpu-1-7 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 282.690386764
total potential energy after equilibration: -16739.4710206 kJ/mol

doing NCMC with    prior equilibration
deleting H...

doing NCMC without prior equilibration
deleting H...
runtimes (equ,nonequ1,nonequ2) in min.: 0.794938 38.472065 0.455481

This is pretty confusing to me, since the only difference in the last two should be the choice of starting positions! I'm trying to figure out what is causing this...

sstolzenberg commented 8 years ago

For what it's worth, I can confirm this odd behavior (though not as drastic) for runs on both GeForce GTX 780 and 980 cards:

dtequ     dtnonequ1  dtnonequ2
0.107036  4.616891   0.934710
0.119366  4.148975   1.068160
0.126915  4.127439   1.068445
0.106440  3.828613   1.052439
0.084712  3.923797   1.108901
0.064234  3.746215   1.073810
0.100949  3.698790   1.050620
0.127872  3.551751   0.952409
0.127397  3.529105   1.020273
0.085606  3.516918   1.079289
0.116085  3.398323   0.995480
0.082598  3.290072   0.981817
0.096335  3.413157   1.101247
0.067541  3.292237   1.005146
0.083328  3.175025   1.048694

jchodera commented 8 years ago

The first time you run a new CUDA kernel, it has to be compiled. I wonder if we're seeing the overhead from that, though it's usually just a few seconds...

jchodera commented 8 years ago

This seems to be the case. If you change ncmc_nsteps to 100 and then repeat both calls to doNCMC, you'll see that the first call incurs the huge kernel compilation overhead while the subsequent calls avoid it:

[chodera@gpu-2-16 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 268.881007473
total potential energy after equilibration: -16716.6672555 kJ/mol

doing NCMC with    prior equilibration
deleting H...
148.892848969

doing NCMC without prior equilibration
deleting H...
3.51380300522

doing NCMC with    prior equilibration
deleting H...
3.57928395271

doing NCMC without prior equilibration
deleting H...
3.55953884125
runtimes (equ,nonequ1,nonequ2) in min.: 0.017088 0.059655 0.059326

I think manually unrolling the loop in the CustomIntegrator might be causing the long kernel compilation times. I'll see if there is a way to speed this up.
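
Until that lands, one way to keep the compilation cost out of your timings is to make a throwaway call to doNCMC first, so the timed calls reuse the cached kernel. A rough sketch (assuming ncmc_nsteps can be passed as an argument; in test.py it may be wired up differently):

import time

# Throwaway call: pays the one-time kernel compilation cost. Because the unrolled
# integrator presumably compiles a different kernel for each ncmc_nsteps value,
# use the same ncmc_nsteps as in the calls you want to time.
doNCMC(ncmc_nsteps=1000)

t0 = time.time()
doNCMC(ncmc_nsteps=1000)   # timed call: reuses the cached kernel
print('NCMC runtime: %.1f s' % (time.time() - t0))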

pgrinaway commented 8 years ago

I think manually unrolling the loop in the CustomIntegrator might be causing the long kernel compilation times. I'll see if there is a way to speed this up.

I'm pretty much positive this is the case. I had experimented with up to 4000 ncmc_steps, and the compilation of that CustomIntegrator took a very long time. Now that loops are available in the CustomIntegrator syntax, we should be able to speed this up by not unrolling, no? Basically, we can:

# declare the integrator globals used in the loop
self.addGlobalVariable('i', 0)
self.addGlobalVariable('n_steps', n_steps)
self.addGlobalVariable('lambda', 0.0)
# do VV step
self.beginWhileBlock('i < n_steps')
self.addComputeGlobal('lambda', 'lambda + 1/n_steps')
# do VV step
self.addComputeGlobal('i', 'i + 1')
self.endBlock()

That snippet is a bit rough since I just wrote it from memory of the docs, but maybe something like that would work?
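
Fleshed out a little (untested, with only a bare velocity Verlet step and no constraint handling, so treat it as a sketch rather than the real integrator), it could look like:

from simtk import openmm

def make_looped_ncmc_integrator(n_steps, timestep):
    """Sketch of an NCMC CustomIntegrator that loops instead of unrolling its steps."""
    integrator = openmm.CustomIntegrator(timestep)
    integrator.addGlobalVariable('i', 0)
    integrator.addGlobalVariable('n_steps', n_steps)
    integrator.addGlobalVariable('lambda', 0.0)   # in the real integrator this drives the alchemical forces

    integrator.beginWhileBlock('i < n_steps')
    # perturbation: advance the switching variable
    integrator.addComputeGlobal('lambda', 'lambda + 1/n_steps')
    # propagation: one velocity Verlet step (no constraints, for brevity)
    integrator.addComputePerDof('v', 'v + 0.5*dt*f/m')
    integrator.addComputePerDof('x', 'x + dt*v')
    integrator.addComputePerDof('v', 'v + 0.5*dt*f/m')
    integrator.addComputeGlobal('i', 'i + 1')
    integrator.endBlock()
    return integrator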

jchodera commented 8 years ago

Thanks! I'll fix this tonight unless you get to it first.

sstolzenberg commented 8 years ago

Thanks a lot!

sstolzenberg commented 8 years ago

Wait, is this simple to implement? Maybe, as an exercise, I can try to implement it myself and you can check whether I did it right?

jchodera commented 8 years ago

I've just updated perses-dev with the loop scheme suggested by @pgrinaway. Tests still pass, so I hope I haven't broken anything.

Things are much faster now. I had already run the test with 1000 NCMC steps once, where there was some kernel compilation overhead (~30 sec) the first time, but subsequent runs seem to use the cached kernel:

[chodera@gpu-2-8 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 273.719295663
total potential energy after equilibration: -16859.4563622 kJ/mol

doing NCMC with    prior equilibration
deleting H...
9.55056190491

doing NCMC without prior equilibration
deleting H...
9.59217095375

doing NCMC with    prior equilibration
deleting H...
9.56448888779

doing NCMC without prior equilibration
deleting H...
9.57315087318
runtimes (equ,nonequ1,nonequ2) in min.: 0.020515 0.159408 0.159553

We'll continue to make speed improvements, but go ahead and close this issue, @sstolzenberg, if this resolves your problem. If you still have an issue with equilibration, let me know!

sstolzenberg commented 8 years ago

Hi there,

Thanks a lot for the update! Unfortunately, my checks with the "H-in-water" system (see the two files in Archiv.zip, to be added to the test folder posted above) show that even the new NCMC code in perses is about five times slower than our simple NCMC implementation (Python-level, based on force-field parameter scaling):

runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.071 0.212 0.197 1.283 1.382 0.267

I understand that, unlike our code, the perses NCMC includes soft-core potentials, and I wonder whether its current implementation via Custom(Non)BondedForces is the reason for the suboptimal runtimes - could you please check whether I am mistaken?
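
For context, the "simple" implementation just rescales the NonbondedForce parameters of the tagged hydrogen in Python between short propagation segments, roughly along these lines (a reconstruction for illustration; the object names are placeholders, not the actual script):

# `context`, `integrator`, `nonbonded` (the System's NonbondedForce) and `atom_index`
# are placeholders for objects set up elsewhere in the script.
n_switch = 100
steps_per_switch = 10
charge0, sigma0, epsilon0 = nonbonded.getParticleParameters(atom_index)

for k in range(1, n_switch + 1):
    scale = 1.0 - float(k) / n_switch
    nonbonded.setParticleParameters(atom_index, scale * charge0, sigma0, scale * epsilon0)
    nonbonded.updateParametersInContext(context)   # push the scaled parameters to the GPU
    integrator.step(steps_per_switch)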

Thanks again, Sebastian

jchodera commented 8 years ago

I'll check the runtimes shortly, but it is very possible that the softcore implementation and the CustomIntegrator slow things down considerably. However, if you are inserting Lennard-Jones particles into water for nontrivial systems, I imagine the 5x slowdown will still be much better than the terrible statistics you get from growing molecules in water without softcore.

jchodera commented 8 years ago

Here are the timings I get from just running the code twice on a GTX-680:

runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.358 0.729 0.162 1.249 1.251 0.401
...
runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.020 0.158 0.158 1.210 1.212 0.265

However, I notice two things about your code:

  1. Your simple NCMC doesn't actually compute the NCMC acceptance probability, which requires computing the initial and final total energies. Adding this computation doesn't seem to slow things down much, though:
     runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.020 0.157 0.158 1.213 1.211 0.264
  2. Your simple code is actually swapping NonbondedForce parameters for two sets of particles, which may actually work OK. I was worried you were creating/destroying particles without softcore, which would have been problematic. So this approach may be fine---we are testing this too in another codebase, where waters are converted into ions.
  3. Your NCMC is not symmetric---you always change the lambda value before taking a step. You either need to use a symmetric protocol (where you take a step or change lambda at both the beginning and the end), or you need to randomize the order in which propagation and perturbation happen in the NCMC switching scheme. At very short NCMC switching times the resulting error can get very large (one way to symmetrize the loop is sketched below).
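
To make points 1 and 3 concrete, here is one (untested) way the switching loop could be symmetrized while also accumulating the protocol work needed for the acceptance test. `set_alchemical_state`, `n_switch`, `steps_per_switch`, `context`, `integrator`, and `temperature` are placeholders for whatever the script defines:

import math
import random
from simtk import unit

kT = (unit.MOLAR_GAS_CONSTANT_R * temperature).value_in_unit(unit.kilojoule_per_mole)

def potential_kj(context):
    # current potential energy in kJ/mol
    return context.getState(getEnergy=True).getPotentialEnergy().value_in_unit(unit.kilojoule_per_mole)

protocol_work = 0.0  # kJ/mol
for k in range(n_switch):
    # first half-perturbation
    u_old = potential_kj(context)
    set_alchemical_state(context, (k + 0.5) / n_switch)   # placeholder: updates the lambda parameter(s)
    protocol_work += potential_kj(context) - u_old
    # propagation
    integrator.step(steps_per_switch)
    # second half-perturbation
    u_old = potential_kj(context)
    set_alchemical_state(context, (k + 1.0) / n_switch)
    protocol_work += potential_kj(context) - u_old

# acceptance test from the accumulated work (any proposal contributions omitted here)
log_accept = -protocol_work / kT
accept = (log_accept >= 0.0) or (random.random() < math.exp(log_accept))
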
sstolzenberg commented 8 years ago

OK, thanks a lot for now!