Go ahead and post those input files and I'll try to hook you up with a working example!
Hi,
OK, the solution to the above problem is simply to equilibrate my H-system for simulation times much longer than the inverse of my heat-bath friction coefficient (1/ps). I am able to do this now that I have access to CUDA 7.5.
Here is my updated test.py script including the input files: test.zip
The reason I still post these files (and kindly ask you for help) is the following: in test.py, you will see that I run one equilibration stage and two non-equilibrium stages (alchemical transformations for NCMC), named "equ", "nonequ1", and "nonequ2", each 1000 integration steps long. Averaged over 100 runs, each of these stages has the following runtime (with standard deviation):
dt_equ: 0.07 ± 0.03 min, dt_nonequ1: 13.5 ± 4.2 min, dt_nonequ2: 13.8 ± 5.4 min.
So for some reason, the non-equilibrium transformations run nowhere near as fast as the corresponding equilibration stage. Can you qualitatively reproduce these runtime differences? Is there a tweak I can use to speed up the non-equilibrium transformations, or is this a hardware issue?
For what it's worth, I am using Debian 3.16.7, perses 0.1 (py2.7), OpenMM 6.3.1, openmmtools 0.7.5, and Python 2.7.1,
with a GeForce GTX 980 GPU.
Thanks a lot, Sebastian
Thanks! Benchmarking this now to see if I can reproduce your timings.
I would expect the NCMC to be slower since it has to (1) recode most of the NonbondedForces as CustomNonbondedForces, slowing things down, and (2) use a CustomIntegrator. We unroll the NCMC steps in the CustomIntegrator kernel, which may be inefficient; I'm exploring some ways to speed this up. I think we can probably optimize this considerably.
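To make point (2) concrete, here is a rough sketch (illustrative only, not the actual perses integrator) of what an unrolled switching integrator looks like. Because each switching step is emitted as separate program steps, the generated kernel grows linearly with the number of NCMC steps; in the real thing, 'lambda' would drive the alchemically modified forces rather than being a free-standing integrator variable.

from simtk import openmm
from simtk import unit

def make_unrolled_switching_integrator(nsteps=1000, timestep=1.0*unit.femtoseconds):
    """Illustrative only: the integrator program grows linearly with nsteps."""
    integrator = openmm.CustomIntegrator(timestep)
    integrator.addGlobalVariable('lambda', 0.0)  # alchemical switching parameter
    for _ in range(nsteps):
        # advance the switching parameter by one increment
        integrator.addComputeGlobal('lambda', 'lambda + %f' % (1.0 / nsteps))
        # a full velocity Verlet (or Langevin) step would be emitted here as well,
        # inflating the program size further; a bare Euler step stands in for it
        integrator.addComputePerDof('v', 'v + dt*f/m')
        integrator.addComputePerDof('x', 'x + dt*v')
    return integrator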
On a GTX-680, I get
[chodera@gpu-1-7 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 282.690386764
total potential energy after equilibration: -16739.4710206 kJ/mol
doing NCMC with prior equilibration
deleting H...
doing NCMC without prior equilibration
deleting H...
runtimes (equ,nonequ1,nonequ2) in min.: 0.794938 38.472065 0.455481
This is pretty confusing to me, since the only difference between the last two runs should be the choice of starting positions! I'm trying to figure out what is causing this...
For what it's worth, I can confirm this odd behavior (not as drastic though) for runs on both GeForce GTX 780 and 980 cards:
dtequ     dtnonequ1  dtnonequ2
0.107036  4.616891   0.934710
0.119366  4.148975   1.068160
0.126915  4.127439   1.068445
0.106440  3.828613   1.052439
0.084712  3.923797   1.108901
0.064234  3.746215   1.073810
0.100949  3.698790   1.050620
0.127872  3.551751   0.952409
0.127397  3.529105   1.020273
0.085606  3.516918   1.079289
0.116085  3.398323   0.995480
0.082598  3.290072   0.981817
0.096335  3.413157   1.101247
0.067541  3.292237   1.005146
0.083328  3.175025   1.048694
The first time you run a new CUDA kernel, it has to compile it. I wonder if we're seeing the overhead from that, though it's usually just a few seconds...
This seems to be the case. If you change ncmc_nsteps to 100 and then repeat both calls to doNCMC, you'll see that the first one has the huge kernel compilation overhead while the subsequent calls avoid it:
[chodera@gpu-2-16 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 268.881007473
total potential energy after equilibration: -16716.6672555 kJ/mol
doing NCMC with prior equilibration
deleting H...
148.892848969
doing NCMC without prior equilibration
deleting H...
3.51380300522
doing NCMC with prior equilibration
deleting H...
3.57928395271
doing NCMC without prior equilibration
deleting H...
3.55953884125
runtimes (equ,nonequ1,nonequ2) in min.: 0.017088 0.059655 0.059326
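If you want to see the compilation effect in isolation from test.py, one way is to time two consecutive step() calls on the same Context: the first pays the one-time kernel compilation cost and the second does not. The toy one-particle system and the deliberately unrolled integrator below are purely illustrative.

import time
from simtk import openmm
from simtk import unit

# Toy one-particle system, just to exercise the CustomIntegrator kernel.
system = openmm.System()
system.addParticle(39.9 * unit.amu)

# Deliberately unrolled integrator program to make the compilation cost visible.
integrator = openmm.CustomIntegrator(1.0 * unit.femtoseconds)
for _ in range(1000):
    integrator.addComputePerDof('v', 'v + dt*f/m')
    integrator.addComputePerDof('x', 'x + dt*v')

platform = openmm.Platform.getPlatformByName('CUDA')  # or omit to use the default platform
context = openmm.Context(system, integrator, platform)
context.setPositions(unit.Quantity([openmm.Vec3(0, 0, 0)], unit.nanometers))

for attempt in range(2):
    initial_time = time.time()
    integrator.step(1)                # executes the whole unrolled program once
    context.getState(getEnergy=True)  # force synchronization with the GPU
    print('attempt %d: %.2f s' % (attempt, time.time() - initial_time))
# The first attempt includes kernel compilation; the second should be much faster.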
I think manually unrolling the loop in the CustomIntegrator might be causing the long kernel compilation times. I'll see if there is a way to speed this up.
> I think manually unrolling the loop in the CustomIntegrator might be causing the long kernel compilation times. I'll see if there is a way to speed this up.
I'm pretty much positive this is the case. I had experimented with up to 4000 ncmc_steps, and the compilation of that CustomIntegrator took a very long time. Now that loops are available in the CustomIntegrator syntax, we should be able to speed this up by not unrolling, no? Basically, we can:
# 'i', 'n_steps', and 'lambda' declared beforehand via self.addGlobalVariable(...)
self.addComputeGlobal('i', '0')  # reset the loop counter
# do VV step (at the initial lambda)
self.beginWhileBlock('i < n_steps')
self.addComputeGlobal('lambda', 'lambda + 1/n_steps')
# do VV step
self.addComputeGlobal('i', 'i + 1')
self.endBlock()
That snippet is a bit rough since I just wrote it from memory of the docs, but maybe something like that would work?
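For what it's worth, here is a self-contained version of that idea (a sketch only, not the perses implementation): the switching loop lives inside the integrator program, so the kernel size no longer grows with the number of NCMC steps. It assumes an OpenMM version with flow-control blocks in CustomIntegrator, and 'lambda' here is just an integrator global standing in for the alchemical switching parameter.

from simtk import openmm
from simtk import unit

def make_looped_ncmc_integrator(n_steps=1000, timestep=1.0*unit.femtoseconds):
    """Illustrative looped switching integrator with one velocity Verlet step
    per switching increment; the program length is independent of n_steps."""
    integrator = openmm.CustomIntegrator(timestep)
    integrator.addGlobalVariable('lambda', 0.0)       # switching parameter
    integrator.addGlobalVariable('n_steps', n_steps)  # number of switching steps
    integrator.addGlobalVariable('i', 0.0)            # loop counter
    integrator.addPerDofVariable('x1', 0.0)           # scratch space for constraints
    integrator.addComputeGlobal('i', '0')
    integrator.beginWhileBlock('i < n_steps')
    integrator.addComputeGlobal('lambda', 'lambda + 1/n_steps')
    # one velocity Verlet step
    integrator.addUpdateContextState()
    integrator.addComputePerDof('v', 'v + 0.5*dt*f/m')
    integrator.addComputePerDof('x', 'x + dt*v')
    integrator.addComputePerDof('x1', 'x')
    integrator.addConstrainPositions()
    integrator.addComputePerDof('v', 'v + 0.5*dt*f/m + (x - x1)/dt')
    integrator.addConstrainVelocities()
    integrator.addComputeGlobal('i', 'i + 1')
    integrator.endBlock()
    return integrator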
Thanks! I'll fix this tonight unless you get to it first.
Thanks a lot!
Wait, is this simple to implement? Maybe, as an exercise, I could try to implement it myself, and you could check whether I did it right?
I've just updated perses-dev with the loop scheme suggested by @pgrinaway. Tests still pass, so I hope I haven't broken anything.
Things are much faster now. I had already run the test with 1000 NCMC steps once, where there was some kernel compilation overhead (~30 sec) the first time, but subsequent runs seem to use the cached kernel:
[chodera@gpu-2-8 test]$ python test.py
#"Time (ps)" "Temperature (K)"
2.0 273.719295663
total potential energy after equilibration: -16859.4563622 kJ/mol
doing NCMC with prior equilibration
deleting H...
9.55056190491
doing NCMC without prior equilibration
deleting H...
9.59217095375
doing NCMC with prior equilibration
deleting H...
9.56448888779
doing NCMC without prior equilibration
deleting H...
9.57315087318
runtimes (equ,nonequ1,nonequ2) in min.: 0.020515 0.159408 0.159553
We'll continue to make speed improvements, but go ahead and close this issue @sstolzenberg if this resolves your problem. If you still have an issue with equilibration, let me know!
Hi there,
Thanks a lot for the update! Unfortunately, my checks with the "H-in-water" system (see the two files in Archiv.zip, to be included in the test folder posted above) show that even the new NCMC code in perses is about five times slower than our simple NCMC implementation (python-level, based on force-field parameter scaling):
runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.071 0.212 0.197 1.283 1.382 0.267
I understand that, unlike our code, perses-NCMC includes soft-core potentials, and I wonder whether its current implementation via Custom(Non)BondedForces is the reason for the suboptimal runtimes. Could you please check whether I am mistaken?
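For context, by "python-level, based on force field parameter scaling" I mean the usual pattern of rescaling NonbondedForce parameters from Python between short dynamics segments, roughly like the simplified sketch below (not our actual code; the function name and step counts are placeholders):

# Scale one particle's nonbonded parameters to zero in n_steps increments,
# pushing each change to the GPU with updateParametersInContext().
def scale_out_particle(simulation, nonbonded_force, particle_index, n_steps=1000):
    charge, sigma, epsilon = nonbonded_force.getParticleParameters(particle_index)
    for k in range(n_steps):
        lam = 1.0 - float(k + 1) / n_steps  # 1 -> 0 over the switching protocol
        nonbonded_force.setParticleParameters(particle_index,
                                              lam*charge, sigma, lam*epsilon)
        nonbonded_force.updateParametersInContext(simulation.context)
        simulation.step(1)                  # one MD step per switching increment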
Thanks again, Sebastian
I'll check the runtimes shortly, but it is very possible that the softcore implementation and the CustomIntegrator slow things down considerably. However, if you are inserting Lennard-Jones particles in water for nontrivial systems, I imagine the 5x slowdown will still be much better than the terrible statistics you would get from growing molecules in water without softcore.
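To give a sense of where the softcore cost comes from: the softcore Lennard-Jones interaction is typically written as a CustomNonbondedForce with an explicit energy expression, roughly like the Beutler-style sketch below (not the exact perses expression; the parameter names are only illustrative). A custom expression like this is generally slower to evaluate than the hand-optimized standard NonbondedForce kernel.

from simtk import openmm

# Beutler-style softcore Lennard-Jones controlled by a global 'lambda_sterics'
energy = ('4*epsilon*lambda_sterics*(1/reff6^2 - 1/reff6);'
          'reff6 = softcore_alpha*(1-lambda_sterics) + (r/sigma)^6;'
          'epsilon = sqrt(epsilon1*epsilon2);'
          'sigma = 0.5*(sigma1 + sigma2)')
force = openmm.CustomNonbondedForce(energy)
force.addGlobalParameter('lambda_sterics', 1.0)  # 1 = fully interacting, 0 = decoupled
force.addGlobalParameter('softcore_alpha', 0.5)  # softcore strength
force.addPerParticleParameter('sigma')
force.addPerParticleParameter('epsilon')
# per-particle parameters would then be copied over from the original NonbondedForce:
# force.addParticle([sigma, epsilon]) for each particle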
Here are the timings I get from just running the code twice on a GTX-680:
runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.358 0.729 0.162 1.249 1.251 0.401
...
runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.020 0.158 0.158 1.210 1.212 0.265
However, I notice two things about your code. First, repeated runs give essentially the same timings:
runtimes (equ,nonequ1,nonequ2,nonequ1a,nonequ2a,dtime_simplenonequ) in min.: 0.020 0.157 0.158 1.213 1.211 0.264
Second, it looks like you are just interpolating the NonbondedForce parameters for two sets of particles, which may actually work OK. I was worried you were creating/destroying particles without softcore, which would have been problematic. So this approach may be fine; we are testing this too in another codebase involving converting waters into ions.
OK, thanks a lot for now!
Hi there,
How can I properly equilibrate a system prior to an NCMC computation?
Trying to make this work for a simple test system (annihilating a proton in a water box), I came across some odd results in the computed NCMC potential energy differences (see test.py.txt and test.log.txt; I can send you more input files if necessary):
When performing a "delete" NCMC computation after equilibration (via openmm.LangevinIntegrator), I obtain much larger potential energy differences (187 kT in test.log) than for the same "delete" NCMC computation without any prior openmm equilibration (-22 kT in test.log). The potential energies right after the equilibration and right before the NCMC computation are identical, and I have tried different equilibrated starting structures (in.pdb) and different equilibration/NCMC times.
Is there, e.g., a specific NCMC equilibration integrator I should be using instead of openmm.LangevinIntegrator?
Thank you for your help, Best, Sebastian