Let me figure out how to test this. Can this be compiled (complete with Python wrappers) just like OpenMM?
Yes.
Sorry for the delay. It was easiest to build a conda package, which you can get here: https://binstar.org/choderalab/openmm-core21/files
I've run the following test code on the {system,integrator,state}.xml files from FAH project 10492, the very large project with ~277K atoms:
#!/bin/env python
from simtk import openmm
import time

# Deserialize the State, System, and Integrator from their XML files.
state = openmm.XmlSerializer.deserialize(open('state0.xml').read())
system = openmm.XmlSerializer.deserialize(open('system.xml').read())
integrator = openmm.XmlSerializer.deserialize(open('integrator.xml').read())

# Create an OpenCL context and initialize positions from the deserialized state.
platform = openmm.Platform.getPlatformByName('OpenCL')
context = openmm.Context(system, integrator, platform)
context.setPositions(state.getPositions())

print("Stepping...")

# Time the first step, which includes kernel compilation overhead.
initial_time = time.time()
integrator.step(1)
final_time = time.time()
print("Initial step: %.3f s" % (final_time - initial_time))

# Time a subsequent step for comparison.
initial_time = time.time()
integrator.step(1)
final_time = time.time()
print("Next step: %.3f s" % (final_time - initial_time))
Running this a few times on a GTX 680 against the peastman/openmm rebuild branch (on Linux) gives:
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 7.879 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.771 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.736 s
Next step: 0.001 s
but I think this code reports the kernel compilation overhead and the actual kernel execution time together. Is there a way to separate the two? Or do we just need to try this on a Windows machine to see whether it triggers a GPU reset?
For reference, here are the timings for the current OpenMM 6.3 on the same machine, executed a few times:
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.814 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.740 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.746 s
Next step: 0.001 s
The best thing is to run it on Windows. If this were the CUDA platform, you could use nvprof or nvvp to get the kernel execution time, but those tools no longer support OpenCL.
By the way, instead of calling step(1), I'd suggest calling getState(getForces=True). That makes it copy the forces back to the host, which ensures all the kernels have really executed; step(1) can potentially just add the kernels to the queue and return immediately.
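A minimal sketch of that timing pattern, reusing the context, integrator, and time import from the test script above (the variable names and print format are assumptions carried over from that script):

# getState(getForces=True) blocks until the forces have been computed and
# copied back to the host, so the elapsed wall time covers actual kernel
# execution rather than just enqueueing the kernels.
initial_time = time.time()
integrator.step(1)
state = context.getState(getForces=True)  # synchronizes with the device
final_time = time.time()
print("Initial step + sync: %.3f s" % (final_time - initial_time))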
No Windows access here, but I will see if one of the industrious Slack users can help. Sorry this is taking so long to test!
I'll go ahead and test this. It will take me some work, but not as much as it will take you if you don't even have a machine to test it on.
Thanks! I got as far as setting up a build chain to push the package to binstar here: https://binstar.org/choderalab/openmm-core21/files
but ran into the fact that your PR seems to have broken builds on Windows: https://github.com/pandegroup/openmm/pull/1082
I can't test that which does not compile. :)
I can have the toolchain build a conda package for Windows once that linker error is fixed.
Here's the appveyor output: https://ci.appveyor.com/project/jchodera/conda-recipes/build/job/k7scso93qnlsu7ma
It had nothing to do with that commit. There was a file that got missed when the post-6.2 changes were ported over from the main repository. I'm not used to TortoiseGit, and it turns out it handles added files differently from the standard command-line client, so it was just treating that file as unversioned and not telling me it hadn't been checked in.
This seems to work correctly, though if you're able to check it too, that would be good.
Thanks! I'll see if I can have one of the Slack users test too.
This once again petered out during the conda package build on AppVeyor: https://ci.appveyor.com/project/jchodera/conda-recipes/build/1.0.10/job/dc8jpar2e3di9v6a#L5937
[ 62%] Built target TestReferenceVirtualSites
[ 62%] Generating ../src/OpenCLKernelSources.cpp, ../src/OpenCLKernelSources.h
Scanning dependencies of target OpenMMOpenCL
Error: dependent 'C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib' does not exist.
jom: C:\Python34_64\conda-bld\work\build\CMakeFiles\Makefile2 [platforms\opencl\sharedTarget\CMakeFiles\OpenMMOpenCL.dir\all] Error 2
jom: C:\Python34_64\conda-bld\work\build\Makefile [all] Error 2
C:\Python34_64\conda-bld\work\build>if errorlevel 1 exit 1
Are we requiring the OpenCL libraries to be provided in a different way here than they are for the OpenMM 6.3 release or openmm-dev?
No, it's exactly the same. Set it with OPENCL_LIBRARY, which apparently is set to "C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib". Do you not have it installed there?
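(For reference, a minimal sketch of how that variable is typically passed when configuring the build; the library path is the one from the log, and the OPENCL_INCLUDE_DIR name is assumed from OpenMM's CMake setup, so adjust both to your SDK install:)

cmake .. -DOPENCL_LIBRARY="C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib" -DOPENCL_INCLUDE_DIR="C:\Program Files (x86)\AMD APP SDK\2.9-1\include"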
@rmcgibbo, did you have to do weird things to the Omnia AppVeyor build infrastructure to get these libraries installed?
I didn't do anything with OpenCL.
Huh.
-- Found OPENCL: C:/Program Files (x86)/AMD APP SDK/2.9-1/lib/x86_64/OpenCL.lib
But then:
Error: dependent 'C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib' does not exist.
So confused. https://ci.appveyor.com/project/jchodera/conda-recipes/build/1.0.10/job/dc8jpar2e3di9v6a#L297
I have my windows machine up. Do you want me to build this PR?
Please! I am not sure what is going on with the automated builds.
I'll point you to the test script and XML files.
I compiled this PR (Windows 10, VC2010, Python 3.4 x64, NVIDIA GTX 660, CUDA 7.0, compiling and linking against the NVIDIA OpenCL headers/library) and ran the files that @jchodera sent me via email. The output was:
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 937.550 ms
Subsequent getState time: 375.022 ms
Rerunning, it's a little faster, but I haven't followed this thread, so I don't know the significance:
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 687.536 ms
Subsequent getState time: 359.396 ms
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 687.538 ms
Subsequent getState time: 359.395 ms
This is great! We're mainly hoping to get this below the 2 s Windows TDR timeout, since exceeding that triggers a GPU reset and causes core 21 to abort and return a bad work unit.
This result suggests that @peastman has bought us a big swath of wiggle room for larger projects! Thanks!
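(For anyone who wants to confirm the TDR timeout on a given test machine, here's a minimal PowerShell sketch. The TdrDelay/TdrLevel registry values are documented by Microsoft; they are usually absent, in which case the 2-second default applies:)

# Read the documented TDR registry values on the test machine. If the values
# are absent (the common case), Windows uses the defaults (TdrDelay = 2 s).
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" |
    Select-Object TdrDelay, TdrLevel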
And that's the total clock time. The nonbonded kernel is only a small fraction of that.
It's saying I can't merge this because I don't have write access to this repository. @jcoffland can you fix that?
You should be able to merge now, @peastman
I believe this fixes https://github.com/FoldingAtHome/openmm-core/issues/56. Please test this. I verified that similar changes in the master version fix the problem with John's Python script, but I'm not set up to be able to easily run Python scripts against the core21 code.