Let me figure out how to test this. Can this be compiled (complete with Python wrappers) just like OpenMM?
Yes.
Sorry for the delay. It was easiest to build a conda package, which you can get here: https://binstar.org/choderalab/openmm-core21/files
I've run the following test code on the {system,integrator,state}.xml files from FAH project 10492, the very large project with ~277K atoms:
#!/bin/env python
from simtk import openmm
import time

# Deserialize the State, System, and Integrator from their XML files.
state = openmm.XmlSerializer.deserialize(open('state0.xml').read())
system = openmm.XmlSerializer.deserialize(open('system.xml').read())
integrator = openmm.XmlSerializer.deserialize(open('integrator.xml').read())

# Create an OpenCL context and initialize positions from the deserialized state.
platform = openmm.Platform.getPlatformByName('OpenCL')
context = openmm.Context(system, integrator, platform)
context.setPositions(state.getPositions())

print("Stepping...")

# Time the first step, which includes kernel compilation overhead.
initial_time = time.time()
integrator.step(1)
final_time = time.time()
print("Initial step: %.3f s" % (final_time - initial_time))

# Time a subsequent step for comparison.
initial_time = time.time()
integrator.step(1)
final_time = time.time()
print("Next step: %.3f s" % (final_time - initial_time))
Running this a few times on a GTX 680 against the peastman/openmm rebuild branch (on Linux) gives:
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 7.879 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.771 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.736 s
Next step: 0.001 s
but I think this code reports the kernel compilation overhead and the actual kernel execution time together. Is there a way to separate the two? Or do we just need to try this on a Windows machine to see whether it triggers a GPU reset?
For reference, here are the timings for the current OpenMM 6.3 on the same machine, executed a few times:
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.814 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.740 s
Next step: 0.001 s
[chodera@gpu-1-6 fah-10492-test]$ python test.py
Stepping...
Initial step: 3.746 s
Next step: 0.001 s
The best thing is to run it on Windows. If this were the CUDA platform, you could use nvprof or nvvp to get the kernel execution time, but those tools no longer support OpenCL.
By the way, instead of calling step(1), I'd suggest calling getState(getForces=True). That makes it copy the forces back to the host, which ensures all the kernels have really executed; step(1) can potentially just add the kernels to the queue and return immediately.
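A minimal sketch of that timing pattern, reusing the context, integrator, and time import from the test script above (the variable names and print format are assumptions carried over from that script):

# getState(getForces=True) blocks until the forces have been computed and
# copied back to the host, so the elapsed wall time covers actual kernel
# execution rather than just enqueueing the kernels.
initial_time = time.time()
integrator.step(1)
state = context.getState(getForces=True)  # synchronizes with the device
final_time = time.time()
print("Initial step + sync: %.3f s" % (final_time - initial_time))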
No Windows access here, but I will see if one of the industrious Slack users can help. Sorry this is taking so long to test!
I'll go ahead and test this. It will take me some work, but not as much as it will take you if you don't even have a machine to test it on.
Thanks! I got as far as setting up a build chain to push the package to binstar here: https://binstar.org/choderalab/openmm-core21/files
but ran into the fact that your PR seems to have broken builds on Windows: https://github.com/pandegroup/openmm/pull/1082
I can't test that which does not compile. :)
I can have the toolchain build a conda package for Windows once that linker error is fixed.
Here's the appveyor output: https://ci.appveyor.com/project/jchodera/conda-recipes/build/job/k7scso93qnlsu7ma
It had nothing to do with that commit. There was a file that got missed when the post-6.2 changes were ported over from the main repository. I'm not used to TortoiseGit, and it turns out it handles added files differently from the standard command-line client, so it was just treating that file as unversioned and not telling me it hadn't been checked in.
This seems to work correctly, though if you're able to check it too, that would be good.
Thanks! I'll see if I can have one of the Slack users test too.
This once again petered out during the conda package build on AppVeyor: https://ci.appveyor.com/project/jchodera/conda-recipes/build/1.0.10/job/dc8jpar2e3di9v6a#L5937
[ 62%] Built target TestReferenceVirtualSites
[ 62%] Generating ../src/OpenCLKernelSources.cpp, ../src/OpenCLKernelSources.h
Scanning dependencies of target OpenMMOpenCL
Error: dependent 'C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib' does not exist.
jom: C:\Python34_64\conda-bld\work\build\CMakeFiles\Makefile2 [platforms\opencl\sharedTarget\CMakeFiles\OpenMMOpenCL.dir\all] Error 2
jom: C:\Python34_64\conda-bld\work\build\Makefile [all] Error 2
C:\Python34_64\conda-bld\work\build>if errorlevel 1 exit 1
Are we requiring the OpenCL libraries to be provided in a different way here than they are for the OpenMM 6.3 release or openmm-dev?
No, it's exactly the same. Set it with OPENCL_LIBRARY, which apparently is set to "C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib". Do you not have it installed there?
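(For reference, a minimal sketch of how that variable is typically passed when configuring the build; the library path is the one from the log, and the OPENCL_INCLUDE_DIR name is assumed from OpenMM's CMake setup, so adjust both to your SDK install:)

cmake .. -DOPENCL_LIBRARY="C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib" -DOPENCL_INCLUDE_DIR="C:\Program Files (x86)\AMD APP SDK\2.9-1\include"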
@rmcgibbo, did you have to do weird things to the Omnia AppVeyor build infrastructure to get these libraries installed?
I didn't do anything with OpenCL.
Huh.
-- Found OPENCL: C:/Program Files (x86)/AMD APP SDK/2.9-1/lib/x86_64/OpenCL.lib
But then:
Error: dependent 'C:\Program Files (x86)\AMD APP SDK\2.9-1\lib\x86_64\OpenCL.lib' does not exist.
So confused. https://ci.appveyor.com/project/jchodera/conda-recipes/build/1.0.10/job/dc8jpar2e3di9v6a#L297
I have my windows machine up. Do you want me to build this PR?
Please! I am not sure what is going on with the automated builds.
I'll point you to the test script and XML files.
I compiled this PR (Windows 10, VC2010, Python 3.4 x64, NVIDIA GTX 660, CUDA 7.0, compiling and linking against the NVIDIA OpenCL headers/library) and ran the files that @jchodera sent me via email. The output was:
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 937.550 ms
Subsequent getState time: 375.022 ms
Rerunning, it's a little faster, but I haven't followed this thread, so I don't know the significance:
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 687.536 ms
Subsequent getState time: 359.396 ms
PS C:\users\rmcgibbo\projects\fah-openmm\build> python .\test.py
Loading files...
Gathering timing data...
Initial getState time: 687.538 ms
Subsequent getState time: 359.395 ms
This is great! We're mainly hoping to get this below the 2 s Windows TDR timeout, since exceeding that triggers a GPU reset and causes core 21 to abort and return a bad work unit.
This result suggests that @peastman has bought us a big swath of wiggle room for larger projects! Thanks!
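(For anyone who wants to confirm the TDR timeout on a given test machine, here's a minimal PowerShell sketch. The TdrDelay/TdrLevel registry values are documented by Microsoft; they are usually absent, in which case the 2-second default applies:)

# Read the documented TDR registry values on the test machine. If the values
# are absent (the common case), Windows uses the defaults (TdrDelay = 2 s).
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" |
    Select-Object TdrDelay, TdrLevel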
And that's the total clock time. The nonbonded kernel is only a small fraction of that.
It's saying I can't merge this because I don't have write access to this repository. @jcoffland can you fix that?
You should be able to merge now, @peastman
I believe this fixes https://github.com/FoldingAtHome/openmm-core/issues/56. Please test this. I verified that similar changes in the master version fix the problem with John's Python script, but I'm not set up to be able to easily run Python scripts against the core21 code.