aboria / Aboria

Enables computations over a set of particles in N-dimensional space
https://aboria.github.io/Aboria

example code with CUDA? #27

Closed: Char-Aznable closed this issue 6 years ago

Char-Aznable commented 6 years ago

Hi, does this library work with CUDA? Can you provide an example of setting up a simulation and compiling it to run on a GPU?

martinjrobins commented 6 years ago

The CUDA support is still a work in progress, so I haven't documented it yet. I have been using this file for testing: https://github.com/martinjrobins/Aboria/blob/master/tests/md_level1.h. You can use Aboria's CMake infrastructure to try to compile this on your system, but note that the latest master commits require CUDA 9, because Aboria must be compiled with C++14 support.

I'll update this ticket as the CUDA support progresses.

martinjrobins commented 6 years ago

The det branch now has some examples of CUDA code, plus compilation instructions. You can either build the documentation yourself (turn on the Aboria_BUILD_DOCUMENTATION CMake variable and then run make aboria-html), or look at the tests/doc_getting_started.h and tests/parallel.h files.

A warning: nvcc still gives loads of warnings which I haven't fixed yet. I've also only checked that the examples compile, not that they run correctly, because the only machine I have with CUDA 9 support does not have a working run-time CUDA driver. If anyone is willing to give it a try and provide feedback, that would be very much appreciated.

Char-Aznable commented 6 years ago

Which part of the computation is being parallelized? Is it more like a force decomposition or a domain decomposition?

martinjrobins commented 6 years ago

The bulk of the parallelization within Aboria is in the creation and updating of the spatial data structures (cell-lists/kd-trees/oct-trees). The symbolic API uses a parallel loop (OpenMP, not CUDA) over the particles. In terms of user code this is also the easiest way to implement parallel code: Aboria provides you with a particle container, so the easiest parallel loop to write is a parallel loop over the particles. All of Aboria's neighbour searching can be done within the loop (or within CUDA kernels).
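To make that loop structure concrete, here is a minimal sketch in plain C++ of a cell list and a loop over particles that queries it. It does not use Aboria's API: `Particle`, `CellList`, and `count_neighbours` are hypothetical names, the domain is assumed to be the unit square without periodic boundaries, and the serial outer loop stands in for an OpenMP or CUDA parallel loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle record; not Aboria's container.
struct Particle {
    double x, y;
};

// Uniform grid of buckets over the unit square [0,1)^2 (an assumption of
// this sketch). Cells are at least `radius` wide, so every neighbour of a
// particle lies in the 3x3 block of cells around it.
struct CellList {
    int n;                                        // cells per side
    std::vector<std::vector<std::size_t>> cells;  // particle indices per cell

    static int cells_per_side(double radius) {
        assert(radius > 0.0);
        return std::max(1, static_cast<int>(1.0 / radius));
    }

    CellList(const std::vector<Particle>& ps, double radius)
        : n(cells_per_side(radius)), cells(static_cast<std::size_t>(n) * n) {
        for (std::size_t i = 0; i < ps.size(); ++i)
            cells[index(coord(ps[i].x), coord(ps[i].y))].push_back(i);
    }

    // Map a coordinate to a cell coordinate, clamped to the grid.
    int coord(double v) const {
        return std::min(n - 1, std::max(0, static_cast<int>(v * n)));
    }
    std::size_t index(int cx, int cy) const {
        return static_cast<std::size_t>(cy) * n + cx;
    }
};

// Count the neighbours within `radius` of every particle. Each iteration of
// the outer loop only reads shared data and writes counts[i], so the
// iterations are independent and could be distributed across threads.
std::vector<int> count_neighbours(const std::vector<Particle>& ps, double radius) {
    const CellList grid(ps, radius);
    std::vector<int> counts(ps.size(), 0);
    for (std::size_t i = 0; i < ps.size(); ++i) {
        const int cx = grid.coord(ps[i].x), cy = grid.coord(ps[i].y);
        for (int yy = std::max(0, cy - 1); yy <= std::min(grid.n - 1, cy + 1); ++yy)
            for (int xx = std::max(0, cx - 1); xx <= std::min(grid.n - 1, cx + 1); ++xx)
                for (std::size_t j : grid.cells[grid.index(xx, yy)]) {
                    const double dx = ps[j].x - ps[i].x, dy = ps[j].y - ps[i].y;
                    if (j != i && dx * dx + dy * dy < radius * radius) ++counts[i];
                }
    }
    return counts;
}
```

Rebuilding the grid is the step that has to be repeated whenever particles move, which is why the data-structure update is worth parallelizing in its own right.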

Char-Aznable commented 6 years ago

OK. So basically the CUDA code currently only parallelizes the updates to the spatial data structures as particles move, but not any force/energy evaluation?

martinjrobins commented 6 years ago

The core of Aboria gives you a collection of data structures and iterators to enable you to write force/energy evaluations over particles. These data structures and iterators can now work within the Thrust framework. So you can imagine using a thrust::for_each algorithm over your particles, and within that using Aboria's neighbour search to evaluate forces on each particle.
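As a rough illustration of that pattern, the sketch below applies a functor to every particle and has it gather forces from nearby particles. It is written against the standard library only: `std::for_each` stands in for `thrust::for_each` (which takes the same first, last, functor arguments), the brute-force scan stands in for a library neighbour query, and `Particle`, `ForceKernel`, `compute_forces`, and the linear repulsion law are all hypothetical choices made for this example rather than anything taken from Aboria.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical particle record; Aboria's own container and accessors differ.
struct Particle {
    double x, y;    // position (read by every invocation)
    double fx, fy;  // force (written only by the invocation for this particle)
};

// Functor applied to each particle. It gathers contributions from nearby
// particles and writes only to the particle it was given, so no two
// invocations write to the same memory.
struct ForceKernel {
    const Particle* begin;  // read-only view of all particles
    const Particle* end;
    double cutoff;          // interaction range
    double k;               // stiffness of the illustrative linear repulsion

    void operator()(Particle& p) const {
        double fx = 0.0, fy = 0.0;
        // Brute-force scan standing in for a neighbour search.
        for (const Particle* q = begin; q != end; ++q) {
            if (q == &p) continue;
            const double dx = p.x - q->x, dy = p.y - q->y;
            const double r = std::sqrt(dx * dx + dy * dy);
            if (r > 0.0 && r < cutoff) {
                const double mag = k * (cutoff - r);  // pushes p away from q
                fx += mag * dx / r;
                fy += mag * dy / r;
            }
        }
        p.fx = fx;
        p.fy = fy;
    }
};

// Evaluate forces on all particles with one for_each over the container.
void compute_forces(std::vector<Particle>& ps, double cutoff, double k) {
    assert(cutoff > 0.0);
    const ForceKernel kernel{ps.data(), ps.data() + ps.size(), cutoff, k};
    std::for_each(ps.begin(), ps.end(), kernel);
}
```

Because each invocation gathers into its own particle instead of scattering to its neighbours, every pair interaction is computed twice, but the loop needs no atomics or locks when it is run in parallel.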

Note there is also a higher-level symbolic interface for Aboria that basically just "does the loops for you". This only uses OpenMP, not CUDA, and for the foreseeable future it won't support CUDA, since it relies on Boost.Proto, which doesn't support CUDA.

Char-Aznable commented 6 years ago

Got it. Thanks!