This repository provides multiple parallel n-body simulation algorithms, implemented in portable ISO C++ that runs on multi-core CPUs and GPUs:

- `all-pairs`, parallelized over bodies (see the sketch after this list).
- `all-pairs-collapsed`, parallelized over force pairs.
- `octree` algorithm: requires parallel forward progress.
- `bvh` (bounding volume hierarchy) algorithm: requires weakly parallel forward progress.
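As a rough illustration of the `all-pairs` strategy, the following minimal 2D sketch (not the repository's actual code; the `Body` layout, gravitational constant `G`, and softening term are illustrative assumptions) runs one parallel iteration per body with C++ standard parallelism, each iteration accumulating force contributions from every other body:

```cpp
// Minimal sketch: all-pairs force computation, parallelized over bodies.
// NOT the repository's implementation; Body, G, and the softening term
// are assumptions made for this example.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <execution>
#include <ranges>
#include <vector>

struct Body {
  double x, y;    // position
  double m;       // mass
  double fx, fy;  // accumulated force
};

void all_pairs_forces(std::vector<Body>& bodies, double G) {
  auto ids = std::views::iota(std::size_t{0}, bodies.size());
  // One parallel task per body i; inner loop visits all other bodies j.
  std::for_each(std::execution::par_unseq, ids.begin(), ids.end(),
                [bs = bodies.data(), n = bodies.size(), G](std::size_t i) {
                  double fx = 0.0, fy = 0.0;
                  for (std::size_t j = 0; j != n; ++j) {
                    if (j == i) continue;
                    double dx = bs[j].x - bs[i].x;
                    double dy = bs[j].y - bs[i].y;
                    double r2 = dx * dx + dy * dy + 1e-9;  // softening
                    double inv_r = 1.0 / std::sqrt(r2);
                    double f = G * bs[i].m * bs[j].m * inv_r * inv_r;
                    fx += f * dx * inv_r;
                    fy += f * dy * inv_r;
                  }
                  bs[i].fx = fx;
                  bs[i].fy = fy;
                });
}
```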
Pre-requisites: `docker` and HPCCM:
$ pip install hpccm
Run samples as follows:
# Options
# ./ci/run_docker <toolchain> <algorithm> <workload case> <dim> <precision> <bodies> <steps>
# Example: nvc++ gpu compiler, octree algorithm, galaxy simulation, 3D, double precision:
$ ./ci/run_docker nvgpu octree galaxy 3 double
# Build only, do not run:
$ BUILD_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
# Run, assuming the binary is already built:
$ RUN_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
To reproduce the results without a container, a properly set-up environment is required; in that case, the `./ci/run` script can be used instead.
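For example, an invocation mirroring the container example above might look as follows (this assumes `./ci/run` accepts the same arguments as `./ci/run_docker`):

$ ./ci/run nvgpu octree galaxy 3 double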
The following options are available:

- Toolchain: `acpp` (AdaptiveCpp), `gcc` (Intel TBB), `clang` (Intel TBB), `amdclang`; `nvgpu` (`nvc++ -stdpar=gpu`), `nvcpu` (`nvc++ -stdpar=cpu`); `dpc++`.
- Algorithm: `all-pairs`, `all-pairs-collapsed`, `octree`, `bvh`.
- Dimensions: `2` (2D), `3` (3D).
- Precision: `float`, `double`.
- Workload case: `galaxy`; `nasa`, which loads a data-set from file and requires running `./ci/run_docker thuering fetch` for set up.

To run all benchmarks on a given system, you can use `./ci/run_docker bench`.
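As another example of combining the options listed above (assuming this particular combination is supported in your environment):

$ ./ci/run_docker acpp bvh galaxy 2 float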
MIT License, see LICENSE.
Thomas Lane Cassell, Tom Deakin, Aksel Alpay, Vincent Heuveline, and Gonzalo Brito Gadeschi. "Efficient Tree-Based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism." In Workshop on Irregular Applications: Architectures and Algorithms Held in Conjunction with Supercomputing (P3HPC). IEEE, 2024.
When contributing code, you may format your contributions as follows:
$ ./ci/run_docker fmt
but doing this is not required.
The environment is made portable through mamba/conda.
This must be installed as a prerequisite, e.g., run the Miniforge installer from https://github.com/conda-forge/miniforge.
Then create the `stdpar-nbody` environment:
$ mamba env create -f environment.yaml
Other things you might want:
Use `make` to build the program. This must be done within the mamba environment:

$ mamba activate stdpar-bh

The number of dimensions can be specified with the `D=<dim>` parameter to `make`. By default, `D=2` is used.
These are the available targets:

- CPU:
  - `make gcc`
  - `make clang`
  - `make nvcpp`
- GPU:
  - `make gpu` to build for NVIDIA GPUs using `nvc++`
The output will be `./nbody_d<dim>_<target>`.
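For example, to build a 3D GPU binary (assuming the `D` parameter can be combined with any of the targets above, this should produce `./nbody_d3_gpu`):

$ make gpu D=3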
When running the `nvcpp` version, it is recommended to use the following environment variables:
OMP_PLACES=cores OMP_PROC_BIND=close ./nbody_d2_nvcpp -s 5 -n 1000000
If you get an error about missing libraries, try running with the following environment variable:
LD_LIBRARY_PATH=${CONDA_PREFIX}/lib ./nbody_d2_clang -s 5 -n 1000000
Run Barnes-Hut with $\theta=0$ and compare with the all-pairs algorithm. Run 5 steps with 10 bodies. Both runs should produce the same output.
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --theta 0
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --algorithm all-pairs
Run a large Barnes-Hut simulation with 1,000,000 bodies:
$ ./nbody_d2_gpu -s 5 -n 1000000
Generate an image similar to the GIF above:
$ ./nbody_d2_gpu -s 1000 -n 10000 --save pos --workload galaxy
$ python3 scripts/plotter.py pos --galaxy --gif
To find other program arguments:
$ ./nbody_d2_gpu --help