This is a benchmark for parallel computation of a "power method": apply a stencil-based matrix-vector product, compute the norm, scale the result, and repeat.
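For orientation, the loop can be sketched sequentially as below. This is only an illustration, not the benchmark's code: the 5-point stencil with zero values outside the domain, the start vector, and the names stencil_apply, l2_norm and power_method are all assumptions made for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: y = A x for a 5-point stencil on an m x n grid,
// stored row by row. Values outside the domain count as zero (assumption).
void stencil_apply(const std::vector<double>& x, std::vector<double>& y,
                   std::size_t m, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double v = 4.0 * x[i * n + j];
      if (i > 0)     v -= x[(i - 1) * n + j];
      if (i + 1 < m) v -= x[(i + 1) * n + j];
      if (j > 0)     v -= x[i * n + j - 1];
      if (j + 1 < n) v -= x[i * n + j + 1];
      y[i * n + j] = v;
    }
  }
}

// Euclidean norm of a vector.
double l2_norm(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += v * v;
  return std::sqrt(sum);
}

// Runs `iterations` steps of product, norm, scaling and returns the last norm.
// The start vector (1, 2, 3, ...), normalized, is an arbitrary choice.
double power_method(std::size_t m, std::size_t n, int iterations) {
  std::vector<double> x(m * n), y(m * n, 0.0);
  for (std::size_t k = 0; k < x.size(); ++k) x[k] = static_cast<double>(k + 1);
  const double start_norm = l2_norm(x);
  for (double& v : x) v /= start_norm;

  double norm = 0.0;
  for (int it = 0; it < iterations; ++it) {
    stencil_apply(x, y, m, n);      // matrix-vector product
    norm = l2_norm(y);              // norm
    if (norm == 0.0) break;         // nothing left to scale
    for (double& v : y) v /= norm;  // scaling
    std::swap(x, y);                // and repeat
  }
  return norm;
}
```

Because x has unit length before each product, the returned norm approximates the magnitude of the dominant eigenvalue once the iteration has settled. On a 1 x 2 grid, for instance, this stencil has eigenvalues 3 and 5, and the sketch converges to 5.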
So far there are C++ implementations based on OpenMP, Kokkos, SYCL, and MPI. Contributed implementations are welcome.
This software uses the packages cxxopts and mdspan.
You can take the easy way out: the CMake installation will fetch these packages. However, if you want to install them yourself:
add the .pc files from cxxopts to the PKG_CONFIG_PATH;
add the mdspan installation directory to the CMAKE_PREFIX_PATH.
Go into code/diff2d. Calling make without arguments tells you all available rules and the available variants.
Drive the cmake installation with the makefile:
make cmake VARIANTS="kokkos sycl" ## or other variants
You can of course run cmake outside of make:
variant=span
mkdir build
ln cmake/CMakeLists.txt $variant
cmake -B build -S $variant -D VARIANT=$variant
You can let CMake fetch prerequisite packages or install them yourself; see above.
Go into code/diff2d. Calling make without arguments tells you all the make rules. For compilation use make or cmake:
make compile VARIANTS="seq oned"
Set the variable TACC_MDSPAN_INC to the location of the header files.
The following code variants are available:
oned: traditional C-style OpenMP implementation.
clps: OpenMP with collapse(2) directive.
span: iterating over a ranges::view::cartesian_product.
iota: double loop over iota_views.
range: using range execution policies. Not working yet!
kokkos: based on Kokkos.
sycl: using SYCL; only tested with Intel's SYCL, not with AdaptiveCpp or other implementations.
dist: MPI implementation through MPL.
Not yet compilable with CMake: kokkos, sycl, dist.
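To illustrate what the collapse(2) directive of the clps variant changes, here is a minimal sketch of a scaling step written twice. It is not taken from the benchmark sources: the function names and the flat row-major storage are assumptions, and the first function is only a plausible baseline that parallelizes the outer loop.

```cpp
#include <cstddef>
#include <vector>

// Baseline (assumed): only the outer loop is distributed, so at most
// m chunks of work exist regardless of n.
void scale_rows(std::vector<double>& x, std::size_t m, std::size_t n,
                double factor) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      x[i * n + j] *= factor;
}

// collapse(2): both loops are fused into one iteration space of m*n
// points, which is then divided over the threads.
void scale_collapsed(std::vector<double>& x, std::size_t m, std::size_t n,
                     double factor) {
#ifdef _OPENMP
#pragma omp parallel for collapse(2)
#endif
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      x[i * n + j] *= factor;
}
```

Both functions compute the same result; the difference is only in how iterations are assigned to threads, which matters most when m is small compared to the thread count. Without OpenMP enabled at compile time, the guards make both versions run sequentially.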
Run bin/oned or any other variant. Command-line options:
-m 100 -n 200: domain size;
-t: trace output;
-i 5: run only five iterations, rather than until full convergence, which can take forever.
Run ./compare_models.py with arguments as below.
This is somewhat TACC-dependent; for instance, it interrogates SLURM_CPUS_ON_NODE to find the core count.
If you're not on a SLURM cluster, set this variable by hand.
Other options:
-c cpuname: only for output file naming;
-n 123456: the -m -n options above;
-t: iteration trace output;
variant1,variant2,variant3 or all: to run specific codes.