Implementation of the SYCL/oneAPI backend, based on the cuda implementation, with some inspiration taken from alpaka.
The SYCL implementation can be compiled either with dpcpp or with clang++. The latter is the default because it supports the CUDA backend. To compile with dpcpp, set USE_SYCL_ONEAPI=1; in this case the TBB shipped with oneAPI is used instead of being cloned as an external.
make environment
source env.sh
make -j `nproc` sycl
The device(s) can be chosen at runtime with --device:
cpu: the CPU can be selected, but the program hangs after a couple of events (currently under investigation with Intel, since there seems to be a conflict between the TBB used by the framework and the one used by the OpenCL runtime)
gpu: targets all the GPUs
level_zero: for Intel GPUs with the level_zero backend
cuda: for NVIDIA GPUs (at the moment this is the only backend that produces consistent results)
hip: for AMD GPUs (does not compile yet because of a bug)
Some changes (e.g. different kernels on CPU and GPU, shared variables, ...) are workarounds for bugs in SYCL and can be reverted once those bugs are fixed; they are marked with SYCL_BUG_.
NOTE: line 155 of src/sycl/plugin-PixelVertexFinding/gpuClusterTracksIterative.h has been commented out because of a compiler bug, so gpuClusterTracksIterative cannot be used.
Regarding performance, a comparison with native CUDA on an NVIDIA GPU has been carried out; it shows that considerable work is still needed to reach the performance of native CUDA.
More details on what has been changed and on the performance are available here