[SYCL] Add SYCL implementation

Implementation for the SYCL/oneAPI backend, based on the CUDA implementation with some inspiration taken from alpaka.

The SYCL implementation can be compiled with dpcpp or with clang++. The latter is the default since it supports the CUDA backend. To compile with dpcpp, set USE_SYCL_ONEAPI=1. In this case the TBB shipped with oneAPI is used, and TBB is not cloned as an external.

make environment
source env.sh
make -j `nproc` sycl

The device(s) can be chosen at runtime with --device:

cpu: the CPU can be selected, but the program hangs after a couple of events (currently under investigation with Intel; there seems to be a conflict between the TBB used by the framework and the TBB used by the OpenCL runtime)
gpu: targets all the GPUs
level_zero: for Intel GPUs with the level_zero backend
cuda: for NVIDIA GPUs (at the moment the only one that gives consistent results)
hip: for AMD GPUs (does not compile at the moment because of a bug)
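
As a rough illustration of the option values listed above, the following minimal sketch maps a --device argument to a backend choice. It is not the code from this PR: the names DeviceChoice and parseDeviceOption are hypothetical, and the actual SYCL device lookup is left out so that the snippet builds with any C++ compiler.

```cpp
#include <string>

// Hypothetical enum: one entry per --device value listed in the description.
enum class DeviceChoice { Cpu, Gpu, LevelZero, Cuda, Hip };

// Hypothetical helper: translate the --device argument into a DeviceChoice.
// Returns true and sets `out` for a known value; returns false and leaves
// `out` untouched for anything else.
bool parseDeviceOption(const std::string& arg, DeviceChoice& out) {
  if (arg == "cpu") {
    out = DeviceChoice::Cpu;        // host CPU
  } else if (arg == "gpu") {
    out = DeviceChoice::Gpu;        // all available GPUs
  } else if (arg == "level_zero") {
    out = DeviceChoice::LevelZero;  // Intel GPUs, level_zero backend
  } else if (arg == "cuda") {
    out = DeviceChoice::Cuda;       // NVIDIA GPUs
  } else if (arg == "hip") {
    out = DeviceChoice::Hip;        // AMD GPUs
  } else {
    return false;                   // unknown value: let the caller report it
  }
  return true;
}
```

A caller would typically reject unknown values with an error message and only then query the SYCL runtime for matching devices.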

Some changes (e.g. different kernels on CPU and GPU, shared variables...) work around bugs in SYCL and can be reverted once those bugs are fixed (they are marked with SYCL_BUG_).

NOTE: line 155 of src/sycl/plugin-PixelVertexFinding/gpuClusterTracksIterative.h has been commented out due to a bug in the compiler, so gpuClusterTracksIterative cannot be used.

Regarding performance, a comparison with native CUDA on an NVIDIA GPU has been carried out, showing that there is still a lot of work to do before the SYCL implementation reaches the performance of native CUDA.

More details on what has been changed and on the performance are available here:

cms-patatrack/pixeltrack-standalone#387, [SYCL] Add SYCL implementation