This PR adds the Thrust implementation.
The build script handles both Thrust proper (NVIDIA's implementation) and rocThrust (AMD's implementation).
Because Thrust does not provide APIs for device selection or synchronisation, this PR uses a small number of macros to dispatch to the respective HIP/CUDA calls.
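
For illustration only, here is a minimal sketch of how such a macro layer might look. It is not the code in this PR: the names `GPU_FN`, `GPU_SUCCESS`, `select_device`, `synchronise` and the `stub*` functions are hypothetical, and the host-only fallback branch exists purely so the sketch compiles without a GPU toolchain. The runtime calls it maps to (`cudaSetDevice`/`hipSetDevice`, `cudaGetDeviceCount`/`hipGetDeviceCount`, `cudaDeviceSynchronize`/`hipDeviceSynchronize`) are the standard CUDA and HIP runtime functions.

```cpp
// Hypothetical macro layer: map a common suffix onto the CUDA or HIP runtime API.
#if defined(__NVCC__)
  #include <cuda_runtime.h>
  #define GPU_FN(fn) cuda##fn      // e.g. GPU_FN(SetDevice) -> cudaSetDevice
  #define GPU_SUCCESS cudaSuccess
#elif defined(__HIPCC__)
  #include <hip/hip_runtime.h>
  #define GPU_FN(fn) hip##fn       // e.g. GPU_FN(SetDevice) -> hipSetDevice
  #define GPU_SUCCESS hipSuccess
#else
  // Assumption: host-only stubs so this sketch builds with a plain C++ compiler.
  static int stubGetDeviceCount(int *count) { *count = 1; return 0; }
  static int stubSetDevice(int) { return 0; }
  static int stubDeviceSynchronize() { return 0; }
  #define GPU_FN(fn) stub##fn
  #define GPU_SUCCESS 0
#endif

// Select a device by index; returns false if the index is out of range
// or the underlying runtime call fails.
inline bool select_device(int index) {
  int count = 0;
  if (GPU_FN(GetDeviceCount)(&count) != GPU_SUCCESS) return false;
  if (index < 0 || index >= count) return false;
  return GPU_FN(SetDevice)(index) == GPU_SUCCESS;
}

// Block until all previously queued device work has finished.
inline bool synchronise() {
  return GPU_FN(DeviceSynchronize)() == GPU_SUCCESS;
}
```

With a layer like this, the Thrust code itself stays identical across vendors and only the few runtime calls that Thrust does not wrap go through the macro.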
The implementation has been tested with ROCm 4.5.0 on a Radeon VII and with CUDA 11.3 on a Titan X (Pascal), with performance comparable to native HIP and CUDA, respectively.
Finally, this PR also includes updates to the CI for the latest versions of ROCm and CUDA.