Evaluate the performance on AMD cards of using the OpenCL 2.0 equivalents of the warp vote and warp shuffle functions.
Based on the performance results, decide whether there should be a separate kernel implementation for AMD GPUs (and OpenCL 2.0), and update the code as needed.