This PR uses CUB select to implement a single-pass, scan-based shuffle. I have also added gather and scatter operations. They do not actually shuffle anything; instead, they measure the throughput of random gather and scatter, which serves as an upper bound on the performance of the scan-based algorithm.
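For reviewers unfamiliar with the idea, here is a minimal host-side C++ sketch of one way a scan-based shuffle can work. It assumes a "bijection plus compaction" scheme and is an illustration, not the code in this PR. The hypothetical `PowerOfTwoBijection` permutes `[0, 2^k)` with `2^k >= n`. Images that fall outside `[0, n)` are dropped, and an exclusive scan over the keep-flags assigns each survivor its output slot. On the GPU, a stream-compaction primitive such as CUB select plays that role. Because the map is a bijection, exactly `n` values survive, so the output is a permutation of `[0, n)`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bijection on [0, 2^k). Each round applies an affine map with an
// odd multiplier (invertible mod 2^k) followed by an xorshift-right step (also
// invertible). It is cheap, not a statistically strong permutation.
struct PowerOfTwoBijection {
    unsigned k = 0;
    std::uint64_t mask = 0;
    std::uint64_t seed;

    PowerOfTwoBijection(std::uint64_t n, std::uint64_t seed_) : seed(seed_) {
        while ((std::uint64_t{1} << k) < n) ++k;  // smallest 2^k >= n
        mask = (std::uint64_t{1} << k) - 1;
    }
    std::uint64_t domain() const { return mask + 1; }

    std::uint64_t operator()(std::uint64_t x) const {
        const unsigned s = k / 2 + 1;
        for (std::uint64_t round = 0; round < 3; ++round) {
            std::uint64_t a = ((seed + round) * 0x9E3779B97F4A7C15ull) | 1;  // odd
            std::uint64_t c = seed ^ (round * 0xBF58476D1CE4E5B9ull);
            x = (x * a + c) & mask;  // affine step, reduced mod 2^k
            x ^= x >> s;             // xorshift step stays within k bits
        }
        return x;
    }
};

// Sketch of a scan-based shuffle: map, flag, exclusive-scan, scatter.
std::vector<std::uint64_t> scan_shuffle(std::uint64_t n, std::uint64_t seed) {
    if (n == 0) return {};
    PowerOfTwoBijection f(n, seed);
    const std::uint64_t m = f.domain();

    // Map: compute each candidate value and whether it lies in [0, n).
    std::vector<std::uint64_t> value(m), flag(m), pos(m);
    for (std::uint64_t i = 0; i < m; ++i) {
        value[i] = f(i);
        flag[i] = value[i] < n ? 1 : 0;
    }

    // Exclusive scan of the keep-flags yields each survivor's output index.
    std::uint64_t running = 0;
    for (std::uint64_t i = 0; i < m; ++i) {
        pos[i] = running;
        running += flag[i];
    }
    assert(running == n);  // a bijection keeps exactly n of the 2^k images

    // Scatter survivors to their compacted positions.
    std::vector<std::uint64_t> out(n);
    for (std::uint64_t i = 0; i < m; ++i)
        if (flag[i]) out[pos[i]] = value[i];
    return out;
}
```

For example, `scan_shuffle(1000, 42)` returns some permutation of `0..999`. The map, scan, and scatter steps here are written as separate loops for clarity. A GPU version would fuse them into a single pass over the `2^k` inputs, which is why the random-scatter throughput mentioned above bounds its performance.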