bernhardmgruber opened 1 week ago
I think we should actually make them CUB benchmarks, since they should be included in CUB's continuous benchmarking and tuning.
We can probably postpone this, because thrust::transform seems to be performing well.
We currently use the BabelStream kernels as memory-bound workloads, plus a Fibonacci kernel (per thread: read a random index in [0; 42], compute that Fibonacci number, store the result). The latter kernel is compute-bound and shows high thread divergence.
In order to better assess regressions, we should add more benchmarks covering: