bernhardmgruber opened 1 week ago
I think we should actually make them CUB benchmarks, since they should be included in CUB's continuous benchmarking and tuning.
We can probably postpone this, because thrust::transform seems to be performing well.
We currently use the BabelStream kernels as memory-bound workloads, plus a Fibonacci kernel (per thread: read a random index in [0; 42], compute that Fibonacci number, store the result). The latter kernel is compute-bound and shows high thread divergence.
In order to better assess regressions, we should add more benchmarks covering: