Is your feature request related to a problem? Please describe.
The std::transform_reduce algorithm does not require determinism, but an implementation on top of CUB is "pseudo-deterministic": run-to-run deterministic on a given device and for a given CUB version.
Having to preserve this guarantee prevents optimizing DeviceReduce with algorithms that cannot uphold it, for example ones in which the order of combining partial results varies between runs.
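To make the constraint concrete, here is a minimal host-side sketch of why the reduction order matters for floating-point types. It is not CUB code: `sequential_sum` and `pairwise_sum` are hypothetical helper names introduced only for illustration, and the sketch assumes IEEE 754 single precision without fast-math optimizations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: adds the elements strictly left to right.
float sequential_sum(const float* data, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical helper: splits the range in half and adds the two partial
// sums, which is closer to how a parallel tree reduction combines values.
float pairwise_sum(const float* data, std::size_t n) {
    assert(n > 0);
    if (n == 1) {
        return data[0];
    }
    const std::size_t half = n / 2;
    return pairwise_sum(data, half) + pairwise_sum(data + half, n - half);
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` the two functions should disagree (1 versus 0 under the stated assumptions), because float addition is not associative. An algorithm whose combination order depends on scheduling, such as one that accumulates partial results with atomics, could therefore return different results from run to run.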
Describe the solution you'd like
Add an option to DeviceReduce that controls whether run-to-run determinism is guaranteed, with the guarantee enabled by default.
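One possible shape for such an option, as a host-only sketch. Everything here is an assumption made for illustration: the `determinism` enum, its enumerators, and `reduce_sketch` are hypothetical names and are not part of CUB. The reversed traversal merely stands in for "any order the implementation finds fastest".

```cpp
#include <numeric>
#include <vector>

// Hypothetical option type; not an existing CUB API.
enum class determinism { run_to_run, not_guaranteed };

// Hypothetical reduction entry point. The default keeps today's behavior,
// so existing callers would see no change.
template <typename T, typename Op>
T reduce_sketch(const std::vector<T>& in, T init, Op op,
                determinism d = determinism::run_to_run) {
    if (d == determinism::run_to_run) {
        // Fixed combination order: same input, same result on every run.
        return std::accumulate(in.begin(), in.end(), init, op);
    }
    // Opt-out path: the implementation may pick any order. A reversed
    // traversal is used here only to model "some other order".
    return std::accumulate(in.rbegin(), in.rend(), init, op);
}
```

Callers that do not pass the option keep the deterministic path. Only callers that explicitly pass `determinism::not_guaranteed` accept order-dependent results, which matches the request to default the guarantee to enabled.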
Area
CUB
Describe alternatives you've considered
Not using CUB / Thrust.
Additional context
No response