NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

[FEA]: Non-deterministic DeviceReduce #265

Open gonzalobg opened 1 year ago

gonzalobg commented 1 year ago

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

The std::transform_reduce algorithm does not require determinism, but an implementation on top of CUB is "pseudo-deterministic" (run-to-run deterministic on a given device, for a given CUB version).

This prevents optimizing DeviceReduce with algorithms that do not uphold this guarantee.

Describe the solution you'd like

Add an option to DeviceReduce that controls whether run-to-run determinism is required (defaulting to enabled, i.e. preserving current behavior).

Describe alternatives you've considered

Not using CUB / Thrust.

Additional context

No response

gevtushenko commented 1 year ago

Related to the following issue https://github.com/NVIDIA/cccl/issues/886