NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

[FEA]: Provide interface and implementation for in-place `thrust::copy_if` #1799

Open elstehle opened 4 months ago

elstehle commented 4 months ago

Area

Thrust

Is your feature request related to a problem? Please describe.

Our thrust::copy_if interfaces do not support in-place stream compaction. The performance measurements in https://github.com/NVIDIA/cccl/pull/1782#issuecomment-2133809012 and https://github.com/NVIDIA/cccl/pull/1782#issuecomment-2133809012 have highlighted that there is a notable performance downside associated with adding the precautions (i.e., memory barriers) required to support in-place stream compaction in thrust::copy_if. As a result, we want to provide in-place stream compaction via its own interface, using a specialized code path.

Describe the solution you'd like

To avoid unwanted ambiguity between existing overloads and the in-place version, we probably want to expose that functionality via a new name (e.g., thrust::inplace_copy_if) rather than another set of overloads of thrust::copy_if.

Describe alternatives you've considered

No response

Additional context

No response

leofang commented 4 months ago

Silly question: if the two versions share the same function signatures, would it work to add inplace as a Boolean template parameter (and set it to false for backward compatibility)?
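
A rough host-side sketch of this idea, using standard-library algorithms in place of the Thrust backends. The function name copy_if_sketch, the InPlace parameter, and the dispatch helpers are all hypothetical; nothing here is an existing Thrust interface.

```cpp
#include <algorithm>
#include <type_traits>

namespace detail {
// Out-of-place path: input and output ranges are assumed not to overlap.
template <typename InputIt, typename OutputIt, typename Pred>
OutputIt copy_if_impl(std::false_type, InputIt first, InputIt last, OutputIt result, Pred pred)
{
  return std::copy_if(first, last, result, pred);
}

// In-place path: the caller is expected to pass result == first.
template <typename It, typename Pred>
It copy_if_impl(std::true_type, It first, It last, It /*result*/, Pred pred)
{
  // remove_if drops matches, so negate pred to keep them instead.
  return std::remove_if(first, last, [&pred](const auto& x) { return !pred(x); });
}
}  // namespace detail

// InPlace defaults to false, so existing call sites keep the current path.
template <bool InPlace = false, typename InputIt, typename OutputIt, typename Pred>
OutputIt copy_if_sketch(InputIt first, InputIt last, OutputIt result, Pred pred)
{
  return detail::copy_if_impl(std::integral_constant<bool, InPlace>{}, first, last, result, pred);
}
```

A caller would opt in with copy_if_sketch<true>(v.begin(), v.end(), v.begin(), pred), while a plain copy_if_sketch(...) call compiles exactly as before. The in-place path still carries a redundant result argument.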

miscco commented 4 months ago

I am strongly against inplace_copy_if, because there is already an algorithm that does exactly that: remove_if.
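
To illustrate the equivalence, here is a small host-side sketch using std::remove_if, which thrust::remove_if mirrors: both are stable and return an iterator to the new end of the range. The helper name keep_if_in_place is made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Keep only the elements for which pred is true, compacting v in place.
template <typename T, typename Pred>
void keep_if_in_place(std::vector<T>& v, Pred pred)
{
  // remove_if removes elements that satisfy its predicate, so the
  // copy_if-style "keep" predicate has to be negated.
  auto new_end = std::remove_if(v.begin(), v.end(),
                                [&pred](const T& x) { return !pred(x); });
  // Elements past new_end are left in an unspecified state; drop them.
  v.erase(new_end, v.end());
}
```

The only difference from an in-place copy_if is the polarity of the predicate: remove_if takes the condition for discarding an element, copy_if the condition for keeping it.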

lilohuang commented 4 months ago

I agree with @miscco that thrust::remove_if should suffice for CCCL users who require in-place modifications.

@elstehle If the CCCL library intends to introduce an in-place version of thrust::copy_if, I recommend naming it thrust::inplace_copy_if to make its functionality explicit. Alternatively, an overloaded C++ function that allows thrust::copy_if to operate without an output array, indicating an in-place operation, would also be a suitable approach.

For the version of thrust::copy_if that involves an output array, CCCL should continue using the existing non-in-place algorithm, even if the output array is the same as the input. This strategy will minimize any impact on existing applications.