NVIDIA / cccl

CUDA Core Compute Libraries

cub::BlockReduce can't be instantiated with BLOCK_DIM_X==1 #905

Open krzikalla opened 1 year ago

krzikalla commented 1 year ago
cub::BlockReduce<int, 1> reducer;

doesn't compile.

gevtushenko commented 1 year ago

Hello, @krzikalla! Thank you for reporting this. We've recently fixed a similar issue for cub::WarpReduce. As far as I can see, the issue occurs only when the thread block contains a single thread. In other words, the following versions compile successfully:

cub::BlockReduce<int, 1, cub::BlockReduceAlgorithm::BLOCK_REDUCE_RAKING, 2> reducer1;
cub::BlockReduce<int, 1, cub::BlockReduceAlgorithm::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, 2> reducer2;
cub::BlockReduce<int, 1, cub::BlockReduceAlgorithm::BLOCK_REDUCE_WARP_REDUCTIONS, 2> reducer3;

Is there any chance you can use cub::WarpReduce instead of cub::BlockReduce as a workaround for the mentioned case?
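For reference, a minimal sketch of what that workaround might look like for a one-thread block. It assumes a CUB version that already contains the cub::WarpReduce fix mentioned above, so that a logical warp size of 1 is accepted. The kernel name `single_thread_sum` and the host helper `run_single_thread_sum` are hypothetical names used only for illustration.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Hypothetical kernel: reduces the values of a block that has exactly one
// thread. It uses cub::WarpReduce with a logical warp of 1 thread in place of
// cub::BlockReduce<int, 1>, which is the instantiation reported as failing.
__global__ void single_thread_sum(const int* in, int* out)
{
    // Assumption: this CUB version accepts LOGICAL_WARP_THREADS == 1.
    using WarpReduce = cub::WarpReduce<int, 1>;
    __shared__ typename WarpReduce::TempStorage temp_storage;

    // With a single participating thread the sum is that thread's own input.
    int aggregate = WarpReduce(temp_storage).Sum(in[threadIdx.x]);

    if (threadIdx.x == 0)
    {
        *out = aggregate;
    }
}

// Hypothetical host helper: launches the kernel with one block of one thread
// and returns the reduced value (error checking omitted for brevity).
int run_single_thread_sum(int value)
{
    int* d_in  = nullptr;
    int* d_out = nullptr;
    int result = 0;

    cudaMalloc(&d_in, sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, &value, sizeof(int), cudaMemcpyHostToDevice);

    single_thread_sum<<<1, 1>>>(d_in, d_out);

    // This copy runs on the default stream, so it waits for the kernel.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Since a one-thread reduction simply returns the thread's own value, code that is templated on the block size could also special-case a block size of 1 and skip the reduction entirely.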