NVIDIA / numba-cuda

BSD 2-Clause "Simplified" License

Add device-side support for `int.bit_count` #52

Closed brycelelbach closed 1 month ago

brycelelbach commented 1 month ago

Python's `int` has a method for getting the count of bits that are set: `int.bit_count()`. We already have this functionality in Numba CUDA: `cuda.popc(int)`. But it would be nice to just make `int.bit_count` work; it's potentially more familiar to Python programmers and has a clearer name.

I also added some additional tests for `cuda.popc` to make sure it works for ints smaller than 32 bits.

If/when we add support for int128, `cuda.popc` and `int.bit_count` probably won't work with it; they're implemented in terms of LLVM's `ctpop`, which I believe only supports ints up to 64 bits.
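For reference, the semantics being exposed on the device here are those of a population count. A minimal host-side sketch (illustration only, not the PR's implementation):

```python
def popcount(x: int) -> int:
    """Software population count, equivalent to abs(x).bit_count().

    int.bit_count() (Python 3.10+) counts the one-bits in the binary
    representation of the absolute value of the integer, which is the
    same operation cuda.popc performs on the device.
    """
    x = abs(x)
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit (Kernighan's trick)
        count += 1
    return count


for value in (0, 1, 0b1011, 255, 2**32 - 1):
    # bin(value).count("1") is a portable reference for the same count;
    # on Python 3.10+ it equals value.bit_count().
    assert popcount(value) == bin(value).count("1")
```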

copy-pr-bot[bot] commented 1 month ago

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gmarkall commented 1 month ago

/ok to test

gmarkall commented 1 month ago

> If/when we add support for int128, `cuda.popc` and `int.bit_count` probably won't work with it; they're implemented in terms of LLVM's `ctpop`, which I believe only supports ints up to 64 bits.

Indeed, in NVVM it's only supported up to 64 bits:

> llvm.ctpop: Supported for i8, i16, i32, i64, and vectors of these types.

(from https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#bit-manipulations-intrinsics)

Though in general in LLVM it's supported for any bit width:

> You can use llvm.ctpop on any integer bit width, or on any vector with integer elements. Not all targets support all bit widths or vector types, however.

(from https://releases.llvm.org/7.0.1/docs/LangRef.html#llvm-ctpop-intrinsic)
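For a width beyond 64 bits, a popcount could in principle be composed from 64-bit chunks. A hedged host-side sketch of how an int128 count would decompose into two `ctpop`-sized pieces (illustrative only; this is not something the PR implements):

```python
MASK64 = (1 << 64) - 1  # low 64 bits


def popcount128(x: int) -> int:
    """Population count of a 128-bit value via two 64-bit counts,
    mirroring how a ctpop-based lowering might split wide integers.

    bin(...).count("1") stands in for a 64-bit hardware popcount.
    """
    assert 0 <= x < (1 << 128)
    lo = x & MASK64
    hi = (x >> 64) & MASK64
    return bin(lo).count("1") + bin(hi).count("1")


# Example: all 128 bits set should count to 128.
assert popcount128((1 << 128) - 1) == 128
```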

gmarkall commented 1 month ago

The formatting issue was such a tiny nit that I've just committed a fix (adding an extra whitespace).

gmarkall commented 1 month ago

/ok to test

gmarkall commented 1 month ago

I've just re-targeted this to main.