brycelelbach closed this 1 month ago
/ok to test
> If/when we add support for int128, `cuda.popc` and `int.bit_count` probably won't work with it; they're implemented in terms of LLVM's ctpop, which I believe only supports ints up to 64 bits.
Indeed, in NVVM it's only supported up to 64 bits:

> `llvm.ctpop`: Supported for `i8`, `i16`, `i32`, `i64`, and vectors of these types.

(from https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#bit-manipulations-intrinsics)
Though in general in LLVM it's supported for any bit width:

> You can use `llvm.ctpop` on any integer bit width, or on any vector with integer elements. Not all targets support all bit widths or vector types, however.

(from https://releases.llvm.org/7.0.1/docs/LangRef.html#llvm-ctpop-intrinsic)
The formatting issue was such a tiny nit that I've just committed a fix (adding another space).
/ok to test
I've just re-targeted this to `main`.
Python's `int` has a method for getting the count of bits that are set: `int.bit_count()`. We already have this functionality in Numba CUDA: `cuda.popc(int)`. But it would be nice to just make `int.bit_count` work; it's potentially more familiar to Python programmers and has a clearer name.

I also added some additional tests for `cuda.popc` to make sure it works for ints smaller than 32 bits.

If/when we add support for int128, `cuda.popc` and `int.bit_count` probably won't work with it; they're implemented in terms of LLVM's ctpop, which I believe only supports ints up to 64 bits.