VidithM closed 2 weeks ago
One important note:
I don't think the current structure of CUDA/template/GB_cuda_shfl_down.cuh is right. It includes the shfl_down methods for both uint64 and an arbitrary Z_TYPE, but the Z_TYPE methods are only usable in reduce-family JIT kernels (for example, the GB_ADD macro is undefined for other families that need shfl_down, such as select). If the entire file is included outside that context (e.g. in a select kernel), the JIT kernel will not compile.
I suggest moving the integer shfl_down functions into a separate file from the Z_TYPE shfl_downs. Right now, I am working around this in the select_bitmap kernel by pasting in the uint64 shfl_down manually.
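To illustrate why the integer variant can live in its own file, here is a minimal sketch of a warp-level uint64 sum built only on the standard `__shfl_down_sync` intrinsic. It is not the code from GB_cuda_shfl_down.cuh: the names `warp_reduce_sum_uint64`, `demo_kernel`, and `warp_sum_demo` are hypothetical, and it assumes a full 32-thread warp. The point is that nothing here refers to Z_TYPE or GB_ADD, so a select kernel could include it safely.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not the GraphBLAS implementation): sum a uint64_t
// across one full warp.  It depends only on the CUDA intrinsic, not on any
// JIT macro such as Z_TYPE or GB_ADD.
__device__ __forceinline__ uint64_t warp_reduce_sum_uint64 (uint64_t val)
{
    // halve the shuffle distance each step: 16, 8, 4, 2, 1
    for (int offset = 16 ; offset > 0 ; offset >>= 1)
    {
        unsigned long long other = __shfl_down_sync (0xffffffffu,
            (unsigned long long) val, offset) ;
        val += (uint64_t) other ;
    }
    // only lane 0 is guaranteed to hold the complete sum
    return (val) ;
}

// Demo kernel: each lane contributes its own lane index.
__global__ void demo_kernel (uint64_t *result)
{
    uint64_t sum = warp_reduce_sum_uint64 ((uint64_t) threadIdx.x) ;
    if (threadIdx.x == 0)
    {
        (*result) = sum ;   // 0 + 1 + ... + 31 = 496
    }
}

// Host wrapper: launches one warp and returns lane 0's result.
// Returns 0 if any CUDA call fails.
uint64_t warp_sum_demo (void)
{
    uint64_t *d_result = nullptr ;
    uint64_t h_result = 0 ;
    if (cudaMalloc (&d_result, sizeof (uint64_t)) != cudaSuccess) return (0) ;
    demo_kernel <<<1, 32>>> (d_result) ;
    cudaError_t err = cudaMemcpy (&h_result, d_result, sizeof (uint64_t),
        cudaMemcpyDeviceToHost) ;
    cudaFree (d_result) ;
    return ((err == cudaSuccess) ? h_result : 0) ;
}
```

A Z_TYPE version of the same loop would replace the `+=` with the monoid's add macro, which is exactly the dependency that ties it to the reduce family.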
That's a good idea. I will split the file.
I've split the shfl_down methods into two files.