eyalroz / cuda-kat

CUDA kernel author's tools
BSD 3-Clause "New" or "Revised" License

Support the "SIMD"-like intrinsics #81

Open eyalroz opened 4 years ago

eyalroz commented 4 years ago

CUDA offers many intrinsic functions for working with multiple 1-byte and 2-byte values packed into native 4-byte integers:

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html
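To make "packed" concrete, here is a minimal host-side sketch of what one of these intrinsics, `__vadd4`, computes: four independent byte-wise additions inside one 32-bit word, with no carry between lanes. The function name `vadd4_reference` is made up for illustration; it is not part of CUDA or cuda-kat.

```cpp
#include <cstdint>

// Host-side reference for the per-byte addition that the device intrinsic
// __vadd4 performs: each of the four byte lanes is added independently,
// wrapping modulo 256, and no carry crosses a lane boundary.
// (vadd4_reference is a hypothetical name used only in this sketch.)
inline std::uint32_t vadd4_reference(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const int shift = 8 * lane;
        const std::uint32_t x = (a >> shift) & 0xFFu;  // extract lane from a
        const std::uint32_t y = (b >> shift) & 0xFFu;  // extract lane from b
        result |= ((x + y) & 0xFFu) << shift;          // wrap and re-pack
    }
    return result;
}
```

On the device, the same result should come from a single `__vadd4(a, b)` call rather than a loop over lanes.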

We should offer explicit access to these, in a form that is better structured than a heap of idiosyncratic names (perhaps via the kat::array type, or in some other way?).
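One possible shape for such structured access, purely as a sketch: a small value type holding the packed word, with operators that dispatch to the intrinsic in device code. Everything named here (`sketch`, `packed_u8x4`, `lane`, `SKETCH_HD`) is hypothetical and not existing cuda-kat API; only `__vadd4` is an actual CUDA intrinsic. The host branch is a portable fallback so that the same code also compiles without nvcc.

```cpp
#include <cstdint>

// Expand to CUDA execution-space qualifiers only when compiling with nvcc.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {  // hypothetical namespace, not part of cuda-kat

// Four unsigned 8-bit lanes packed into one 32-bit word (hypothetical type).
struct packed_u8x4 {
    std::uint32_t bits;
};

// Lane-wise addition, wrapping modulo 256 in each lane.
SKETCH_HD inline packed_u8x4 operator+(packed_u8x4 lhs, packed_u8x4 rhs)
{
#ifdef __CUDA_ARCH__
    // Device code: use the SIMD-in-a-word intrinsic.
    return { __vadd4(lhs.bits, rhs.bits) };
#else
    // Host fallback: add the low 7 bits of every lane (which cannot carry
    // into the next lane), then restore each lane's top bit with an XOR.
    const std::uint32_t low =
        (lhs.bits & 0x7F7F7F7Fu) + (rhs.bits & 0x7F7F7F7Fu);
    return { low ^ ((lhs.bits ^ rhs.bits) & 0x80808080u) };
#endif
}

// Read a single lane (0 is the least significant byte).
SKETCH_HD inline std::uint8_t lane(packed_u8x4 v, int index)
{
    return static_cast<std::uint8_t>(v.bits >> (8 * index));
}

}  // namespace sketch
```

With something like this, kernel code could write `a + b` on packed values instead of calling `__vadd4` by name. Whether it should live in `kat::array` specializations or in a separate type is the open design question.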

We should also check our existing code to see where specializations are in order, so that we benefit from these instructions (e.g. in sequence operations or collaboration primitives).