eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Support setting kernel block cluster dimensions #484

Closed: eyalroz closed 7 months ago

eyalroz commented 1 year ago

With the Hopper architecture, NVIDIA introduced "clusters" of thread blocks, in which the blocks can access each other's shared memory. The cluster dimensions can be set either at compile time, using a __cluster_dims__(1,2,3) attribute on the kernel's declaration, or at run-time, as part of the launch configuration. We need to support the run-time setting within our launch_configuration_t class and in the launch config builder mechanism.
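
To make the request a bit more concrete, here is a rough, host-only sketch of what carrying cluster dimensions through a launch configuration and a builder might look like. Everything in it is hypothetical: dims3, launch_config_sketch, launch_config_builder_sketch and their member names are made up for illustration and are not the library's actual launch_configuration_t or builder API. The one rule it enforces comes from CUDA itself: the grid dimensions must be a multiple of the cluster dimensions along every axis. In the raw runtime API, the run-time setting is passed as a cudaLaunchAttributeClusterDimension launch attribute to cudaLaunchKernelEx; the sketch deliberately stops short of that call.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a 3D dimensions type; not the library's own.
struct dims3 {
    unsigned x = 1, y = 1, z = 1;
    constexpr std::size_t volume() const { return std::size_t{x} * y * z; }
};

// Hypothetical launch configuration carrying an extra cluster-dimensions field.
// A cluster of 1 x 1 x 1 blocks is treated here as "no clustering".
struct launch_config_sketch {
    dims3 grid;     // grid dimensions, in blocks
    dims3 block;    // block dimensions, in threads
    dims3 cluster;  // cluster dimensions, in blocks
};

// Hypothetical builder: collects the settings, validates them in build().
class launch_config_builder_sketch {
public:
    launch_config_builder_sketch& grid_dimensions(dims3 d)    { grid_ = d;    return *this; }
    launch_config_builder_sketch& block_dimensions(dims3 d)   { block_ = d;   return *this; }
    launch_config_builder_sketch& cluster_dimensions(dims3 d) { cluster_ = d; return *this; }

    launch_config_sketch build() const {
        if (cluster_.x == 0 || cluster_.y == 0 || cluster_.z == 0) {
            throw std::invalid_argument("cluster dimensions must be non-zero");
        }
        // The grid must be made up of whole clusters along every axis.
        if (grid_.x % cluster_.x != 0 ||
            grid_.y % cluster_.y != 0 ||
            grid_.z % cluster_.z != 0) {
            throw std::invalid_argument(
                "grid dimensions must be a multiple of the cluster dimensions");
        }
        return launch_config_sketch{grid_, block_, cluster_};
    }

private:
    dims3 grid_, block_, cluster_;
};
```

With something along these lines, a caller would chain grid_dimensions, block_dimensions and cluster_dimensions before build(), and a grid which cannot be split into whole clusters would be rejected on the host rather than failing at launch. Device-specific limits, such as the maximum cluster size, are not checked in this sketch.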

eyalroz commented 7 months ago

Fixed by addressing #564.