eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs
BSD 3-Clause "New" or "Revised" License

Support use_max_active_blocks_per_multiprocessor() on a config builder #580

Status: Closed (eyalroz closed this 7 months ago)

eyalroz commented 7 months ago

For a kernel_t, we can obtain the maximum number of active blocks per SM; let's add a config builder method which uses this value to set the grid dimensions.

eyalroz commented 7 months ago

Actually, we already have this with saturate_with_active_blocks().
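
For readers unfamiliar with the idea, here is a minimal sketch of the arithmetic behind sizing a grid from the per-SM occupancy figure. It assumes that "saturating" means launching one full complement of resident blocks on every multiprocessor. The helper name `saturating_grid_blocks` is hypothetical and is not part of cuda-api-wrappers; on a real device the two inputs would typically come from the CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor()` and the `cudaDevAttrMultiProcessorCount` device attribute.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the library's API): compute a 1D grid size,
// in blocks, that fills every multiprocessor with as many blocks as can
// be resident on it at once.
//
// max_active_blocks_per_sm: occupancy figure for a given kernel, block
//                           size and dynamic shared memory size
// num_multiprocessors:      number of SMs on the target device
std::uint64_t saturating_grid_blocks(int max_active_blocks_per_sm,
                                     int num_multiprocessors)
{
    assert(max_active_blocks_per_sm >= 0 && num_multiprocessors >= 0);
    // Widen before multiplying so the product cannot overflow an int.
    return static_cast<std::uint64_t>(max_active_blocks_per_sm) *
           static_cast<std::uint64_t>(num_multiprocessors);
}
```

With 4 resident blocks per SM on an 80-SM device, this yields a grid of 320 blocks. A grid of that size is usually paired with a grid-stride loop in the kernel, since the block count no longer depends on the problem size.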