Adds a wrapper method for `cuFuncSetAttribute()`, which is useful for experimenting with low-level device settings. On modern NVIDIA GPUs, shared memory and the L1 cache share the same on-chip hardware, and you can control how much is allotted to each. This method lets me experiment with devoting all of that memory to L1 cache, since the GMM kernel doesn't use shared memory.
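
For reference, here is a minimal sketch of what such a wrapper might look like against the CUDA driver API. It is not the code in this PR: the names `setFunctionAttribute` and `preferMaxL1` are hypothetical, and the error handling is only one possible choice. The attribute `CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT` and the value `CU_SHAREDMEM_CARVEOUT_MAX_L1` come from `cuda.h`. The driver treats the carveout as a hint and may pick a different split.

```cuda
#include <cuda.h>   // CUDA driver API
#include <cstdio>

// Hypothetical wrapper (name is an assumption): thin pass-through to
// cuFuncSetAttribute() that reports failures on stderr.
// Returns true on success, false otherwise.
static bool setFunctionAttribute(CUfunction fn, CUfunction_attribute attr, int value) {
    CUresult rc = cuFuncSetAttribute(fn, attr, value);
    if (rc != CUDA_SUCCESS) {
        const char *msg = nullptr;
        cuGetErrorString(rc, &msg);  // leaves msg null if rc is not a known code
        std::fprintf(stderr, "cuFuncSetAttribute failed: %s\n",
                     msg ? msg : "unknown error");
        return false;
    }
    return true;
}

// Hypothetical convenience helper: request the smallest shared memory
// carveout for this kernel, leaving as much as possible to L1 cache.
static bool preferMaxL1(CUfunction fn) {
    return setFunctionAttribute(fn,
                                CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT,
                                CU_SHAREDMEM_CARVEOUT_MAX_L1);
}
```

Assuming the GMM kernel handle was obtained with `cuModuleGetFunction()`, calling `preferMaxL1(kernel)` once before the first launch would be enough, since the attribute is set per function rather than per launch.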