bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0

Add tuning option for shared memory size #61

Open LutzCle opened 2 years ago

LutzCle commented 2 years ago

Pascal and newer devices support more than 48 KiB of shared memory per thread block. This is an opt-in feature introduced in CUDA 9.0: a kernel opts in by setting the desired size as a function attribute before it is launched.
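To make the opt-in rule concrete, here is a minimal sketch of the decision a caller faces. It is not part of RustaCUDA or of this PR: `SharedMemPlan` and `plan_shared_mem` are hypothetical names, and the sketch considers only the dynamic shared memory request, ignoring any statically declared shared memory in the kernel.

```rust
/// Per-block shared memory limit that applies without opting in (48 KiB).
const DEFAULT_SHARED_MEM_LIMIT_BYTES: u32 = 48 * 1024;

/// Hypothetical helper type (not a RustaCUDA API): result of checking a request.
#[derive(Debug, PartialEq, Eq)]
pub enum SharedMemPlan {
    /// Fits within the default limit; no function attribute is needed.
    Default,
    /// Exceeds the default limit but fits the device's opt-in maximum.
    OptIn,
    /// Exceeds what the device reports, even with the opt-in.
    TooLarge,
}

/// Classify a dynamic shared memory request against the default limit and
/// the opt-in maximum reported by the device.
pub fn plan_shared_mem(requested_bytes: u32, device_optin_max_bytes: u32) -> SharedMemPlan {
    if requested_bytes <= DEFAULT_SHARED_MEM_LIMIT_BYTES {
        SharedMemPlan::Default
    } else if requested_bytes <= device_optin_max_bytes {
        SharedMemPlan::OptIn
    } else {
        SharedMemPlan::TooLarge
    }
}
```

Only the `OptIn` case requires setting the function attribute. The `device_optin_max_bytes` argument is assumed to come from the device attribute query and is not hard-coded, since the maximum differs between architectures.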

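The attributes necessary to opt in are the device attribute CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN, which reports the maximum opt-in size, and the function attribute CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, which sets the size for a kernel.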

This PR adds support for this tuning option to RustaCUDA, as in the following example:

// Get the maximum shared memory size
let max_shared_mem_bytes = device.get_attribute(DeviceAttribute::MaxSharedMemoryPerBlockOptin)? as u32;

// Set the function attribute
let function_name = std::ffi::CString::new(...).unwrap();
let mut function = module.get_function(&function_name)?;
function.set_max_dynamic_shared_size_bytes(max_shared_mem_bytes)?;

// Launch the kernel
unsafe { launch!(function<<<grid, block, max_shared_mem_bytes, stream>>>())?; }
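When launching with the full opt-in size, host code often needs to know how many elements the kernel can stage in shared memory. The following sketch shows one way to derive that count. `elements_that_fit` is a hypothetical helper, not a RustaCUDA function, and it assumes the entire dynamic allocation holds a single array of `T` with no extra alignment padding.

```rust
use std::mem::size_of;

/// Hypothetical helper (not a RustaCUDA API): number of whole `T` elements
/// that fit in `shared_mem_bytes` of dynamic shared memory.
pub fn elements_that_fit<T>(shared_mem_bytes: u32) -> usize {
    let elem_size = size_of::<T>();
    if elem_size == 0 {
        // Zero-sized types occupy no memory; report 0 to avoid dividing by zero.
        return 0;
    }
    shared_mem_bytes as usize / elem_size
}
```

For example, `elements_that_fit::<f32>(max_shared_mem_bytes)` could be passed to the kernel as its tile size, so that the device code and the launch configuration agree on the buffer length.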

For more information, see the CUDA documentation and the Pascal tuning guide.