TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
148 stars, 10 forks
Make the register-to-shared storer support swizzled shared memory #133
Closed
haruhi55 closed 1 month ago