TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

Make the register-to-shared storer support swizzled shared memory #133

Closed haruhi55 closed 1 month ago