TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License
159 stars, 10 forks

Ensure consistency in the use of swizzled shared memory layout #38

Closed by haruhi55 3 months ago