TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

Enhancing shared memory access for 2D warp organization #44

Closed: haruhi55 closed this issue 5 months ago

haruhi55 commented 6 months ago

Add support for the case where both the warps and the shared memory data tiles are organized as 2D grids, so that warps within the same row (or column) of the warp grid load the data tiles located in the same row (or column) of the tile grid.
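To make the request concrete, here is a minimal sketch of the index arithmetic it implies. It is not TiledCUDA's API: `WarpGrid`, `WarpCoord`, `warp_coord`, `row_tile_offset`, and `col_tile_offset` are hypothetical names, and the sketch assumes a row-major warp numbering and row-major shared memory buffers. The functions are plain `constexpr` C++, so they could be called from device code as well as host code.

```cpp
// Hypothetical helpers for illustration only; none of these names come from TiledCUDA.

// Shape of the 2D warp grid inside one thread block.
struct WarpGrid {
    int rows;  // number of warp rows
    int cols;  // number of warp columns
};

// Position of one warp in that grid.
struct WarpCoord {
    int row;
    int col;
};

// Assumption: warps are numbered row-major, so consecutive warp ids
// walk along a row of the warp grid before moving to the next row.
constexpr WarpCoord warp_coord(int warp_id, WarpGrid grid) {
    return WarpCoord{warp_id / grid.cols, warp_id % grid.cols};
}

// Element offset of the tile row shared by every warp in `warp_row`.
// Assumption: the buffer is row-major with leading dimension `ld`,
// and each tile row spans `tile_rows` consecutive buffer rows.
constexpr int row_tile_offset(int warp_row, int tile_rows, int ld) {
    return warp_row * tile_rows * ld;
}

// Element offset of the tile column shared by every warp in `warp_col`.
// Assumption: the buffer is row-major and each tile column spans
// `tile_cols` consecutive buffer columns.
constexpr int col_tile_offset(int warp_col, int tile_cols) {
    return warp_col * tile_cols;
}
```

With a 2x2 warp grid, warps 0 and 1 both map to warp row 0 and therefore compute the same `row_tile_offset`, while warps 0 and 2 both map to warp column 0 and compute the same `col_tile_offset`. Deriving the offset from only one coordinate is what makes warps in the same row or column address the same tile.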