TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
feat: Add a Load data tile device function from global memory to register. #64
Closed
KuangjuX closed 4 months ago
This PR adds a simple implementation of loading a data tile from global memory into registers.
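For readers unfamiliar with the pattern, here is a minimal sketch of what a global-to-register tile load can look like. It is not the code from this PR and not TiledCUDA's API: the names `load_tile_g2r`, `HOST_DEVICE`, and `copy_tiles`, the row-major layout, and the per-thread tile mapping are all assumptions made for illustration. The sketch also assumes every tile lies fully inside the matrix, so there are no bounds checks.

```cpp
// HOST_DEVICE expands to CUDA qualifiers under nvcc and to nothing with a
// plain C++ compiler, so the helper can also be exercised on the host.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (illustration only, not TiledCUDA's API).
// Copies a kRows x kCols sub-tile, starting at (row0, col0) of a row-major
// matrix with leading dimension ld, into a thread-local array. Because the
// loop bounds are compile-time constants, the compiler can unroll the loops
// and typically keeps `reg` in registers on the device.
template <typename T, int kRows, int kCols>
HOST_DEVICE void load_tile_g2r(const T* gmem, int ld, int row0, int col0,
                               T (&reg)[kRows][kCols]) {
    for (int i = 0; i < kRows; ++i) {
        for (int j = 0; j < kCols; ++j) {
            reg[i][j] = gmem[(row0 + i) * ld + (col0 + j)];
        }
    }
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread loads its own tile into registers and
// writes it back unchanged. Assumes the grid covers the matrix exactly.
template <typename T, int kRows, int kCols>
__global__ void copy_tiles(const T* in, T* out, int ld) {
    T reg[kRows][kCols];
    const int row0 = (blockIdx.y * blockDim.y + threadIdx.y) * kRows;
    const int col0 = (blockIdx.x * blockDim.x + threadIdx.x) * kCols;

    load_tile_g2r(in, ld, row0, col0, reg);

    for (int i = 0; i < kRows; ++i) {
        for (int j = 0; j < kCols; ++j) {
            out[(row0 + i) * ld + (col0 + j)] = reg[i][j];
        }
    }
}
#endif
```

A production version would normally use vectorized accesses and a thread-to-tile layout chosen for coalescing; this sketch keeps one scalar load per element to make the indexing easy to follow.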