TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

data transfer between shared memory and register. #28

Closed by haruhi55 6 months ago

haruhi55 commented 6 months ago

Resolves https://github.com/TiledTensor/TiledCUDA/issues/27 and https://github.com/TiledTensor/TiledCUDA/issues/5, and adds the C++ unit tests automatically in CMake.