TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

feat: Add a device function to load a data tile from global memory to registers. #64

Closed: KuangjuX closed this 4 months ago

KuangjuX commented 4 months ago

This is a simple implementation of loading from global memory to registers.
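
For readers unfamiliar with the idea, here is a minimal sketch of what a global-to-register tile load can look like. It is not the code from this PR and not TiledCUDA's API: `RegTile`, `load_global_to_reg`, and the thread-to-sub-tile mapping are hypothetical names and assumptions chosen for illustration. The sketch assumes a row-major source tile and a 2D arrangement of threads in which each thread owns one contiguous `kRows x kCols` sub-tile.

```cpp
#include <cassert>

// Let the same sketch compile as plain C++ on the host or with nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical per-thread register tile. With compile-time extents and
// fully unrolled loops, a compiler can typically keep this array in registers.
template <typename T, int kRows, int kCols>
struct RegTile {
    T data[kRows][kCols];
};

// Hypothetical loader (an assumption, not the PR's function).
//   src             : pointer to the row-major tile in global memory
//   ld              : leading dimension of src (elements per row)
//   tid             : linear thread index, e.g. threadIdx.x in a kernel
//   threads_per_row : how many threads sit side by side along a row
// Thread `tid` copies the sub-tile whose top-left element is
// (tid / threads_per_row * kRows, tid % threads_per_row * kCols).
template <typename T, int kRows, int kCols>
__host__ __device__ void load_global_to_reg(const T* src, int ld, int tid,
                                            int threads_per_row,
                                            RegTile<T, kRows, kCols>& dst) {
    assert(threads_per_row > 0);
    const int row0 = (tid / threads_per_row) * kRows;
    const int col0 = (tid % threads_per_row) * kCols;
    for (int i = 0; i < kRows; ++i) {
        for (int j = 0; j < kCols; ++j) {
            dst.data[i][j] = src[(row0 + i) * ld + (col0 + j)];
        }
    }
}
```

Inside a kernel, each thread would call this with `tid = threadIdx.x` and its own local `RegTile`. Keeping the `kCols` elements of a thread contiguous in memory leaves room for vectorized loads, which this sketch does not attempt.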