TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
refactor(cell): Refactor global to shared Tile transfer on basis of `BaseTile` #110
Closed
haruhi55 closed 3 months ago
- `GlobalToSharedLoader`: refactored so that it works on the basis of a 16x16 `BaseTile`.
- `SharedToGlobalStorer`: refactored so that it works on the basis of a 16x16 `BaseTile`.
- `BaseTile`.
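To make "on the basis of a 16x16 `BaseTile`" concrete, here is a minimal host-side C++ sketch. It assumes a row-major layout and tile extents that are whole multiples of 16. It walks a tile as a grid of 16x16 base tiles and copies it between a "global" matrix and a compact "shared" buffer. The functions `copy_in_base_tiles`, `load_global_to_shared` and `store_shared_to_global`, and the constants `kBaseRows`/`kBaseCols`, are hypothetical illustrations. They are not TiledCUDA's actual `GlobalToSharedLoader`/`SharedToGlobalStorer` API. In a real kernel, the element loops inside each base tile would be spread across a warp's threads rather than run sequentially.

```cpp
#include <cassert>

// Hypothetical base-tile shape, mirroring the 16x16 BaseTile named in the PR.
constexpr int kBaseRows = 16;
constexpr int kBaseCols = 16;

// Copy a rows x cols tile from src (row-major, leading dimension src_ld) to
// dst (row-major, leading dimension dst_ld), one 16x16 base tile at a time.
// On a GPU the two innermost loops would be distributed over threads; this
// host-side model simply runs them in order.
template <typename T>
void copy_in_base_tiles(const T* src, int src_ld, T* dst, int dst_ld,
                        int rows, int cols) {
  // This sketch only handles tiles made of whole base tiles.
  assert(rows % kBaseRows == 0 && cols % kBaseCols == 0);
  for (int bi = 0; bi < rows / kBaseRows; ++bi) {      // base-tile row
    for (int bj = 0; bj < cols / kBaseCols; ++bj) {    // base-tile column
      for (int i = 0; i < kBaseRows; ++i) {            // row within base tile
        for (int j = 0; j < kBaseCols; ++j) {          // column within base tile
          const int r = bi * kBaseRows + i;
          const int c = bj * kBaseCols + j;
          dst[r * dst_ld + c] = src[r * src_ld + c];
        }
      }
    }
  }
}

// Hypothetical global -> shared loader: copies the tile whose top-left corner
// is (row0, col0) in a global matrix with g_ld columns into a compact shared
// buffer whose leading dimension equals the tile width.
template <typename T>
void load_global_to_shared(const T* global, int g_ld, int row0, int col0,
                           T* shared, int rows, int cols) {
  copy_in_base_tiles(global + row0 * g_ld + col0, g_ld, shared, cols, rows,
                     cols);
}

// Hypothetical shared -> global storer: the inverse of the loader above,
// reusing the same base-tile traversal.
template <typename T>
void store_shared_to_global(const T* shared, T* global, int g_ld, int row0,
                            int col0, int rows, int cols) {
  copy_in_base_tiles(shared, cols, global + row0 * g_ld + col0, g_ld, rows,
                     cols);
}
```

For example, `load_global_to_shared(g, 64, 16, 32, s, 32, 32)` on a 64x64 matrix copies the 32x32 tile starting at row 16, column 32 as a 2x2 grid of base tiles. Because the loader and storer both delegate to one base-tile traversal, they stay symmetric. The assert rejects partial base tiles instead of handling them.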