TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

refactor(cell): Refactor shared to register loading using ldmatrix. #73

Closed. haruhi55 closed this 3 months ago

haruhi55 commented 3 months ago

Refactor the implementation of loading a shared memory tile into registers. The interfaces are organized into three layers, from highest to lowest level:

  1. xToXLoader/XToXStorer: The highest-level interfaces. They expose only logical concepts, such as a row-major or column-major tile at a specific level of the memory hierarchy.
  2. xToXLoaderImpl/XToXStorerImpl: These implement tile transfer between levels of the memory hierarchy. They are specialized on the instruction used and on the source and/or destination layout.
  3. xxStoreBase/xxLoadBase: Thin wrappers around a specific instruction that transfer a single BaseTile.
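
To make the layering concrete, here is a minimal host-side C++ sketch of the three layers. It is only an illustration, not the code in this PR: every name (`Layout`, `kBase`, `LoadBase`, `LoaderImpl`, `Loader`) is hypothetical, the BaseTile is shrunk to 2x2 floats, and a plain strided copy stands in for the `ldmatrix` instruction that the real base wrapper would issue on the device.

```cpp
#include <cassert>
#include <cstddef>

// All names below are hypothetical and chosen for illustration only.
enum class Layout { RowMajor, ColMajor };

// Assumed BaseTile edge length. A 2x2 tile keeps the example small;
// it is not the tile shape that ldmatrix operates on.
constexpr std::size_t kBase = 2;

// Layer 3: thin wrapper that transfers exactly one BaseTile.
// A strided copy stands in for the single hardware instruction.
struct LoadBase {
    static void copy(const float* src, std::size_t row_stride,
                     std::size_t col_stride, float* dst) {
        for (std::size_t r = 0; r < kBase; ++r)
            for (std::size_t c = 0; c < kBase; ++c)
                dst[r * kBase + c] = src[r * row_stride + c * col_stride];
    }
};

// Layer 2: walks the tile BaseTile by BaseTile, specialized on source layout.
template <Layout L, std::size_t Rows, std::size_t Cols>
struct LoaderImpl;

template <std::size_t Rows, std::size_t Cols>
struct LoaderImpl<Layout::RowMajor, Rows, Cols> {
    static void run(const float* src, float* dst) {
        std::size_t tile = 0;
        for (std::size_t i = 0; i < Rows / kBase; ++i)
            for (std::size_t j = 0; j < Cols / kBase; ++j, ++tile)
                // Element (r, c) lives at src[r * Cols + c].
                LoadBase::copy(src + i * kBase * Cols + j * kBase,
                               Cols, 1, dst + tile * kBase * kBase);
    }
};

template <std::size_t Rows, std::size_t Cols>
struct LoaderImpl<Layout::ColMajor, Rows, Cols> {
    static void run(const float* src, float* dst) {
        std::size_t tile = 0;
        for (std::size_t i = 0; i < Rows / kBase; ++i)
            for (std::size_t j = 0; j < Cols / kBase; ++j, ++tile)
                // Element (r, c) lives at src[c * Rows + r].
                LoadBase::copy(src + j * kBase * Rows + i * kBase,
                               1, Rows, dst + tile * kBase * kBase);
    }
};

// Layer 1: user-facing interface that exposes only the logical layout
// and the tile shape, then forwards to the matching implementation.
template <Layout L, std::size_t Rows, std::size_t Cols>
struct Loader {
    static_assert(Rows % kBase == 0 && Cols % kBase == 0,
                  "tile shape must be a multiple of the BaseTile shape");
    void operator()(const float* src, float* dst) const {
        assert(src != nullptr && dst != nullptr);
        LoaderImpl<L, Rows, Cols>::run(src, dst);
    }
};
```

A caller would write something like `Loader<Layout::RowMajor, 4, 4>{}(src, dst);` and never touch `LoaderImpl` or `LoadBase` directly. In this sketch the destination holds the BaseTiles one after another, so the same logical matrix produces the same destination contents whether the source is row-major or column-major; only the layer-2 specialization changes.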