TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Clean up legacy code. #142

Closed: haruhi55 closed this issue 2 weeks ago

haruhi55 commented 2 months ago

This legacy code uses CuTe's tiled copy directly to implement tile transfers. It should be reorganized.

macro kernels for copy:

  1. https://github.com/TiledTensor/TiledCUDA/blob/master/include/cell/copy/dyn_copy.hpp
  2. https://github.com/TiledTensor/TiledCUDA/blob/master/include/cell/copy/static_copy.hpp

kernels:

  1. https://github.com/TiledTensor/TiledCUDA/blob/master/src/kernels/b2b_gemm.cu
  2. https://github.com/TiledTensor/TiledCUDA/blob/master/src/kernels/bmm.cu
  3. https://github.com/TiledTensor/TiledCUDA/blob/master/src/kernels/lstm_cell.cu