TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

Update CUTLASS to 3.5.0. #39

Closed by haruhi55 6 months ago

haruhi55 commented 6 months ago

Resolves https://github.com/TiledTensor/TiledCUDA/issues/36 and https://github.com/TiledTensor/TiledCUDA/issues/37

  1. A simple refinement of the copy_2d_tile_s2r macro kernel: the implementation moves into a functor rather than a function. C++ does not allow partial specialization of function templates, so this makes it possible to partially specialize on the memory access instruction used.
  2. Update CUTLASS to v3.5.0. After this update, the template parameter for TiledMMA has changed. All the kernels still pass the correctness check, but one side effect is that I am no longer able to fully understand the register usage.
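The motivation for item 1 can be illustrated with a minimal host-only sketch. It is not the actual copy_2d_tile_s2r code: `CopyTile`, `ScalarLoad`, `Vector4Load`, and `check_copy_tile` are hypothetical names invented for this example, and plain `std::array` copies stand in for shared-memory-to-register loads. The point is only that a class template (functor) can be partially specialized on an instruction tag while the element type and tile size stay generic, which a function template cannot do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical tags standing in for memory access instructions.
struct ScalarLoad {};   // one element per load
struct Vector4Load {};  // four elements per load

// Primary template: declared only, so an unsupported tag fails to compile.
template <typename Element, std::size_t N, typename Instruction>
struct CopyTile;

// Partial specialization: Instruction is fixed, Element and N stay generic.
template <typename Element, std::size_t N>
struct CopyTile<Element, N, ScalarLoad> {
    // Copies src to dst and returns the number of load steps issued.
    std::size_t operator()(const std::array<Element, N>& src,
                           std::array<Element, N>& dst) const {
        for (std::size_t i = 0; i < N; ++i) dst[i] = src[i];
        return N;
    }
};

template <typename Element, std::size_t N>
struct CopyTile<Element, N, Vector4Load> {
    static_assert(N % 4 == 0, "tile size must be a multiple of 4");

    // Copies four elements per step, mimicking a wider load.
    std::size_t operator()(const std::array<Element, N>& src,
                           std::array<Element, N>& dst) const {
        for (std::size_t i = 0; i < N; i += 4)
            for (std::size_t j = 0; j < 4; ++j) dst[i + j] = src[i + j];
        return N / 4;
    }
};

// Usage: both specializations produce the same tile with different step counts.
inline void check_copy_tile() {
    const std::array<float, 8> src{1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f};
    std::array<float, 8> a{}, b{};
    using Scalar = CopyTile<float, 8, ScalarLoad>;
    using Vec4 = CopyTile<float, 8, Vector4Load>;
    const std::size_t scalar_steps = Scalar{}(src, a);
    const std::size_t vec4_steps = Vec4{}(src, b);
    assert(scalar_steps == 8u);
    assert(vec4_steps == 2u);
    assert(a == src && b == src);
}
```

Writing the same dispatch with a function template would require overloads or tag dispatch, since `template <typename Element, std::size_t N> void copy<Element, N, ScalarLoad>(...)` is not valid C++.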