TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

fix(cell): Re-implement the shared tile iterator and fix all unit tests. #159

Closed: haruhi55 closed this pull request 6 days ago.

haruhi55 commented 1 week ago
  1. Re-implement the shared memory tile iterator.
  2. Ensure all unit tests pass.
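The PR description does not show the iterator's interface. As an illustration only, here is a minimal host-side C++ sketch of what a shared-memory tile iterator commonly does: it partitions a row-major tile into column chunks (as a GEMM k-loop might consume them) that alias the parent buffer rather than copying it. `TileView` and `ColumnChunkIterator` are hypothetical names, not TiledCUDA's API. In a real kernel the buffer would live in `__shared__` memory.

```cpp
#include <cassert>

// Hypothetical non-owning view of a row-major tile: a base pointer plus a
// leading dimension (row stride). Not TiledCUDA's actual shared tile type.
template <typename T>
struct TileView {
    T* data;
    int rows;
    int cols;
    int ld;  // elements between consecutive rows in the underlying buffer

    // Element (r, c) of the view, addressed through the parent's row stride.
    T& operator()(int r, int c) const { return data[r * ld + c]; }
};

// Hypothetical iterator that splits a Rows x Cols row-major tile into
// Cols / ChunkCols sub-tiles of shape Rows x ChunkCols along the column
// dimension. Each sub-tile aliases the parent buffer; nothing is copied.
template <typename T, int Rows, int Cols, int ChunkCols>
struct ColumnChunkIterator {
    static_assert(Cols % ChunkCols == 0, "Cols must be divisible by ChunkCols");
    static constexpr int kNumChunks = Cols / ChunkCols;

    T* data;

    explicit ColumnChunkIterator(T* buf) : data(buf) {}

    // The i-th chunk starts i * ChunkCols columns into row 0 and keeps the
    // parent's row stride (Cols) as its leading dimension.
    TileView<T> operator()(int i) const {
        assert(i >= 0 && i < kNumChunks);
        return TileView<T>{data + i * ChunkCols, Rows, ChunkCols, Cols};
    }
};
```

Because each chunk keeps the parent's leading dimension, indexing `(r, c)` inside a chunk lands on the same element as `(r, i * ChunkCols + c)` in the full tile. Testing that invariant is the kind of check the PR's unit tests would need to cover.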