TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

(feat): Add a straightforward implementation for tile iterator. #50

Closed. haruhi55 closed this 5 months ago

haruhi55 commented 5 months ago

Resolves https://github.com/TiledTensor/TiledCUDA/issues/49

  1. This PR adds implementations for these two lines: https://github.com/haruhi55/TiledCUDA/blob/b31db2aa1420b595f4ac01a792c714cd81053d1e/tests/cpp/cell/test_gemm.cu#L74-L75
  2. You can find potential uses of a shared memory tile iterator in the unit tests.
  3. The current unit tests are not sufficiently meaningful. I plan to add more stringent unit tests to ensure correctness once load/store operations are implemented.
  4. This PR also improves the code organization and interfaces for copying a tile from shared memory to registers. I plan to add the implementation in the next PR.
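
For readers unfamiliar with the idea, the following is a minimal host-side sketch of what a tile iterator does: it partitions a large row-major tile into fixed-size chunks and hands back a lightweight view of the chunk at a given position. This is not the interface added in this PR. `TileView`, `TileIterator`, `sum`, and all of their members are hypothetical names chosen for illustration, and the sketch assumes a row-major layout and chunk shapes that evenly divide the tile. A device-side version would presumably mark the methods `__device__` and point at a shared memory buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view of a row-major chunk inside a larger tile.
// (Illustrative only; not TiledCUDA's actual type.)
template <typename T>
struct TileView {
    T* data;         // pointer to the first element of the chunk
    int rows;        // number of rows in the chunk
    int cols;        // number of columns in the chunk
    int row_stride;  // elements between consecutive rows in the parent tile

    T& operator()(int i, int j) const {
        assert(i >= 0 && i < rows && j >= 0 && j < cols);
        return data[static_cast<std::size_t>(i) * row_stride + j];
    }
};

// Hypothetical iterator that splits a Rows x Cols row-major tile into
// chunks of shape ChunkRows x ChunkCols.
template <typename T, int Rows, int Cols, int ChunkRows, int ChunkCols>
class TileIterator {
    static_assert(Rows % ChunkRows == 0, "chunk rows must divide tile rows");
    static_assert(Cols % ChunkCols == 0, "chunk cols must divide tile cols");

public:
    static constexpr int kRowChunks = Rows / ChunkRows;  // chunks along rows
    static constexpr int kColChunks = Cols / ChunkCols;  // chunks along columns

    explicit TileIterator(T* data) : data_(data) {}

    // Return a view of the chunk at chunk coordinates (i, j).
    TileView<T> operator()(int i, int j) const {
        assert(i >= 0 && i < kRowChunks && j >= 0 && j < kColChunks);
        T* start = data_ + static_cast<std::size_t>(i) * ChunkRows * Cols +
                   static_cast<std::size_t>(j) * ChunkCols;
        return TileView<T>{start, ChunkRows, ChunkCols, Cols};
    }

private:
    T* data_;  // base pointer of the parent tile (not owned)
};

// Usage sketch: accumulate every element of one chunk.
template <typename T>
T sum(const TileView<T>& v) {
    T acc{};
    for (int i = 0; i < v.rows; ++i)
        for (int j = 0; j < v.cols; ++j) acc += v(i, j);
    return acc;
}
```

In a GEMM main loop, such an iterator would typically be stepped along the K dimension, for example `TileIterator<float, 4, 8, 4, 4> it(buf);` followed by `it(0, k)` for each `k` in `[0, kColChunks)`, with each returned view passed to the shared-memory-to-register copy.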