The unit tests show example uses of the shared memory tile iterator.
The current unit tests are not yet meaningful enough. I plan to add more stringent tests to verify correctness once the load/store operations are implemented.
Improve the code organization and interfaces for copying a tile from shared memory to registers. I plan to add the implementation in the next PR.
resolve https://github.com/TiledTensor/TiledCUDA/issues/49