TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

A buggy implementation of the TileIterator. #100

Closed (haruhi55 closed this issue 2 days ago)

haruhi55 commented 3 months ago

Since we do not differentiate between GlobalTile and SharedTile, a TileIterator should be able to work with both types. However, the current implementation is tightly coupled to SharedTile, which is a bug, as shown at the line linked below:

https://github.com/TiledTensor/TiledCUDA/blob/cb7a3361f70fb1edce2c9f858705629bc8a0f305/include/types/tile_iterator.hpp#L73

KuangjuX commented 3 months ago

Add a template parameter to represent different Tile types?
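
A rough host-side sketch of what that could look like, assuming the iterator only needs a data pointer and compile-time shape from the tile. Every name here (`GlobalTileSketch`, `SharedTileSketch`, `SubTileView`, `TileIteratorSketch`) is hypothetical and is not TiledCUDA's actual API; the real tile types carry layouts and other parameters that are omitted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the library's tile types (row-major, dense).
template <typename T, std::size_t kRows_, std::size_t kCols_>
struct GlobalTileSketch {
    using DType = T;
    static constexpr std::size_t kRows = kRows_;
    static constexpr std::size_t kCols = kCols_;
    T* data;
};

template <typename T, std::size_t kRows_, std::size_t kCols_>
struct SharedTileSketch {
    using DType = T;
    static constexpr std::size_t kRows = kRows_;
    static constexpr std::size_t kCols = kCols_;
    T* data;
};

// A view into a column chunk of a parent tile: pointer plus parent row stride.
template <typename T>
struct SubTileView {
    T* data;
    std::size_t row_stride;
    T& operator()(std::size_t r, std::size_t c) const {
        return data[r * row_stride + c];
    }
};

// The iterator takes the tile type as a template parameter, so it is not
// tied to SharedTileSketch; any type exposing DType, kCols and data works.
template <typename Tile, std::size_t kChunkCols>
class TileIteratorSketch {
    static_assert(Tile::kCols % kChunkCols == 0,
                  "chunk width must divide the tile width");

  public:
    using DType = typename Tile::DType;
    static constexpr std::size_t kCount = Tile::kCols / kChunkCols;

    explicit TileIteratorSketch(Tile tile) : tile_(tile) {}

    // Returns the i-th column chunk of the underlying tile.
    SubTileView<DType> operator()(std::size_t i) const {
        assert(i < kCount);
        return {tile_.data + i * kChunkCols, Tile::kCols};
    }

  private:
    Tile tile_;
};
```

With this shape, `TileIteratorSketch<GlobalTileSketch<float, 2, 4>, 2>` and `TileIteratorSketch<SharedTileSketch<float, 2, 4>, 2>` are both valid instantiations, and neither tile type needs to know about the other.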

haruhi55 commented 3 months ago

A possible solution could be to have GlobalTile and SharedTile inherit from a common base class. Currently, since GlobalTile and SharedTile exhibit no differences in behavior, the computation results are correct.
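
To make the idea concrete, here is a minimal host-side sketch of the common-base-class approach, assuming both tile kinds share the same dense row-major indexing. The names `TileBase`, `GlobalTileDerived`, `SharedTileDerived` and `sum_tile` are hypothetical illustrations, not existing TiledCUDA types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical common base: holds the pointer and the shared indexing logic.
template <typename T, std::size_t kRows_, std::size_t kCols_>
struct TileBase {
    using DType = T;
    static constexpr std::size_t kRows = kRows_;
    static constexpr std::size_t kCols = kCols_;

    T* data;

    T& operator()(std::size_t r, std::size_t c) const {
        assert(r < kRows && c < kCols);
        return data[r * kCols + c];
    }
};

// The two tile kinds differ only in name here, mirroring the observation
// that they currently exhibit no difference in behavior.
template <typename T, std::size_t R, std::size_t C>
struct GlobalTileDerived : TileBase<T, R, C> {
    explicit GlobalTileDerived(T* p) : TileBase<T, R, C>{p} {}
};

template <typename T, std::size_t R, std::size_t C>
struct SharedTileDerived : TileBase<T, R, C> {
    explicit SharedTileDerived(T* p) : TileBase<T, R, C>{p} {}
};

// Code written against the base accepts either derived tile; an iterator
// could take a TileBase reference in the same way.
template <typename T, std::size_t R, std::size_t C>
T sum_tile(const TileBase<T, R, C>& tile) {
    T total = T{};
    for (std::size_t r = 0; r < R; ++r) {
        for (std::size_t c = 0; c < C; ++c) {
            total += tile(r, c);
        }
    }
    return total;
}
```

Compared with a template parameter on the iterator, this keeps the iterator's signature unchanged but requires touching both tile definitions.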

I will carefully consider a suitable solution in upcoming changes.