TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

Provide a complete GEMM example. #66

Closed. haruhi55 closed this issue 2 months ago

haruhi55 commented 4 months ago

Provide a complete GEMM example in the example directory and review the entire code structure.

KuangjuX commented 4 months ago

Has this issue already been completed?

haruhi55 commented 4 months ago

Not yet. A complete GEMM example should leverage all three levels of the GPU's memory hierarchy and expose the entire control structure.
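
For readers unfamiliar with the term, here is a rough host-only C++ sketch of the loop nest that a three-level GEMM typically follows. It assumes the three levels are global memory, shared memory, and registers, which the thread does not spell out. It is not TiledCUDA code and uses none of its API: `tiled_gemm`, `kTileM`, `kTileN`, and `kTileK` are hypothetical names, and plain arrays stand in for shared memory and registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes, chosen only for illustration.
constexpr std::size_t kTileM = 4;  // rows of C handled by one "thread block"
constexpr std::size_t kTileN = 4;  // columns of C handled by one "thread block"
constexpr std::size_t kTileK = 2;  // depth of each staged slice along K

// Computes C (M x N) = A (M x K) * B (K x N); all matrices are row-major.
// Assumption: M, N, and K are multiples of the tile sizes.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t M, std::size_t N,
                std::size_t K) {
    assert(M % kTileM == 0 && N % kTileN == 0 && K % kTileK == 0);
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);

    // Level 1 (global memory): each (bm, bn) pair plays the role of one
    // thread block that owns a kTileM x kTileN tile of C.
    for (std::size_t bm = 0; bm < M; bm += kTileM) {
        for (std::size_t bn = 0; bn < N; bn += kTileN) {
            // Level 3 (registers): accumulators a kernel would keep per thread.
            float acc[kTileM][kTileN] = {};

            for (std::size_t bk = 0; bk < K; bk += kTileK) {
                // Level 2 (shared memory): stage one slice of A and B.
                float a_tile[kTileM][kTileK];
                float b_tile[kTileK][kTileN];
                for (std::size_t i = 0; i < kTileM; ++i)
                    for (std::size_t k = 0; k < kTileK; ++k)
                        a_tile[i][k] = A[(bm + i) * K + (bk + k)];
                for (std::size_t k = 0; k < kTileK; ++k)
                    for (std::size_t j = 0; j < kTileN; ++j)
                        b_tile[k][j] = B[(bk + k) * N + (bn + j)];

                // Multiply the staged slices into the accumulators.
                for (std::size_t i = 0; i < kTileM; ++i)
                    for (std::size_t j = 0; j < kTileN; ++j)
                        for (std::size_t k = 0; k < kTileK; ++k)
                            acc[i][j] += a_tile[i][k] * b_tile[k][j];
            }

            // Write the finished tile back to "global memory".
            for (std::size_t i = 0; i < kTileM; ++i)
                for (std::size_t j = 0; j < kTileN; ++j)
                    C[(bm + i) * N + (bn + j)] = acc[i][j];
        }
    }
}
```

In a real kernel the two outer loops would map to the thread-block grid, the staging loops would become cooperative copies followed by a synchronization, and the inner product would be split across threads or warps. The sketch only shows the order of data movement.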