TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

fix(unittest): fix the GEMM unittest. #113

Closed haruhi55 closed 3 months ago

haruhi55 commented 3 months ago

The GEMM unit test on the master branch has failed to compile since we refactored the global-to-shared loader/store to use a 16x16 BaseTile, a change that makes it compatible with shared memory swizzling.

The current implementation does not support storing floating-point numbers from shared to global memory.

To fix this, this PR modifies the GEMM unit test to store the GEMM output directly from registers to global memory.
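For readers unfamiliar with the pattern, here is a minimal plain-CUDA sketch of a register-to-global store. It does not use TiledCUDA's API. The kernel name `gemm_reg_to_global`, the helper `run_demo`, and the sizes `kM`, `kN`, `kK`, `kTM`, `kTN` are all hypothetical, chosen only for illustration. Each thread accumulates a small output tile in a local array (which the compiler typically keeps in registers) and writes it straight to global memory, with no shared-memory staging step.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical problem sizes for illustration; not TiledCUDA's configuration.
constexpr int kM = 32, kN = 32, kK = 64;
constexpr int kTM = 2, kTN = 2;  // per-thread output tile

__global__ void gemm_reg_to_global(const float* A, const float* B, float* C) {
    // Per-thread accumulator; with fully unrolled loops the compiler
    // typically keeps this array in registers.
    float acc[kTM][kTN] = {};
    const int row0 = (blockIdx.y * blockDim.y + threadIdx.y) * kTM;
    const int col0 = (blockIdx.x * blockDim.x + threadIdx.x) * kTN;

    for (int k = 0; k < kK; ++k) {
#pragma unroll
        for (int i = 0; i < kTM; ++i)
#pragma unroll
            for (int j = 0; j < kTN; ++j)
                acc[i][j] += A[(row0 + i) * kK + k] * B[k * kN + col0 + j];
    }

    // Store the accumulators directly to global memory,
    // skipping any shared-memory staging.
#pragma unroll
    for (int i = 0; i < kTM; ++i)
#pragma unroll
        for (int j = 0; j < kTN; ++j)
            C[(row0 + i) * kN + col0 + j] = acc[i][j];
}

// Runs the kernel and returns the maximum absolute error against a host
// reference, or INFINITY if a CUDA call fails.
float run_demo() {
    std::vector<float> hA(kM * kK), hB(kK * kN), hC(kM * kN, 0.0f);
    for (int i = 0; i < kM * kK; ++i) hA[i] = (i % 7) * 0.5f;
    for (int i = 0; i < kK * kN; ++i) hB[i] = static_cast<float>(i % 5) - 2.0f;

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc(&dA, hA.size() * sizeof(float)) != cudaSuccess) return INFINITY;
    if (cudaMalloc(&dB, hB.size() * sizeof(float)) != cudaSuccess) return INFINITY;
    if (cudaMalloc(&dC, hC.size() * sizeof(float)) != cudaSuccess) return INFINITY;
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    const dim3 block(8, 8);
    const dim3 grid(kN / (block.x * kTN), kM / (block.y * kTM));
    gemm_reg_to_global<<<grid, block>>>(dA, dB, dC);
    const bool ok = cudaGetLastError() == cudaSuccess &&
                    cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (!ok) return INFINITY;

    // Host reference GEMM for comparison.
    float max_err = 0.0f;
    for (int m = 0; m < kM; ++m)
        for (int n = 0; n < kN; ++n) {
            float ref = 0.0f;
            for (int k = 0; k < kK; ++k) ref += hA[m * kK + k] * hB[k * kN + n];
            max_err = std::fmax(max_err, std::fabs(ref - hC[m * kN + n]));
        }
    return max_err;
}

int main() {
    const float err = run_demo();
    std::printf("max abs error: %g\n", err);
    return err < 1e-3f ? 0 : 1;
}
```

The trade-off is that stores issued per thread from registers may be less coalesced than a store staged through shared memory. For a unit test that only checks correctness, that is usually acceptable.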