TiledTensor / TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
MIT License

Enhance the unit tests for storing Tensor Core's WMMA output tile. #57

Closed by haruhi55 1 month ago

haruhi55 commented 4 months ago

The current unit tests only cover the case where a single warp stores the results of the ldmatrix instruction.

However, since the outputs of the WMMA instruction have varying data types that occupy different widths, the store operation needs to be aware of the output's data type to enable vectorized storing.
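
To make the data-type dependence concrete, here is a minimal host-side C++ sketch. It is not TiledCUDA code. It assumes a 16-byte (128-bit) access as the widest per-thread store and derives the number of elements per store from the element size. The names `elems_per_vec` and `store_vectorized` are hypothetical. `std::uint16_t` stands in for a 16-bit half-precision element, and `std::memcpy` stands in for a single vectorized store instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: how many elements of type T fit in one 16-byte
// (128-bit) access. A 16-bit type packs 8 per access, a 32-bit type packs 4.
template <typename T>
constexpr std::size_t elems_per_vec() {
    static_assert(16 % sizeof(T) == 0, "element size must divide 16 bytes");
    return 16 / sizeof(T);
}

// Hypothetical helper: copies `count` elements in 16-byte chunks and handles
// any remainder with scalar copies. Returns the number of 16-byte chunks
// copied, so a test can check that the width matched the element type.
template <typename T>
std::size_t store_vectorized(const T* src, T* dst, std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    constexpr std::size_t kVec = elems_per_vec<T>();
    std::size_t chunks = 0;
    std::size_t i = 0;
    for (; i + kVec <= count; i += kVec) {
        std::memcpy(dst + i, src + i, 16);  // stand-in for one 128-bit store
        ++chunks;
    }
    for (; i < count; ++i) {
        dst[i] = src[i];  // scalar tail for leftover elements
    }
    return chunks;
}
```

A unit test along these lines would instantiate the store once per output type and check both the copied values and the number of wide accesses, instead of covering only one element width.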