The current unit tests only verify the use of a single warp to store the results of the ldmatrix.
However, the outputs of the WMMA instruction can have different data types with different bit widths, so the store operation must know the output's data type in order to use vectorized stores.
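
As a rough illustration of why the element type matters, here is a minimal host-side sketch. It assumes the widest store is 128 bits (16 bytes) per access, so the number of elements packed into one store depends on `sizeof(T)`. The names `elems_per_store`, `Half16`, and `store_vectorized` are hypothetical and are not taken from the codebase. `Half16` is only a stand-in for a 16-bit output type, and the sketch ignores the per-thread fragment layout that a real device-side store would also have to respect.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: how many elements of type T fit into one
// 128-bit (16-byte) vectorized store. Assumes sizeof(T) divides 16.
template <typename T>
constexpr std::size_t elems_per_store() {
    return 16 / sizeof(T);
}

// Stand-in for a 16-bit output type (for example a half-precision value).
struct Half16 {
    std::uint16_t bits;
};

// Copies `count` elements from src to dst in 16-byte chunks and returns
// the number of chunks written. Assumes `count` is a multiple of
// elems_per_store<T>(); any trailing elements are not copied.
template <typename T>
std::size_t store_vectorized(const T* src, T* dst, std::size_t count) {
    constexpr std::size_t kStep = elems_per_store<T>();
    std::size_t n_stores = 0;
    for (std::size_t i = 0; i + kStep <= count; i += kStep) {
        // One memcpy of 16 bytes models one vectorized store.
        std::memcpy(dst + i, src + i, kStep * sizeof(T));
        ++n_stores;
    }
    return n_stores;
}
```

Under these assumptions, eight 32-bit values need two stores while eight 16-bit values need one, which is why a store routine cannot choose its vector width without knowing the output type.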