I am a little confused by the formula for block_tile_j in cudaTensorCoreGemm.cu at line 230:
const unsigned int block_tile_j = (block_pos * BLOCK_COL_TILES) % N_TILES;
As I understand it, block_tile_i and block_tile_j are the row index and column index, respectively, of a tile in matrix C or D.
block_tile_i is the block position (block_pos) multiplied by the number of tiles per row in a thread block (BLOCK_ROW_TILES), divided by the total number of tiles along the N direction (N_TILES), and then multiplied by the number of tiles per column (BLOCK_COL_TILES), i.e.
const unsigned int block_tile_i = ((block_pos * BLOCK_ROW_TILES) / N_TILES) * (BLOCK_COL_TILES);
So why is block_tile_j equal to (block_pos * BLOCK_COL_TILES) % N_TILES, and not (block_pos * BLOCK_ROW_TILES) % N_TILES?
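To make the question concrete, here is a small host-side sketch that evaluates both candidate formulas for block_tile_j next to the quoted formula for block_tile_i. The struct, the function names, and the numeric values are my own assumptions for illustration. I believe the sample's defaults make BLOCK_ROW_TILES and BLOCK_COL_TILES both 8 with N_TILES = 256, but I have not verified that, and the second configuration is purely hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Tile counts for one configuration. The values used below are assumptions
// for illustration, not read from cudaTensorCoreGemm.cu.
struct TileConfig {
    unsigned int n_tiles;          // tiles along the N direction (N_TILES)
    unsigned int block_row_tiles;  // BLOCK_ROW_TILES
    unsigned int block_col_tiles;  // BLOCK_COL_TILES
};

// Assumed case where both per-block tile counts are equal.
const TileConfig kEqualCounts{256, 8, 8};
// Hypothetical case where the two counts differ.
const TileConfig kUnequalCounts{16, 4, 2};

// Row index, using the formula quoted in the post.
unsigned int block_tile_i(unsigned int block_pos, const TileConfig& c) {
    assert(c.n_tiles > 0);  // guard against division by zero
    return ((block_pos * c.block_row_tiles) / c.n_tiles) * c.block_col_tiles;
}

// Column index as written in the sample (uses BLOCK_COL_TILES).
unsigned int block_tile_j_col(unsigned int block_pos, const TileConfig& c) {
    assert(c.n_tiles > 0);
    return (block_pos * c.block_col_tiles) % c.n_tiles;
}

// Column index as I expected it (uses BLOCK_ROW_TILES).
unsigned int block_tile_j_row(unsigned int block_pos, const TileConfig& c) {
    assert(c.n_tiles > 0);
    return (block_pos * c.block_row_tiles) % c.n_tiles;
}

// Prints both column formulas side by side for block_pos in [0, count).
void print_comparison(const TileConfig& c, unsigned int count) {
    for (unsigned int block_pos = 0; block_pos < count; ++block_pos) {
        std::printf("block_pos=%u  i=%u  j(COL)=%u  j(ROW)=%u\n",
                    block_pos,
                    block_tile_i(block_pos, c),
                    block_tile_j_col(block_pos, c),
                    block_tile_j_row(block_pos, c));
    }
}
```

With equal counts the two column formulas always agree, so the choice would not matter in that case. With the hypothetical unequal counts they diverge (for block_pos = 5 they give 10 and 4), which is the situation I am asking about. Calling print_comparison(kUnequalCounts, 8) from a main function lists the first few positions.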
I would appreciate it if someone could clear this up. Thanks.