This package uses a padding strategy to handle boundary elements, as is done in GEMM, and the minimum block size is set to $64 \times 32$ for matrix A and $32 \times 64$ for matrix B.
As a result, for narrow matrices, which are widely used in tensor network calculations, a large share of the computation is wasted on padding.
For example, when the matrices have size $4 \times 4 \times 10^6$, what is actually computed are matrices of size $64 \times 32 \times 10^6$, so only $\frac{1}{128}$ of the calculations are useful.
Optimizations for such long and narrow matrices are needed.
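
The $\frac{1}{128}$ figure can be checked with a short calculation. The following is a minimal sketch, not code from this package: the helper names `padded_size` and `useful_fraction` are hypothetical, and it assumes each narrow dimension is simply rounded up to the next multiple of the block size, while the long dimension of $10^6$ is shared by the padded and unpadded problems and therefore cancels.

```python
from fractions import Fraction


def padded_size(n: int, block: int) -> int:
    """Round n up to the next multiple of block (hypothetical helper)."""
    return -(-n // block) * block  # integer ceiling division, then scale


def useful_fraction(rows: int, cols: int, block_rows: int, block_cols: int) -> Fraction:
    """Fraction of the padded rows x cols tile that holds real data."""
    padded = padded_size(rows, block_rows) * padded_size(cols, block_cols)
    return Fraction(rows * cols, padded)


# A 4 x 4 tile padded to the 64 x 32 minimum block quoted above.
print(useful_fraction(4, 4, 64, 32))  # 1/128
```

Under the same assumption, a tile that already matches the block size gives a useful fraction of 1, so the overhead is specific to matrices that are much narrower than the minimum block.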