TensorBFS / CuTropicalGEMM.jl

The fastest Tropical number matrix multiplication on GPU
MIT License

Optimizations for long and narrow matrices #18

Open ArrogantGao opened 1 year ago

ArrogantGao commented 1 year ago

This package uses a padding strategy to handle boundary elements, as in GEMM, and the minimum block size is set to $64 \times 32$ for matrix A and $32 \times 64$ for matrix B. As a result, narrow matrices, which are widely used in tensor network calculations, incur a large amount of useless computation. For example, when the problem size is $4 \times 4 \times 10^6$, what is actually computed has size $64 \times 32 \times 10^6$, so only $\frac{4 \times 4}{64 \times 32} = \frac{1}{128}$ of the computation is useful.
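To make the arithmetic above concrete, here is a minimal sketch of how the useful fraction could be estimated. It assumes the three numbers are read in $M \times K \times N$ order and that each dimension is simply rounded up to a multiple of its block size; the function names `round_up` and `useful_fraction` and the default block sizes (`bm=64`, `bk=32`, `bn=64`, taken from the figures quoted above) are illustrative and not part of the package API.

```python
from fractions import Fraction


def round_up(n: int, block: int) -> int:
    """Round n up to the nearest multiple of block."""
    return -(-n // block) * block


def useful_fraction(m: int, k: int, n: int,
                    bm: int = 64, bk: int = 32, bn: int = 64) -> Fraction:
    """Fraction of multiply-add work that touches real (unpadded) data.

    Assumption: an (m x k) by (k x n) product is padded so that m, k, n
    become multiples of bm, bk, bn respectively.
    """
    m_pad, k_pad, n_pad = round_up(m, bm), round_up(k, bk), round_up(n, bn)
    return Fraction(m * k * n, m_pad * k_pad * n_pad)


if __name__ == "__main__":
    # The example from this issue: 4 x 4 x 10^6
    print(useful_fraction(4, 4, 10**6))
```

Under this model the long dimension costs nothing extra here, since $10^6$ is already a multiple of 64; all of the waste comes from padding the two short dimensions.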

Optimizations for such long and narrow matrices are needed.

ArrogantGao commented 1 year ago

This is being developed on a new branch: https://github.com/TensorBFS/CuTropicalGEMM.jl/tree/narrow_matrices.