ACL_GEMM (1024 x 1024) execution time : 10ms
CSR_FLEX_GEMM (1024 x 1024) execution time : 60ms (best performance)
The 6x gap between these two results seems to come from tiling, an optimization that CSR_FLEX_GEMM does not apply.
To outperform ACL_GEMM, consider a tile-based SpMM or a different sparse format.
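
To make the tile-based SpMM idea concrete, here is a minimal sketch of a column-tiled CSR x dense multiply. It is an assumption about what such a kernel could look like, not the CSR_FLEX_GEMM or ACL_GEMM implementation; the function name `csr_spmm_tiled` and the `tile_cols` parameter are hypothetical. Pure Python only shows the loop structure; any real speedup would need a compiled kernel where the tile of B actually stays in cache.

```python
# Illustrative sketch: csr_spmm_tiled and tile_cols are hypothetical names,
# not part of ACL or CSR_FLEX_GEMM.

def csr_spmm_tiled(indptr, indices, data, B, tile_cols=64):
    """Compute C = A @ B with A in CSR form and B a dense list of rows."""
    n_rows = len(indptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    # Walk B in column tiles so the part of B being reused stays small.
    for j0 in range(0, n_cols, tile_cols):
        j1 = min(j0 + tile_cols, n_cols)
        for i in range(n_rows):
            c_row = C[i]
            # Nonzeros of row i are stored in data[indptr[i]:indptr[i + 1]].
            for p in range(indptr[i], indptr[i + 1]):
                a = data[p]
                b_row = B[indices[p]]
                for j in range(j0, j1):
                    c_row[j] += a * b_row[j]
    return C


# A = [[1, 0, 2],
#      [0, 3, 0]] stored as CSR
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
C = csr_spmm_tiled(indptr, indices, data, B, tile_cols=1)
```

The tile width is the tuning knob: it would normally be chosen so that one tile of B plus the matching slice of C fits in the target cache level.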