This pull request adds the somatcopy (out-of-place matrix copy and transposition) kernel, which is classified as a BLAS-like extension by Intel MKL.
Currently, transpose operations require both the number of rows and the number of columns to be multiples of four.
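
For reviewers, the semantics of the operation, B := alpha * op(A) with op(A) either A or its transpose, can be sketched in C. This is a minimal sketch assuming row-major storage. The parameter names loosely follow MKL's `mkl_somatcopy` (`trans`, `rows`, `cols`, `alpha`, `lda`, `ldb`). The function names `ref_somatcopy_rowmajor` and `blocked_somatcopy_t4` are hypothetical and are not the kernel added in this pull request. The second function shows one possible reason a transpose kernel might require multiples of four: it walks the matrix in 4x4 tiles and has no cleanup loop for leftover rows or columns.

```c
#include <stddef.h>

/* Hypothetical scalar reference: B := alpha * op(A), row-major storage.
 * A is rows x cols with leading dimension lda (lda >= cols).
 * trans == 'N': B is rows x cols, ldb >= cols.
 * trans == 'T': B is cols x rows, ldb >= rows. */
static void ref_somatcopy_rowmajor(char trans, size_t rows, size_t cols,
                                   float alpha, const float *a, size_t lda,
                                   float *b, size_t ldb)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            float v = alpha * a[i * lda + j];
            if (trans == 'T' || trans == 't')
                b[j * ldb + i] = v; /* element (i, j) of A lands at (j, i) of B */
            else
                b[i * ldb + j] = v; /* plain scaled copy */
        }
    }
}

/* Hypothetical tiled transpose that processes 4x4 tiles only.
 * Returns -1 (and writes nothing) unless rows and cols are multiples of 4,
 * because there is no cleanup loop for partial tiles. */
static int blocked_somatcopy_t4(size_t rows, size_t cols, float alpha,
                                const float *a, size_t lda,
                                float *b, size_t ldb)
{
    if (rows % 4 != 0 || cols % 4 != 0)
        return -1;
    for (size_t i0 = 0; i0 < rows; i0 += 4)        /* tile row origin */
        for (size_t j0 = 0; j0 < cols; j0 += 4)    /* tile column origin */
            for (size_t i = i0; i < i0 + 4; ++i)
                for (size_t j = j0; j < j0 + 4; ++j)
                    b[j * ldb + i] = alpha * a[i * lda + j];
    return 0;
}
```

A production kernel would typically replace the inner 4x4 loop with SIMD loads, in-register shuffles, and stores. The tile-only structure is what makes the shape restriction convenient, since handling remainders requires an extra scalar or masked path.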
This pull request also includes minor refactors to other parts of the software.