Idein / qmkl6

BLAS library for VideoCore VI QPU (Raspberry Pi 4)
BSD 3-Clause "New" or "Revised" License

Add BLAS-like somatcopy kernel #9

Closed: Terminus-IMRC closed this pull request 3 years ago

Terminus-IMRC commented 3 years ago

This pull request adds the somatcopy kernel (single-precision, out-of-place matrix copy and transposition), which Intel MKL classifies as a BLAS-like extension. For now, transpose operations require both the number of rows and the number of columns to be multiples of four.
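For reviewers unfamiliar with the operation, here is a minimal CPU-side sketch of what an out-of-place scaled copy/transpose computes, B := alpha * op(A). It is not the QPU kernel from this pull request. The name `ref_somatcopy`, the row-major-only layout, and the reduced argument list are assumptions made for illustration. This reference loop also has no multiple-of-four restriction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine for illustration only; this is NOT the
 * qmkl6 QPU kernel and not its API.
 *
 * Computes B := alpha * op(A) for a row-major rows x cols matrix A.
 *   trans == 'N': op(A) = A    (B is rows x cols)
 *   trans == 'T': op(A) = A^T  (B is cols x rows)
 * lda and ldb are the row strides of A and B, counted in elements. */
static void ref_somatcopy(char trans, size_t rows, size_t cols, float alpha,
                          const float *a, size_t lda, float *b, size_t ldb)
{
    assert(trans == 'N' || trans == 'T');
    assert(lda >= cols);
    assert(ldb >= (trans == 'N' ? cols : rows));

    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            const float v = alpha * a[i * lda + j];
            if (trans == 'N')
                b[i * ldb + j] = v; /* plain scaled copy */
            else
                b[j * ldb + i] = v; /* element (i, j) lands at (j, i) */
        }
    }
}
```

A routine like this could serve as the expected-value oracle when testing a hardware kernel on small inputs: run both on the same matrix and compare the outputs element by element.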

This pull request also includes minor refactoring of other parts of the library.