Idein / qmkl6

BLAS library for VideoCore VI QPU (Raspberry Pi 4)
BSD 3-Clause "New" or "Revised" License

Add stbmv kernel (only supports diagonal matrices for now) #2

Closed Terminus-IMRC closed 3 years ago

Terminus-IMRC commented 3 years ago

The standard stbmv routine performs single-precision triangular band matrix-vector multiplication. If the width of the band is one (that is, there are no super- or sub-diagonals), the matrix is diagonal, and each value in the input vector is multiplied by the corresponding diagonal entry of the matrix. In addition, band storage keeps only the diagonals that lie within the band, so in this case only the main diagonal is stored in memory. That is to say, stbmv with a diagonal matrix is equivalent to the element-wise product of two vectors. This pull request adds a kernel that supports only diagonal matrices for this purpose.
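To illustrate the equivalence, here is a minimal pure-Python sketch of the stbmv semantics for the upper-triangular, non-transposed, non-unit-diagonal case. It is not the QPU kernel from this pull request. The function names `stbmv_upper_notrans` and `elementwise_product` are hypothetical and exist only for this example; the indexing assumes the usual BLAS column-major band layout, and unlike the real routine the sketch returns a new list instead of overwriting x.

```python
def stbmv_upper_notrans(n, k, ab, lda, x):
    """Reference y = A*x for an upper triangular band matrix A with k super-diagonals.

    ab is a flat column-major band array (assumed BLAS layout):
    A[i][j] is stored at ab[(k + i - j) + j * lda] for max(0, j - k) <= i <= j.
    Returns a new list rather than updating x in place.
    """
    y = [0.0] * n
    for j in range(n):
        # Only rows inside the band of column j contribute.
        for i in range(max(0, j - k), j + 1):
            y[i] += ab[(k + i - j) + j * lda] * x[j]
    return y


def elementwise_product(d, x):
    """Plain element-wise product of two equal-length vectors."""
    return [di * xi for di, xi in zip(d, x)]


# With k = 0 the band array holds only the main diagonal (lda = 1 here).
diag = [2.0, 3.0, 4.0, 5.0]
x = [1.0, 10.0, 100.0, 1000.0]
via_stbmv = stbmv_upper_notrans(len(x), 0, diag, 1, x)
via_product = elementwise_product(diag, x)
```

With k = 0 the inner loop visits only i = j, so each output element is a single multiplication and `via_stbmv` matches `via_product`.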

Although Intel MKL offers this operation as vsMul, most other BLAS libraries have no equivalent. Serving the operation through stbmv, which most libraries support, should therefore be an acceptable approach, for testing in particular.

The implementation of the QPU kernel is based on saxpy.