The original stbmv kernel performs single-precision triangular band matrix-vector multiplication.
If the width of the band is one (that is, the band contains only the main diagonal), then the matrix is a diagonal matrix and each value in the input vector is multiplied by the corresponding diagonal element of the matrix.
In addition, only the diagonal of the matrix is stored in memory.
That is to say, stbmv with a diagonal matrix is equivalent to the element-wise product of two vectors.
This pull request adds a kernel that supports only diagonal matrices for this purpose.
Though Intel MKL offers the operation as vsMul, most other BLAS libraries do not.
So serving the operation through stbmv, which most libraries support, should be acceptable, particularly for testing.
The implementation of the QPU kernel is based on saxpy.
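
To illustrate the equivalence described above, here is a minimal Python sketch of what a reference stbmv reduces to when the band contains only the main diagonal. The function name `stbmv_diag` and its argument order are hypothetical and are not the QPU kernel added by this pull request. The sketch assumes column-major band storage with leading dimension `lda >= 1`, a non-unit diagonal, and a positive stride `incx`.

```python
def stbmv_diag(n, a, lda, x, incx=1):
    """Compute x := A * x in place for a diagonal matrix A in band storage.

    Hypothetical helper for illustration only.
    Assumptions: the band holds just the main diagonal, so diagonal
    element j lives at a[j * lda]; the diagonal is non-unit; incx > 0.
    """
    if lda < 1:
        raise ValueError("lda must be at least 1")
    if incx < 1:
        raise ValueError("this sketch only handles positive incx")
    for j in range(n):
        # With no off-diagonal entries, each output element depends on
        # exactly one matrix element and one vector element.
        x[j * incx] = a[j * lda] * x[j * incx]
    return x


# With lda == 1 and incx == 1 this is the element-wise product of a and x.
diag = [2.0, 3.0, 4.0]
vec = [1.0, 10.0, 100.0]
result = stbmv_diag(3, diag, 1, vec)
```

When `lda` is larger than one, the diagonal elements are spaced `lda` entries apart in `a`, which is why the element-wise product interpretation holds most directly for `lda == 1`.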