Closed ajz34 closed 2 months ago
Additional note
This program is linked with OpenBLAS v0.3.27 instead of the reference CBLAS. However, the CBLAS implementation of cblas_dgbmv
should be virtually the same.
The CBLAS implementation (the wrapper) may be functionally the same (I have not checked it yet), but the underlying implementation of GBMV is certainly different.
You're using the LAPACKE way of storing a banded matrix: take the rectangle of matrix entries within the band, columns in columns, diagonals in rows, and store that rectangle in either column- or row-major order.
Unfortunately and very surprisingly, CBLAS uses a different storage:
I don't know how this historically happened (I'd be curious to know) but that's how it is.
@vladimir-ch Seems like that. In other words, the dimension of size (kl+ku+1) is always contiguous in memory in the current implementation of the CBLAS convention.
Also, it seems that the current CBLAS convention differs from the LAPACKE convention.
For example, in lapacke_dgbsv_work.c, when the input matrix is row-major, LAPACKE simply makes an explicit copy into column-major order, calls the column-major routine, and then copies the result back to row-major. In this way, results of LAPACK_ROW_MAJOR
can be identical to those of LAPACK_COL_MAJOR
, which seems more transparent to me.
In most cases, LAPACK calls BLAS-3 to perform $O(N^3)$ computations, so an explicit $O(N^2)$ memory-bound matrix transposition is (perhaps) acceptable. However, for BLAS-2 routines such as DGBMV, the function itself is an $O(N^2)$ memory-bound computation; an explicit transposition not only increases memory cost but is also expected to roughly double the runtime. So it is also reasonable for CBLAS not to perform an explicit transposition.
So after your reply, I'm not sure whether this should be considered a bug or a CBLAS-specific convention 😂
Disclaimer: I'm working on some code that wraps BLAS functions, which is how I found this convention inconsistency (or perhaps bug). For the tasks I'm actually dealing with, I only need dense/symmetric/triangular BLAS-3 (or its extensions), so this inconsistency does not actually affect my work. But it's still surprising to find that some CBLAS functions do not work the way I used to believe.
The CBLAS convention for band matrices is well documented, see
so it's definitely not a CBLAS bug.
The only LAPACKE documentation I've found doesn't explicitly mention band matrices, so the reference implementation is effectively the documentation. And LAPACKE_dgb_trans
clearly assumes the input array to have shape (ku+kl+1)*n for both column- and row-major layouts, and it does a plain transpose of that rectangular array, copying only the referenced elements. All callers of this routine check that ldab < n,
which further supports the expected shape of the input array and that the LAPACKE band matrix layout differs from CBLAS's. Even if it's unintended, it cannot be changed without breaking all code that has adapted to this layout.
@vladimir-ch Oh, this is very clear! I used to learn the BLAS functions only from the doxygen HTML, and hadn't noticed the documents from the BLAS Technical Forum. This issue discussion and the BLAS forum documents really helped a lot, and clarified my understanding of the design conceptions of the APIs.
To my understanding of these documents, the CBLAS convention prefers a layout in which data that is contiguous in the full matrix is also contiguous in the banded (or triangular packed) matrix, which is more cache-friendly for computation. As a side effect, the parameter descriptions (in the function signatures) of legacy BLAS (column-major) functions may not be directly applicable to CBLAS (which also allows row-major) in the banded or triangular packed cases. This design has already been addressed in the BLAS forum discussions; it is reasonable and not an unintentional bug.
Description
cblas_dgbmv
function (banded y = alpha op(A) x + beta y) does not seem to give consistent results between column-major and row-major. I guess that simply changing the transposition flag in the following code (for the real/double cases) may work for general multiplication, but is not suitable for banded multiplication: https://github.com/Reference-LAPACK/lapack/blob/694e3375b0e61c70c98ef9cb46013170cd99f2a5/CBLAS/src/cblas_dgbmv.c#L62-L64
I feel that, given that CBLAS has been a very basic linear algebra library since the last century, the possibility of a bug is quite small (or else the
cblas_?gbmv
functions are seldom used by other projects). So I'm not sure whether this is a bug or whether my reasoning is wrong, but I can't convince myself either way.
Minimal working example
In the following code, I can confirm that after changing matrix A from column-major to row-major,
cblas_dgemv
gives the correct result. But for cblas_dgbmv
, the column-major and row-major results do not seem to be the same, which is probably not what the design of CBLAS intends.
Checklist