Closed mhoemmen closed 3 months ago
We should probably make that correction against the current draft instead of P1673. https://eel.is/c++draft/linalg#algs.blas3.rankk The numbers are off.
@dalg24 https://github.com/kokkos/stdBLAS/issues/274 appears to look at the current draft. I'll revise my comments there.
In 3.2, replace "possibly-addable
<decltype(x), decltype(y), decltype(z)>
" with "possibly-addable
<decltype(y), decltype(y), decltype(z)>
"
In 4.2, replace "addable
(x, y, z)
" with "addable
(y, y, z)
"
In 5, replace "$O($ x.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× x.extent(0)
$)$"
In 3.4, replace "possibly-addable
<decltype(x), decltype(y), decltype(z)>
" with "possibly-addable
<decltype(y), decltype(y), decltype(z)>
"
In 4.3, replace "addable
(x, y, z)
" with "addable
(y, y, z)
"
In 5, replace "$O($ x.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× x.extent(0)
$)$" (the original is technically correct, since A is symmetric and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
In 3.4, replace "possibly-addable
<decltype(x), decltype(y), decltype(z)>
" with "possibly-addable
<decltype(y), decltype(y), decltype(z)>
"
In 4.3, replace "addable
(x, y, z)
" with "addable
(y, y, z)
"
In 5, replace "$O($ x.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× x.extent(0)
$)$" (the original is technically correct, since A is Hermitian and therefore square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
(The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here.)
In 7, replace "$O($ x.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× x.extent(0)
$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
In 10, replace "$O($ y.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× y.extent(0)
$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
In 13, replace "$O($ x.extent(0)
× A.extent(1)
$)$" with "$O($ A.extent(0)
× x.extent(0)
$)$" (the original is technically correct, since A is square, but changing this would improve consistency with [linalg.algs.blas2.gemv])
OK: [linalg.algs.blas3.xxmm], [linalg.algs.blas3.trmm]
(C is the symmetric or Hermitian matrix, not A. C needs to be square, not necessarily A.)
Remove 3.2, which currently says "compatible-static-extents
<decltype(A), decltype(A)>(0, 1)
is true
"
Remove 4.1, which currently says "A.extent(0)
equals A.extent(1)
"
In the Complexity clause 5, replace "$O( A.extent(0)
× A.extent(1)
x C.extent(0)
$)$" with "$O( C.extent(0)
× C.extent(1)
x C.extent(0)
$)$" (the current clause is correct, but less consistent with GEMM)
(C is the symmetric or Hermitian matrix, not A or B. C needs to be square, not necessarily A or B. A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).)
In 3.2, replace "possibly-addable
<decltype(A), decltype(B), decltype(C)>()
is true
" with "possibly-multipliable
<decltype(A), decltype(transposed(B)), decltype(C)>
is true
and possibly-multipliable
<decltype(B), decltype(transposed(A)), decltype(C)>
is true
"
In 3.3, replace "compatible-static-extents
<decltype(A), decltype(A)>(0, 1)
" with "compatible-static-extents
<decltype(C), decltype(C)>(0, 1)
"
In 4.1, replace "addable
(A, B, C)
is true
" with "multipliable
(A, transposed(B), C)
is true
and multipliable
(transposed(B), A, C)
is true
"
In 4.2, replace "A.extent(0)
equals A.extent(1)
" with "C.extent(0)
equals C.extent(1)
"
In the Complexity clause 5, replace "$O($ A.extent(0)
× A.extent(1)
× C.extent(0)
$)$" with "$O($ A.extent(0)
× A.extent(1)
× B.extent(1)
$)$"
(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)
A.extent(0)
× X.extent(1)
× X.extent(1)
$)$" with "$O($ A.extent(0)
× B.extent(0)
× B.extent(1)
$)$"(Complexity clause 14, for the right solve, is fine.)
A.extent(0)
× A.extent(1)
× B.extent(1)
$)$ with "$O($ B.extent(0)
× A.extent(0)
× A.extent(1)
$)$"@mhoemmen a few comments:
[linalg.algs.blas3.rankk]
...
- In the Complexity clause 5, replace "$O(
A.extent(0)
×A.extent(1)
xC.extent(0)
)" with "$O(C.extent(0)
×C.extent(1)
xC.extent(0)
)" (the current clause is correct, but less consistent with GEMM)
The change you propose is wrong as it converts to O(n^3) if we assume A(n,k), C(n,n). I suppose you mean "O(A.extent(0)
×A.extent(1)
xA.extent(0)
)" to be consistent with GEMM.
[linalg.algs.blas3.rank2k]
...
In 3.2, replace "
possibly-addable
<decltype(A), decltype(B), decltype(C)>()
istrue
" with "possibly-multipliable
<decltype(A), decltype(transposed(B)), decltype(C)>
istrue
andpossibly-multipliable
<decltype(B), decltype(transposed(A)), decltype(C)>
istrue
"In 4.1, replace "
addable
(A, B, C)
istrue
" with "multipliable
(A, transposed(B), C)
istrue
andmultipliable
(transposed(B), A, C)
istrue
"
The change proposed for 4.1 is wrong. It should read:
In 4.1, replace "addable
(A, B, C)
is true
" with "multipliable
(A, transposed(B), C)
is true
and ~multipliable
(transposed(B), A, C)
~ multipliable
(B, transposed(A), C)
is true
"
However, the second part of the proposed statements for 3.2 and 4.1 is superfluous as:
multipliable(A, transposed(B), C) && C is a square matrix => multipliable(B, transposed(A), C)
- In the Complexity clause 5, replace "O(
A.extent(0)
×A.extent(1)
×C.extent(0)
)" with "O(A.extent(0)
×A.extent(1)
×B.extent(1)
)"
This proposed change is wrong as it would convert to O(n k^2) if we assume A(n,k), B(n,k) and C(n,n).
The old statement is correct, however, it can be made consistent with GEMM using
"O(A.extent(0)
×A.extent(1)
×B.extent(0)
)"
[linalg.algs.blas3.trsm]
(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)
- In the Complexity clause 6 (for the left solve), replace "O(
A.extent(0)
×X.extent(1)
×X.extent(1)
)" with "O(A.extent(0)
×B.extent(0)
×B.extent(1)
)"
The suggested change is correct. But note that the original statement is wrong as X is not a square matrix and has the same size as B.
Finally one of the errors has not been reported here:
O(A.extent(0)×A.extent(1)×C.extent(0))
with O(A.extent(0)×A.extent(1)×C.extent(1))
O(A.extent(0)×A.extent(1)×C.extent(0))
with O(C.extent(0)×C.extent(1)×A.extent(1))
Oh, I made some typos there! Thanks for reviewing!
layout_transpose
Private members nested-mapping
and extents
_
should be in mapping
and not in the layout.
conjugated_accessor
Constructor from NestedAccessor
appears in the wording (para 3) but is missing from the class synopsis.
[linalg.conj.conjugatedaccessor] explicit(!is_convertible_v<OtherNestedAccessor, NestedAccessor>>)
has an extra closing >
.
The following function should return NestedAccessor
instead.
constexpr const Accessor& nested_accessor() const noexcept;
scaled_accessor
The nested-accessor
member of scaled_accessor
does not have a trailing underscore. This is inconsistent with other classes in [linalg].
hermitian_*
algorithms get a diagonal element with nonzero imaginary partDefine that for hermitian_*
algorithms, the imaginary elements on the diagonal aren't even accessed (which is what the BLAS Standard says). The reference BLAS implementation of ZHEMV uses dble(A(j,j))
, which means that it just takes the real part. We should also do that.
Extend [linalg.general] with diagonal access logic. Something like this: "For accesses of diagonal elements m[i, i]
, F will use the value real-if-needed
(m[i, i])
if the name of F starts with hermitian."
"These functions perform an updating Hermitian matrix-vector product, taking into account the Triangle parameter that applies to the Hermitian matrix A [linalg.general]."
These issues have been submitted as the following LWG / wording issues. Thanks all!
P1673: Corrections
BLAS 3
[linalg.algs.blas3.rankk]
C is the symmetric or Hermitian matrix, not A, so A doesn't have to be square. (C does need to be square, though.)
Replace (2.2) with "
compatible-static-extents
<decltype(C), decltype(C)>(0, 1)
istrue
"Replace (2.3) and (2.4) with
possibly-multipliable
<decltype(A), decltype(transposed(A)), C>
Replace (3.1) with "
C.extent(0)
equalsC.extent(1)
"Replace (3.2) and (3.3) with "
multipliable
(A, transposed(A), C)
"[linalg.algs.blas3.rank2k]
C is the symmetric or Hermitian matrix, not A or B, so A (or B) doesn't have to be square. (C does need to be square, though.) A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).
Replace (2.2) with "
possibly-multipliable
<decltype(A), decltype(transposed(B)), decltype(C)>
istrue
andpossibly-multipliable
<decltype(B), decltype(transposed(A)), decltype(C)>
istrue
"Replace (2.3) with "
compatible-static-extents
<decltype(C), decltype(C)>(0, 1)
istrue
"Replace (3.1) with "
multipliable
(A, transposed(B), C)
istrue
andmultipliable
(transposed(B), A, C)
istrue
"Replace (3.2) with "
C.extent(0)
equalsC.extent(1)
"