ORNL / cpp-proposals-pub

Collaborating on papers for the ISO C++ committee - public repo
26 stars 27 forks source link

P1673: Corrections #464

Closed mhoemmen closed 3 months ago

mhoemmen commented 4 months ago

P1673: Corrections

BLAS 3

[linalg.algs.blas3.rankk]

C is the symmetric or Hermitian matrix, not A, so A doesn't have to be square. (C does need to be square, though.)

[linalg.algs.blas3.rank2k]

C is the symmetric or Hermitian matrix, not A or B, so A (or B) doesn't have to be square. (C does need to be square, though.) A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).

dalg24 commented 4 months ago

We should probably make that correction against the current draft instead of P1673. https://eel.is/c++draft/linalg#algs.blas3.rankk The numbers are off.

mhoemmen commented 4 months ago

@dalg24 https://github.com/kokkos/stdBLAS/issues/274 appears to look at the current draft. I'll revise my comments there.

mhoemmen commented 4 months ago

P1673: Corrections (updated)

BLAS 2

[linalg.algs.blas2.gemv]

[linalg.algs.blas2.symv]

[linalg.algs.blas2.hemv]

[linalg.algs.blas2.trmv]

(The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here.)

Rest of BLAS 2 (is fine)

BLAS 3

OK: [linalg.algs.blas3.xxmm], [linalg.algs.blas3.trmm]

[linalg.algs.blas3.rankk]

(C is the symmetric or Hermitian matrix, not A. C needs to be square, not necessarily A.)

[linalg.algs.blas3.rank2k]

(C is the symmetric or Hermitian matrix, not A or B. C needs to be square, not necessarily A or B. A, B, and C must have the same number of rows. A and B must have the same number of columns (they must both be N x K for some K not necessarily equal to N).)

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

(Complexity clause 14, for the right solve, is fine.)

[linalg.algs.blas3.inplacetrsm]

rasolca commented 4 months ago

@mhoemmen a few comments:

[linalg.algs.blas3.rankk]

...

  • In the Complexity clause 5, replace "$O( A.extent(0) × A.extent(1) x C.extent(0) )" with "$O( C.extent(0) × C.extent(1) x C.extent(0) )" (the current clause is correct, but less consistent with GEMM)

The change you propose is wrong as it converts to O(n^3) if we assume A(n,k), C(n,n). I suppose you mean "O(A.extent(0)×A.extent(1)xA.extent(0))" to be consistent with GEMM.

[linalg.algs.blas3.rank2k]

...

  • In 3.2, replace "possibly-addable<decltype(A), decltype(B), decltype(C)>() is true" with "possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> is true and possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> is true"

  • In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and multipliable(transposed(B), A, C) is true"

The change proposed for 4.1 is wrong. It should read: In 4.1, replace "addable(A, B, C) is true" with "multipliable(A, transposed(B), C) is true and ~multipliable(transposed(B), A, C)~ multipliable(B, transposed(A), C) is true"

However, the second part of the proposed statements for 3.2 and 4.1 is superfluous as: multipliable(A, transposed(B), C) && C is a square matrix => multipliable(B, transposed(A), C)

  • In the Complexity clause 5, replace "O(A.extent(0)×A.extent(1)×C.extent(0))" with "O(A.extent(0)×A.extent(1)×B.extent(1))"

This proposed change is wrong as it would convert to O(n k^2) if we assume A(n,k), B(n,k) and C(n,n). The old statement is correct, however, it can be made consistent with GEMM using "O(A.extent(0)×A.extent(1)×B.extent(0))"

[linalg.algs.blas3.trsm]

(Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible.)

  • In the Complexity clause 6 (for the left solve), replace "O(A.extent(0)×X.extent(1)×X.extent(1))" with "O(A.extent(0)×B.extent(0)×B.extent(1))"

The suggested change is correct. But note that the original statement is wrong as X is not a square matrix and has the same size as B.

Finally one of the errors has not been reported here:

[linalg.algs.blas3.trmm]

mhoemmen commented 4 months ago

Oh, I made some typos there! Thanks for reviewing!

mhoemmen commented 4 months ago

More bugs, found by my colleague Ilya Burylov

layout_transpose

Private members nested-mapping and extents_ should be in mapping and not in the layout.

conjugated_accessor

Constructor from NestedAccessor appears in the wording (para 3) but is missing from the class synopsis.

[linalg.conj.conjugatedaccessor] explicit(!is_convertible_v<OtherNestedAccessor, NestedAccessor>>) has an extra closing >.

The following function should return NestedAccessor instead.

constexpr const Accessor& nested_accessor() const noexcept;

scaled_accessor

The nested-accessor member of scaled_accessor does not have a trailing underscore. This is inconsistent with other classes in [linalg].

mhoemmen commented 3 months ago

More bugs, found by my colleague Ilya Burylov

Specify what happens if hermitian_* algorithms get a diagonal element with nonzero imaginary part

Define that for hermitian_* algorithms, the imaginary elements on the diagonal aren't even accessed (which is what the BLAS Standard says). The reference BLAS implementation of ZHEMV uses dble(A(j,j)), which means that it just takes the real part. We should also do that.

  1. Extend [linalg.general] with diagonal access logic. Something like this: "For accesses of diagonal elements m[i, i], F will use the value real-if-needed(m[i, i]) if the name of F starts with hermitian."

  2. "These functions perform an updating Hermitian matrix-vector product, taking into account the Triangle parameter that applies to the Hermitian matrix A [linalg.general]."

mhoemmen commented 3 months ago

These issues have been submitted as the following LWG / wording issues. Thanks all!