ORNL / cpp-proposals-pub

Collaborating on papers for the ISO C++ committee - public repo

P1673: LWG review 2023/11/07 (Kona) #427

Closed mhoemmen closed 10 months ago

mhoemmen commented 10 months ago

P1673: LWG review 2023/11/07 (Kona)

[linalg.algs.blas3.trsm]

triangular_matrix_matrix_left_solve

triangular_matrix_matrix_right_solve

(See above.)

[linalg.algs.blas3.inplacetrsm]

[algorithms.parallel.defns]

(OK)

[algorithms.parallel.user]

(OK)

[linalg.layout.packed]

[linalg.layout.packed.cons]

Para 6.1 (first Precondition) is a bit pessimistic, because we only store half the size. On the other hand, worrying about that would make generic algorithms harder to implement. (See below for the action item.)

[linalg.layout.packed.obs]

operator()

stride

[linalg.transp]

transpose-extents-t and transpose-extents

Resume tomorrow with layout_transpose.

Tasks 2023/11/08

mhoemmen commented 10 months ago

The transpose-extents and transpose-extents-t changes didn't get included in PR #426. I've added that commit to the new PR #428.

mhoemmen commented 10 months ago

(I've added the following explanation to the Design section of P1673, right above the "Future work" section.)

Regarding the Constraint that the packed layout's Triangle must match the function's Triangle

Summary

When do packed formats show up in practice?

Users aren't likely to encounter a triangular packed matrix in isolation. It generally arises as an in-place transformation (e.g., a factorization) of a symmetric or Hermitian packed matrix. For example, LAPACK's DSPTRF (Double-precision Symmetric Packed TRiangular Factorization) computes a symmetric $L D L^T$ (or $U D U^T$) factorization in place, overwriting the input symmetric packed matrix $A$. LAPACK's DSPTRS (Double-precision Symmetric Packed TRiangular Solve) then uses the result of DSPTRF to solve a linear system. DSPTRF overwrites $A$ with the triangle $L$ (if $A$ uses lower triangle storage) or $U$ (if $A$ uses upper triangle storage). This is an idiom for which the BLAS was designed: factorizations typically overwrite their input, and thus reinterpret its "data structure" on the fly.

What the BLAS does

For a summary of the BLAS' packed storage formats, please refer to the "Packed Storage" section of the LAPACK Users' Guide, Third Edition (1999).

BLAS routines for packed storage have only a single argument, UPLO. This describes both whether the caller is storing the upper or lower triangle, and the triangle of the matrix on which the routine will operate. (Packed BLAS formats always store the diagonal explicitly; they don't have the analog of DiagonalStorage.) An example of a BLAS triangular packed routine is DTPMV, double-precision (D) triangular packed (TP) matrix-vector (MV) product.

BLAS packed formats don't represent metadata explicitly; the caller is responsible for knowing whether they are storing the upper or lower triangle. Getting the UPLO argument wrong makes the matrix wrong. For example, suppose that the matrix is 4 x 4, and the user's array input for the matrix is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. If the user is storing the upper triangle (in column-major order, as the Fortran BLAS requires), then the matrix looks like this.

$$ \begin{matrix} 1 & 2 & 4 & 7 \\ & 3 & 5 & 8 \\ & & 6 & 9 \\ & & & 10 \end{matrix} $$

Mismatching the UPLO argument (by passing in 'Lower triangle' instead of 'Upper triangle') would result in an entirely wrong matrix -- not even the transpose. Note how the diagonal elements differ, for instance.

$$ \begin{matrix} 1 & & & \\ 2 & 5 & & \\ 3 & 6 & 8 & \\ 4 & 7 & 9 & 10 \end{matrix} $$

This would be incorrect for triangular, symmetric, or Hermitian matrices.

P1673's interpretation of the BLAS

P1673 offers packed formats that encode the Triangle. This means that the mdspan alone conveys the data structure. P1673 retains the function's separate Triangle parameter so that the function's signature doesn't change based on the mdspan's layout. P1673 (up to R12 at least) requires that the function's Triangle match the mdspan's Triangle.

If P1673 were to permit mismatching the two Triangles, how would the function reasonably interpret the user's intent? For triangular matrices with explicit diagonal, mismatch would mean multiplying by or solving with a zero triangle matrix. For triangular matrices with implicit unit diagonal, mismatch would mean multiplying by or solving with a diagonal matrix of ones -- the identity matrix. Users wouldn't want to do either one of those.

For symmetric matrices, mismatch has no effect; the mdspan layout's Triangle rules. For example, the lower triangle of an upper triangle storage format is just the upper triangle again. For Hermitian matrices, again, mismatch has no effect. For example, suppose that the following is the lower triangle representation of a complex-valued Hermitian matrix (where $i$ is $\sqrt{-1}$).

$$ \begin{matrix} 1+1i & & \\ 2+2i & 4+4i & \\ 3+3i & 5+5i & 6+6i \end{matrix} $$

If the user asks the function to operate on the upper triangle of this matrix, that would imply the following.

$$ \begin{matrix} 1+1i & 2-2i & 3-3i \\ & 4+4i & 5-5i \\ & & 6+6i \end{matrix} $$

(Note that the off-diagonal imaginary parts now have negative sign. The matrix is Hermitian, so A[j,i] equals conj(A[i,j]).) That's just the "other triangle" of the matrix. These are Hermitian algorithms, so they fill in the remaining, unstored triangle by conjugate transposition, which restores the original matrix. Even though the user never stores the original matrix, it would look like this mathematically.

$$ \begin{matrix} 1+1i & 2-2i & 3-3i \\ 2+2i & 4+4i & 5-5i \\ 3+3i & 5+5i & 6+6i \end{matrix} $$

mhoemmen commented 10 months ago

PR #428, which fixed these issues, has merged.