ORNL / cpp-proposals-pub

Collaborating on papers for the ISO C++ committee - public repo

P1673: LWG review 2023/11/07 (Kona) #427

Closed mhoemmen closed 10 months ago

mhoemmen commented 10 months ago

P1673: LWG review 2023/11/07 (Kona)

[linalg.algs.blas3.trsm]

triangular_matrix_matrix_left_solve

triangular_matrix_matrix_right_solve

(See above.)

[linalg.algs.blas3.inplacetrsm]

[algorithms.parallel.defns]

(OK)

[algorithms.parallel.user]

(OK)

[linalg.layout.packed]

[linalg.layout.packed.cons]

Para 6.1 (first Precondition) is a bit pessimistic, because we only store half the size. On the other hand, worrying about that would make generic algorithms harder to implement. (See below for the action item.)

[linalg.layout.packed.obs]

operator()

stride

[linalg.transp]

transpose-extents-t and transpose-extents

Resume tomorrow with layout_transpose.

Tasks 2023/11/08

mhoemmen commented 10 months ago

The transpose-extents and transpose-extents-t changes didn't get included in PR #426. I've added that commit to the new PR #428.

mhoemmen commented 10 months ago

(I've added the following explanation to the Design section of P1673, right above the "Future work" section.)

Regarding the Constraint that the packed layout's Triangle must match the function's Triangle

Summary

When do packed formats show up in practice?

Users aren't likely to encounter a triangular packed matrix in isolation. It generally arises as an in-place transformation (e.g., a factorization) of a symmetric or Hermitian packed matrix. For example, LAPACK's DSPTRF (Double-precision Symmetric Packed TRiangular Factorization) computes a symmetric $L D L^T$ (or $U D U^T$) factorization in place, overwriting the input symmetric packed matrix $A$. LAPACK's DSPTRS (Double-precision Symmetric Packed TRiangular Solve) then uses the result of DSPTRF to solve a linear system. DSPTRF overwrites $A$ with the triangle $L$ (if $A$ uses lower triangle storage) or $U$ (if $A$ uses upper triangle storage). This is an idiom for which the BLAS was designed: factorizations typically overwrite their input, and thus reinterpret its "data structure" on the fly.

What the BLAS does

For a summary of the BLAS' packed storage formats, please refer to the "Packed Storage" section of the LAPACK Users' Guide, Third Edition (1999).

BLAS routines for packed storage have only a single argument, UPLO. This describes both whether the caller is storing the upper or lower triangle, and the triangle of the matrix on which the routine will operate. (Packed BLAS formats always store the diagonal explicitly; they don't have the analog of DiagonalStorage.) An example of a BLAS triangular packed routine is DTPMV, double-precision (D) triangular packed (TP) matrix-vector (MV) product.

BLAS packed formats don't represent metadata explicitly; the caller is responsible for knowing whether they are storing the upper or lower triangle. Getting the UPLO argument wrong makes the matrix wrong. For example, suppose that the matrix is 4 x 4, and the user's array input for the matrix is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. If the user is storing the upper triangle (in column-major order, as the Fortran BLAS requires), then the matrix looks like this.

$$ \begin{matrix} 1 & 2 & 4 & 7 \\ & 3 & 5 & 8 \\ & & 6 & 9 \\ & & & 10 \end{matrix} $$

Mismatching the UPLO argument (by passing in 'Lower triangle' instead of 'Upper triangle') would result in an entirely wrong matrix -- not even the transpose. Note how the diagonal elements differ, for instance.

$$ \begin{matrix} 1 & & & \\ 2 & 5 & & \\ 3 & 6 & 8 & \\ 4 & 7 & 9 & 10 \end{matrix} $$

This would be incorrect for triangular, symmetric, or Hermitian matrices.

P1673's interpretation of the BLAS

P1673 offers packed formats that encode the Triangle. This means that the mdspan alone conveys the data structure. P1673 retains the function's separate Triangle parameter so that the function's signature doesn't change based on the mdspan's layout. P1673 (up to R12 at least) requires that the function's Triangle match the mdspan's Triangle.

If P1673 were to permit mismatching the two Triangles, how would the function reasonably interpret the user's intent? For triangular matrices with explicit diagonal, mismatch would mean multiplying by or solving with a zero triangle matrix. For triangular matrices with implicit unit diagonal, mismatch would mean multiplying by or solving with a diagonal matrix of ones -- the identity matrix. Users wouldn't want to do either one of those.

For symmetric matrices, mismatch has no effect; the mdspan layout's Triangle rules. For example, the lower triangle of an upper triangle storage format is just the upper triangle again. For Hermitian matrices, again, mismatch has no effect. For example, suppose that the following is the lower triangle representation of a complex-valued Hermitian matrix (where $i$ is $\sqrt{-1}$).

$$ \begin{matrix} 1+1i & & \\ 2+2i & 4+4i & \\ 3+3i & 5+5i & 6+6i \end{matrix} $$

If the user asks the function to operate on the upper triangle of this matrix, that would imply the following.

$$ \begin{matrix} 1+1i & 2-2i & 3-3i \\ & 4+4i & 5-5i \\ & & 6+6i \end{matrix} $$

(Note that the off-diagonal imaginary parts now have negative sign. The matrix is Hermitian, so A[j,i] equals conj(A[i,j]).) That's just the "other triangle" of the matrix. These are Hermitian algorithms, so they fill in the remaining, unstored triangle by conjugate transposition, which restores the original matrix. Even though the user never stores the original matrix, it would look like this mathematically.

$$ \begin{matrix} 1+1i & 2-2i & 3-3i \\ 2+2i & 4+4i & 5-5i \\ 3+3i & 5+5i & 6+6i \end{matrix} $$

mhoemmen commented 10 months ago

PR #428, which fixed these issues, has merged.