
P1673R12: LWG presentation #379

Open mhoemmen opened 1 year ago

mhoemmen commented 1 year ago

P1673R12 LWG presentation

A) Previous WG21 talks and proposals

B) Brief summary

C) Example: matrix-vector product

#include <linalg>   // proposed by P1673; declares namespace std::linalg
#include <mdspan>

using std::mdspan;
using std::dextents;
using std::layout_left;
using namespace std::linalg;

using vector_t = mdspan<double, dextents<int, 1>>;
using matrix_t = mdspan<double, dextents<int, 2>>;

// ... allocate and fill ...
vector_t x = /* ... */;
vector_t y = /* ... */;
matrix_t A = /* ... */;

// compute y = Ax
matrix_vector_product(A, x, y);

// compute norm2 of y
double val = vector_norm2(y);

// mixed precision, different layout
mdspan<float, dextents<int, 2>, layout_left> A_f =
  /* allocate and fill */;
matrix_vector_product(A_f, x, y);
double val2 = vector_norm2(y);

D) Categories of things in P1673

  1. Algorithms taking mdspan
  2. In-place transformations
  3. Tags
  4. Layouts (layout_blas_packed)

E) Algorithms taking mdspan

There are many algorithms, but the wording for each is short; so short, in fact, that the header synopsis almost doubles the total length of the wording. Would LWG consider letting us drop the synopsis?

E.1) Two categories of algorithms

  1. Not-reduce-like: Take zero or more input mdspan(s) + an output (or in/out) mdspan; return void (signature sketch after this list)

    • Example: matrix_vector_product
  2. reduce-like: Take one or more input mdspan(s) + initial scalar value; return scalar result

    • Example: vector_norm2
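
Below is a simplified sketch of the two shapes; the actual declarations in [linalg] are constrained function templates, and the constraints are omitted here.

// Not-reduce-like: writes its result into the output mdspan y; returns void.
template<class InMat, class InVec, class OutVec>
void matrix_vector_product(InMat A, InVec x, OutVec y);  // y = A * x

// reduce-like: takes an initial scalar value and returns a scalar result.
template<class InVec, class Scalar>
Scalar vector_norm2(InVec v, Scalar init);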

E.2) [linalg.reqs.val] "Linear algebra value types"

The most interesting part of the wording is not the algorithms themselves, but "Linear algebra value types" [linalg.reqs.val]: the requirements on how the mdspans' value types interact in the algorithms' arithmetic. One can think of this as a generalization of _GENERALIZED_SUM_. SG6 and LEWG discussed this in particular detail. Davis Herring and Zach Laine, among others, generously contributed wording review time.

F) In-place transformations

These are functions "viewing" an mdspan as another mdspan, with a possibly different layout and/or accessor. (We don't actually use the word "view.")

F.1) Main interface: conjugated, scaled, transposed

These are the "viewing" functions mentioned above. They take an mdspan (and possibly a scalar), and return another mdspan.

| Function name | Effect | What it changes |
| --- | --- | --- |
| transposed | Matrix transpose: (i,j) -> (j,i) | Layout |
| scaled | Multiply all elements by a scalar | Accessor |
| conjugated | Complex conjugate of all elements | Accessor |
| conjugate_transposed | transposed and conjugated | Both |

Key feature: Result is an mdspan. P1673's algorithms represent arrays as mdspan. P1673 defines no extension point mechanism to let algorithms take more general things.
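
For instance, continuing the example from section C (a hedged usage sketch; none of these calls copies or modifies any elements):

auto A_T = transposed(A);    // same data; the layout maps (i, j) to (j, i)
auto A_s = scaled(2.0, A);   // same data; the accessor multiplies each element by 2.0 on access
// Because the results are mdspan, they can be passed straight to the algorithms:
matrix_vector_product(scaled(2.0, transposed(A)), x, y);  // y = 2.0 * A^T * x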

Design principle: P1673 is "the BLAS in C++," not a general computer algebra system. For example, it does not algebraically rewrite nested scaled expressions into a single scaling factor (see F.2 below).

F.2) Functions that change the accessor

Functions that change the mdspan's accessor have more complicated wording than functions that change the layout. The mechanism is like that of the atomic_accessor* family (P2689R2):

  1. Define a custom accessor.
  2. The custom accessor uses a proxy reference to transform the value.
  3. The proxy reference has overloaded arithmetic operators (like atomic_ref*).

We do this twice:

| Function | Accessor | Proxy reference |
| --- | --- | --- |
| scaled | accessor_scaled | scaled_scalar |
| conjugated | accessor_conjugate | conjugated_scalar |
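
To illustrate the mechanism (not the proposed wording): a minimal sketch with the hypothetical names my_accessor_scaled and my_scaled_scalar, which omits the constraints and most of the members that the real accessor_scaled and scaled_scalar have.

#include <cstddef>

// Proxy reference: wraps the nested accessor's reference and a scaling factor,
// and applies the factor when converted to a value.
template<class Reference, class ScalingFactor>
class my_scaled_scalar {
  Reference ref_;
  ScalingFactor factor_;
public:
  my_scaled_scalar(Reference ref, ScalingFactor factor)
    : ref_(ref), factor_(factor) {}
  operator auto() const { return factor_ * ref_; }
  // P1673's scaled_scalar also overloads the arithmetic operators; see below.
};

// Accessor: wraps a nested accessor and hands out the proxy reference on access.
template<class ScalingFactor, class NestedAccessor>
class my_accessor_scaled {
  ScalingFactor factor_;
  NestedAccessor nested_;
public:
  using element_type     = typename NestedAccessor::element_type;
  using data_handle_type = typename NestedAccessor::data_handle_type;
  using reference =
    my_scaled_scalar<typename NestedAccessor::reference, ScalingFactor>;
  using offset_policy =
    my_accessor_scaled<ScalingFactor, typename NestedAccessor::offset_policy>;

  my_accessor_scaled(ScalingFactor factor, NestedAccessor nested)
    : factor_(factor), nested_(nested) {}

  reference access(data_handle_type p, std::size_t i) const {
    return reference(nested_.access(p, i), factor_);
  }
  data_handle_type offset(data_handle_type p, std::size_t i) const {
    return nested_.offset(p, i);
  }
};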

Accessors are not exposition-only

The accessors that use these proxy reference types are not exposition-only. This lets users write custom algorithms that detect when an mdspan is the result of conjugated or scaled and optimize for that case, just as P1673's implementations can.
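
For example, a user's algorithm might dispatch on the accessor type. The following is a hypothetical sketch, assuming the accessor_scaled<ScalingFactor, NestedAccessor> spelling used in this revision; is_scaled_accessor and my_copy are illustration names, not part of the proposal.

#include <linalg>
#include <type_traits>

// User-side trait that recognizes P1673's accessor_scaled.
template<class Accessor>
struct is_scaled_accessor : std::false_type {};

template<class ScalingFactor, class NestedAccessor>
struct is_scaled_accessor<
    std::linalg::accessor_scaled<ScalingFactor, NestedAccessor>>
  : std::true_type {};

// A user algorithm that picks a faster path when its input came from scaled().
template<class InVec, class OutVec>
void my_copy(InVec x, OutVec y) {
  if constexpr (is_scaled_accessor<typename InVec::accessor_type>::value) {
    // e.g., call an optimized BLAS routine with the scaling factor folded in
  } else {
    for (typename InVec::index_type i = 0; i < x.extent(0); ++i) {
      y[i] = x[i];
    }
  }
}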

Proxy reference design goals

  1. Work with users' custom value types (or proxy reference types, or expression templates), without those types needing overloaded arithmetic operators for P1673's proxy reference types

    • e.g., add(scaled(alpha, x), y, z) where y's reference is atomic_ref<float>
  2. Preserve arithmetic order for nested scaled expressions (see the arithmetic sketch after this list)

    • scaled(alpha, scaled(beta, x)) should not be transformed into scaled(alpha * beta, x), EVEN THOUGH doing so might let implementations call an optimized BLAS

    • e.g., if x is a rank-1 mdspan whose value_type is double, sizeof(int) is 4, and double is IEEE 754 binary64, then scaled(1 << 20, scaled(1 << 20, x)) performs no int arithmetic that overflows, but scaled((1 << 20) * (1 << 20), x) would overflow int when computing the fused factor

    • e.g., alpha = uint64_t(INT_MAX / 2), beta = INT_MAX / 2 + 1, and x[0] = 2

  3. Algorithms' arithmetic expressions must work for arbitrary combinations and nestings of proxy reference types, as long as the underlying arithmetic makes sense

    • Nesting of scaled and/or conjugated expressions is not a typical use case (P1673 is not a computer algebra system), but should be correct.
  4. Avoid duplicated wording, especially for overloaded arithmetic operators.
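
A small arithmetic sketch of design goal 2, assuming 32-bit int and IEEE 754 binary64 double as stated above:

// What element access of scaled(1 << 20, scaled(1 << 20, x)) computes, step by step:
double xi = 2.0;                   // x[i], a double
double inner = (1 << 20) * xi;     // int * double -> double: exact
double outer = (1 << 20) * inner;  // int * double -> double: exact; no int arithmetic overflows

// What the fused form scaled((1 << 20) * (1 << 20), x) would have to compute first:
// int fused = (1 << 20) * (1 << 20);  // int * int overflows 32-bit int: undefined behavior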

P1673R9 solved these problems effectively by specifying an implementation (proxy-reference). We're not attached to this wording approach.

Key features of proxy-reference

Three overloads per binary arithmetic operator

Each binary arithmetic operator has three overloads, each with different constraints, so that expressions mixing the same or different proxy reference types have no ambiguity. A simplified sketch follows the list below.

  1. constexpr friend auto operator+(derived_type lhs, Rhs rhs), with Rhs one of our proxy references.

    • It converts "the other proxy reference" to its value_type immediately, so users' value types, proxy references, or expression templates don't need to overload arithmetic operators for P1673's proxy reference types.
    • Using "its value_type," rather than mine, preserves precision in mixed-precision expressions. For example, if Rhs::value_type is double and my value_type is float, converting to my value_type (float) would lose precision.
  2. constexpr friend auto operator+(derived_type lhs, Rhs rhs), with Rhs NOT one of our proxy references, uses rhs directly. (This requires overloaded arithmetic operators for (value_type(lhs), rhs). See below for other options.)

  3. constexpr friend auto operator+(Lhs lhs, derived_type rhs), with Lhs NOT one of our proxy references, is the order-reversed form of (2).
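
A simplified sketch of these three overload shapes, using a hypothetical CRTP base class and a placeholder trait; the proposal expresses this through the exposition-only proxy-reference class template, not through this exact code.

// Placeholder trait: in the real wording, this would detect P1673's proxy reference types.
template<class T>
inline constexpr bool is_linalg_proxy_reference_v = false;  // sketch only

// Assumes Derived is convertible to ValueType (via the proxy's conversion operator).
template<class Derived, class ValueType>
class proxy_reference_base {
public:
  using value_type = ValueType;

  // (1) Rhs is one of our proxy references: convert it to *its* value_type,
  //     which preserves precision in mixed-precision expressions.
  template<class Rhs>
    requires is_linalg_proxy_reference_v<Rhs>
  friend constexpr auto operator+(Derived lhs, Rhs rhs) {
    return ValueType(lhs) + typename Rhs::value_type(rhs);
  }

  // (2) Rhs is NOT one of our proxy references: use it directly.
  template<class Rhs>
    requires (!is_linalg_proxy_reference_v<Rhs>)
  friend constexpr auto operator+(Derived lhs, Rhs rhs) {
    return ValueType(lhs) + rhs;
  }

  // (3) Order-reversed form of (2).
  template<class Lhs>
    requires (!is_linalg_proxy_reference_v<Lhs>)
  friend constexpr auto operator+(Lhs lhs, Derived rhs) {
    return lhs + ValueType(rhs);
  }
};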

abs, real, imag, and conj overloads

The abs, real, imag, and conj overloads are part of P1673's system for working around issues with these existing functions in the Standard, while respecting the convention by which existing generic linear algebra libraries use and overload these functions.
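
The convention in question is roughly the following pattern, common in generic numerical code (illustration only; magnitude is not a P1673 name):

#include <cmath>
#include <complex>

// Pull in the std overloads with a using-declaration, then call unqualified,
// so argument-dependent lookup can also find abs overloads that user-defined
// number types provide in their own namespaces.
template<class T>
auto magnitude(const T& x) {
  using std::abs;
  return abs(x);
}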

proxy-reference wording errata

proxy-reference wording questions and our answers

Here, we preempt some possible questions about our wording choices.

G) Tags

These tags don't refer to layouts. Instead, they tell algorithms how to interpret data in place. Idiomatic use of the BLAS and LAPACK reinterprets a matrix's "properties" in place, as a result of algorithms that operate in place on the matrix. Examples (usage sketch after this list):

  1. LU factorization replaces the contents of a (possibly nonsymmetric) square matrix A with its L and U factors. triangular_matrix_vector_solve with the same mdspan reads either L or U, depending on the tag.
  2. Cholesky with lower_triangle would operate on a symmetric matrix in place, and produce a triangular matrix. The data in the array remain symmetric (A[i, j] == A[j, i]), but the interpretation becomes "lower triangular matrix."
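
A hedged usage sketch of example 1, continuing with the types from section C and assuming the overwriting triangular_matrix_vector_solve overload that takes the matrix, a Triangle tag, and a DiagonalStorage tag (pivoting from the LU factorization is omitted):

using std::linalg::lower_triangle;
using std::linalg::upper_triangle;
using std::linalg::implicit_unit_diagonal;
using std::linalg::explicit_diagonal;

// A holds both factors of an in-place LU factorization: L strictly below the
// diagonal (with an implicit unit diagonal) and U on and above the diagonal.
vector_t b = /* ... */;
vector_t z = /* ... */;

// Solve L z = b, reading only the strictly lower triangle of A.
triangular_matrix_vector_solve(A, lower_triangle, implicit_unit_diagonal, b, z);
// Solve U y = z, reading the upper triangle of A including its diagonal.
triangular_matrix_vector_solve(A, upper_triangle, explicit_diagonal, z, y);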

H) Layouts: layout_blas_packed

layout_blas_packed represents the BLAS' packed storage formats for triangular, symmetric, or Hermitian matrices.
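
A hedged sketch of what this layout represents, assuming layout_blas_packed<Triangle, StorageOrder> with the lower_triangle_t and column_major_t tags and a mapping constructible from the extents (interface details abbreviated):

#include <array>
#include <linalg>
#include <mdspan>

// An N x N symmetric (or triangular, or Hermitian) matrix in BLAS packed format
// stores only one triangle, column by column, in N*(N + 1)/2 contiguous elements.
constexpr int N = 4;
std::array<double, N*(N + 1)/2> packed = { /* lower triangle, column by column */ };

using packed_layout =
  std::linalg::layout_blas_packed<std::linalg::lower_triangle_t,
                                  std::linalg::column_major_t>;
using packed_ext = std::extents<int, N, N>;

// View the packed storage as a square matrix; algorithms that take a Triangle
// tag then interpret it as triangular, symmetric, or Hermitian.
std::mdspan<double, packed_ext, packed_layout>
  AP{packed.data(), packed_layout::mapping<packed_ext>{packed_ext{}}};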