using vector_t = mdspan<double, dextents<int, 1>>;
using matrix_t = mdspan<double, dextents<int, 2>>;
// ... allocate and fill ...
vector_t x = /* ... */;
vector_t y = /* ... */;
matrix_t A = /* ... */;
// compute y = Ax
matrix_vector_product(y, A, x);
// compute norm2 of y
double val = vector_norm2(y);
// mixed precision, different layout
mdspan<float, dextents<int, 2>, layout_left> A_f =
/* allocate and fill */;
matrix_vector_product(y, A_f, x);
double val2 = vector_norm2(y);
D) Categories of things in P1673
Algorithms taking mdspan
In-place transformations
Tags
Layouts (layout_blas_packed)
E) Algorithms taking mdspan
Many algorithms, but wording for each is short, so much so that the header synopsis almost doubles the wording length. Would LWG consider letting us drop the synopsis?
E.1) Two categories of algorithms
Not-reduce-like: Take zero or more input mdspan(s) + (output or in/out) mdspan; return void
Example: matrix_vector_product
reduce-like: Take one or more input mdspan(s) + initial scalar value; return scalar result
Example: vector_norm2
E.2) [linalg.reqs.val] "Linear algebra value types"
The most interesting part of wording is not the algorithms themselves, but "Linear algebra value types" [linalg.reqs.val]: requirements on the mdspans' value types that interact in algorithms. One can think of this as a generalization of _GENERALIZED_SUM_. SG6 and LEWG especially discussed this in detail. Davis Herring and Zach Laine, among others, generously contributed wording review time.
F) In-place transformations
These are functions "viewing" an mdspan as another mdspan, with a possibly different layout and/or accessor. (We don't actually use the word "view.")
F.1) Main interface: conjugated, scaled, transposed
These are the "viewing" functions mentioned above. They take an mdspan (and possibly a scalar), and return another mdspan.
Function name          Effect                              What it changes
transposed             Matrix transpose (i,j) -> (j,i)     Layout
scaled                 Multiply all elements by a scalar   Accessor
conjugated             Complex conjugate of all elements   Accessor
conjugate_transposed   transposed and conjugated           Both
Key feature: Result is an mdspan. P1673's algorithms represent arrays as mdspan.
P1673 defines no extension point mechanism to let algorithms take more general things.
Design principle: P1673 is "the BLAS in C++," not a general algebra system. For example:
mdspan's accessor has no knowledge of the multidimensional index pack given to the layout mapping; it doesn't know "where it is in the matrix."
Thus, mdspan cannot represent a diagonally scaled matrix (where each column or row is scaled by a different scaling factor). This has implications for the batched generalization of P1673, P2901R0.
Thus, mdspan alone cannot represent an unpacked Hermitian or triangular matrix (where the interpretation of an element's value depends on its index (i,j)), though algorithms can interpret a matrix that way. This is why P1673 has different algorithms for symmetric_*, hermitian_*, and triangular_* cases.
F.2) Functions that change the accessor
Functions that change the mdspan's accessor have more complicated wording than functions that change the layout. The mechanism is like that of atomic_accessor* (P2689R2):
Define a custom accessor
Custom accessor uses a proxy reference to transform value
Proxy reference has overloaded arithmetic operators (like atomic_ref*)
We do this twice:
Function     Accessor             Proxy reference
scaled       accessor_scaled      scaled_scalar
conjugated   accessor_conjugate   conjugated_scalar
Accessors are not exposition-only
The accessors that use these proxy reference types are not exposition-only. This lets users write custom algorithms that can optimize by knowing that an mdspan comes from conjugated and scaled results, just as P1673's implementations can optimize for this case.
Proxy reference design goals
Work with users' custom value types (or proxy reference types, or expression templates), without those types needing overloaded arithmetic operators for P1673's proxy reference types
e.g., add(scaled(alpha, x), y, z) where y's reference is atomic_ref<float>
Preserve arithmetic order for nested scaled expressions
scaled(alpha, scaled(beta, x)) should not be transformed into scaled(alpha * beta, x), EVEN THOUGH this may let implementations use an optimized BLAS
e.g., if x is a rank-1 mdspan whose value_type is double, and if sizeof(int) is 4 and double is IEEE 754 binary64, then scaled(1 << 20, scaled(1 << 20, x)) does not overflow int, but scaled((1 << 20) * (1 << 20), x) would overflow int
e.g., alpha = uint64_t(INT_MAX/2), beta = INT_MAX/2 + 1, and x[0] = 2
Algorithms' arithmetic expressions must work for arbitrary combinations and nestings of proxy reference types, as long as the underlying arithmetic makes sense
Nesting of scaled and/or conjugated expressions is not a typical use case (P1673 is not a computer algebra system), but should be correct.
Avoid duplicated wording, especially for overloaded arithmetic operators.
P1673R9 solved these problems effectively by specifying an implementation (proxy-reference). We're not attached to this wording approach.
Key features of proxy-reference
Three overloads per binary arithmetic operator
Three overloads per binary arithmetic operator, each with different constraints, ensure that expressions mixing the same or different proxy reference types are unambiguous.
(a) constexpr friend auto operator+(derived_type lhs, Rhs rhs), with Rhs one of our proxy references.
It converts "the other proxy reference" to its value_type immediately, so users' value_types, proxy references, or expression templates don't need to overload arithmetic operators for P1673's proxy reference types.
Using its value_type, rather than mine, preserves precision in mixed-precision expressions. For example, if Rhs::value_type is double and my value_type is float, converting to my value_type (float) would lose precision.
(b) constexpr friend auto operator+(derived_type lhs, Rhs rhs), with Rhs NOT one of our proxy references, uses rhs directly. (This requires overloaded arithmetic operators for (value_type(lhs), rhs). See below for other options.)
(c) constexpr friend auto operator+(Lhs lhs, derived_type rhs), with Lhs NOT one of our proxy references, is the order-reversed version of (b).
abs, real, imag, and conj overloads
The abs, real, imag, and conj overloads are part of a system in P1673 for working around issues with these existing functions in the Standard, while respecting the convention that existing generic linear algebra libraries use and overload these functions.
proxy-reference wording errata
Replace enable_if_t with constraints (we've tested that this builds with GCC 13.1.0)
proxy-reference wording questions and our answers
Here, we preempt some possible questions about our wording choices.
"Deducing this"? No need
Only one "base class customization point," operator value_type()
Using deducing this would require making it a template
Other ways to specify arithmetic operators?
Base class proxy-reference lets us avoid ambiguity and code duplication.
We could repeat hidden friend arithmetic operators in each proxy reference class, but they would still need to be constrained to avoid ambiguity.
Base class CRTP design may have compile-time cost, but is it actually observable?
is_base_of_v<@_proxy-reference-base_@, Rhs> is just a way to say "this is a known proxy reference type with a public value_type alias."
P1673 does NOT attempt to define generic requirements for proxy references.
It only knows about scaled_scalar and conjugated_scalar.
This has functionality consequences. For example, proxy-reference's arithmetic operators do NOT convert atomic_ref<T> to T (value_type conversion) on input to their arithmetic expressions. (Only scaled_scalar and conjugated_scalar get that treatment.) Therefore, atomic_ref<complex<T>> CANNOT be used in P1673's algorithms, as atomic_ref<complex<T>> lacks overloaded arithmetic operators (unlike atomic_ref<Real>). P1673 accepts this, based on the principle that "P1673 is not a computer algebra system" and that it should defer to the behavior of its input types.
It's tempting to change is_base_of_v<@_proxy-reference-base_@, Rhs> constraint to requires { typename Rhs::value_type; }, but this won't work in general.
While atomic_ref and generalizations in P2689 have value_type alias, these were designed for use with mdspan, where value_type has a particular expected meaning.
Users' types may define value_type differently: e.g., a hypothetical custom bigfloat may store its bits in uint128_t, have a value_type = uint128_t alias, and have a (perhaps ill-advised) conversion operator to value_type.
vector<bool>::reference has no value_type alias (perhaps it should? but it's existing precedent and may have inspired other designs)
G) Tags
upper_triangle_t
lower_triangle_t
implicit_unit_diagonal_t
explicit_diagonal_t
These tags don't refer to layouts.
Instead, they tell algorithms how to interpret data in place.
Idiomatic BLAS and LAPACK usage reinterprets a matrix's "properties" in place,
as a result of algorithms that operate in place on the matrix.
Examples:
LU factorization replaces the contents of a (possibly nonsymmetric) square matrix A with its L and U factors. triangular_matrix_vector_solve with the same mdspan reads either L or U, depending on the tag.
Cholesky with lower_triangle would operate on a symmetric matrix in place, and produce a triangular factor. It reads and writes only the lower triangle (relying on A[i, j] == A[j, i] for the rest), and the interpretation of the array becomes "lower triangular matrix."
H) Layouts: layout_blas_packed
Packed triangular, symmetric, or Hermitian layouts.
P1673R12 LWG presentation
A) Previous WG21 talks and proposals
B) Brief summary
Algorithms operating on mdspan "viewing" users' data
mdspan are to P1673's algorithms as iterators are to existing Standard algorithms
Extends the BLAS (Basic Linear Algebra Subprograms)
Optional ExecutionPolicy&& overloads
Extensible