ORNL / cpp-proposals-pub

Collaborating on papers for the ISO C++ committee - public repo
26 stars 27 forks source link

P3050R1: LEWG presentation #467

Closed mhoemmen closed 3 months ago

mhoemmen commented 3 months ago

P3050R1

Terms

Common practice: "conjugate transpose of a noncomplex matrix is just the transpose"

The conjugate transpose of a complex matrix naturally generalizes the transpose of a noncomplex matrix. Users who develop generic algorithms for either complex or noncomplex problems write the algorithm once using the conjugate transpose. BLAS and matrix-oriented programming languages like Matlab treat both using the same notation (e.g., 'C' flag means transpose for a noncomplex matrix, conjugate transpose for a complex matrix).

mdspan views conjugated, transposed, and conjugate_transposed

A key feature of linear algebra libraries is their ability to view the transpose or conjugate transpose of a matrix "in place" without actually changing its elements. Matrices may be large and copying them may be too expensive.

How does conjugated work currently?

  1. If the input mdspan has accessor type conjugated_accessor<NestedAccessor>, then the result has accessor type NestedAccessor;

  2. otherwise, if the input mdspan has accessor type Accessor, then the result has accessor type conjugated_accessor<Accessor>.

conjugated_accessor's (read-only) access function conjugates the element if it's a complex number, else it just returns the number.

This is correct, but can hinder optimization

The current behavior of conjugated is correct. The issue is that conjugated(A) for an mdspan-of-noncomplex-numbers A doesn't have the same accessor type as A. It should just return A!

This is an issue because both Standard Library implementations and users may want to optimize for "known accessors" such as default_accessor. Accessors communicate optimization information, like "this is a contiguous array in memory."

template<class ElementType, class IndexType, size_t Ext0, class Layout, class Accessor>
void generic_algorithm( // fully generic
  mdspan<ElementType, extent<IndexType, Ext0>, Layout, Accessor> x);

template<class ElementType, class IndexType, size_t Ext0, class Layout>
void generic_algorithm( // specialization
  mdspan<ElementType, extent<IndexType, Ext0>, Layout, default_accessor<ElementType>> x);

Currently, conjugated of a default_accessor<ElementType> mdspan has accessor conjugated_accessor<default_accessor<ElementType>>. Calling generic_algorithm with this mdspan will thus take the "generic path," rather than the specialization.

If we want to optimize the conjugated_accessor case, we have to add another specialization. This has compile-time costs. Users either have to remember to do this, or write their generic algorithms twice (once for complex and once for noncomplex).

template<class Real, class IndexType, size_t Ext0, class Layout>
  requires(not impl::is_complex_v<Real>)
void generic_algorithm( // another specialization
  mdspan<Real, extent<IndexType, Ext0>, Layout, conjugated_accessor<default_accessor<Real>>> x)
{
  // Dispatch to default_accessor specialization
  return generic_algorithm(mdspan{x.data_handle(), x.mapping(), x.accessor().nested_accessor()});
}

Fix: conjugated(A) should return A if A is noncomplex

P3050 proposes the only reasonable fix: make conjugated(A) return A if the value_type of A is not complex.

"Not complex" just means that conj is not ADL-findable for the mdspan's value_type.

Recall that mdspan has element_type (possibly cv-qualified) and value_type = remove_cvref_t<element_type>.

mhoemmen commented 3 months ago

I'm putting this into P3050R2.