BallisticLA / RandBLAS

A header-only C++ library for sketching in randomized linear algebra
https://randblas.readthedocs.io/en/latest/

Memory layout of buffers behind DenseSkOp objects #50

Closed rileyjmurray closed 1 year ago

rileyjmurray commented 1 year ago

What's in this PR?

This PR extends the capabilities of DenseSkOp objects. It also sets new semantics for DenseDist and DenseSkOp objects.

1. New capabilities.

The layout (row-major or column-major) of DenseSkOp.buff can now be different from the layout of the input data matrix and the output sketch.

2. Breaking API changes.

DenseDist structs now have a field called major_axis. For a given distribution dist, this field determines whether sketching operators that follow dist are represented by buffers in row-major or column-major order. The options for this field are MajorAxis::Long and MajorAxis::Short.

The constructors for DenseSkOp objects no longer allow a layout parameter. Instead, the layout is determined automatically from the dist argument, as per the table below.

| Wide or tall sketching operator? | Short- or long-axis major? | Implied storage order |
| --- | --- | --- |
| wide | MajorAxis::Long | row-major |
| wide | MajorAxis::Short | column-major |
| tall | MajorAxis::Long | column-major |
| tall | MajorAxis::Short | row-major |

3. Incidental changes

I restructured our test code to make things more modular.

Why bother?

I claim that framing DenseSkOp objects as being "long-axis major" or "short-axis major" is more useful than framing the discussion in terms of row-major or column-major. The high-level reason is that it lets us specify RandBLAS' semantics without constantly treating wide and tall sketching operators as separate cases. This philosophy is stated in the RandLAPACK book; it is also reflected in the RandBLAS API for sparse sketching operators (which adopts a taxonomy of short-axis-sparse and long-axis-sparse operators).

1. Swapping a distribution's row and column dimensions should be equivalent to transposition.

Suppose we fix a MajorAxis variable ma and a DenseDistName variable distname. For positive integers (n, k), define distributions

`DenseDist D1{n, k, distname, ma}`
`DenseDist D2{k, n, distname, ma}`

and consider a fixed RNGState, state. This PR makes it so that if we sample

`DenseSkOp S1(D1, state)`
`DenseSkOp S2(D2, state)`

then S1 == S2.T holds in a mathematical sense. Tests are included to verify this functionality.

2. Short-axis major layout: easily incorporate more data into a sketch of fixed size

2.1. Incorporating new rows when sketching from the left.

Suppose we start with an $m_1 \times n$ matrix $A_1$, a $d \times m_1$ sketching operator $S_1$, and a $d \times n$ sketch $B_1 = S_1 A_1$. Suppose also that we come across a new $m_2 \times n$ matrix $A_2$, and we want a $d \times n$ sketch of the $(m_1 + m_2) \times n$ matrix

$$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$

obtained by adding rows to $A_1$.

There are two ways to get such a sketch. The naive approach is to throw out $B_1$, generate a new $d \times (m_1+m_2)$ sketching operator $S$, and compute $B = SA$. A preferred approach is to generate only a new $d \times m_2$ sketching operator $S_2$, independent of $S_1$, implicitly define $S = [S_1, S_2]$, and then implicitly compute $B = B_1 + S_2 A_2$.

It is easy to see that if $S_1$ was column-major, then it should be possible to generate column-major $S_2$ so that $S = [S_1, S_2]$ is the same as if we generated a $d \times (m_1 + m_2)$ column-major sketching operator $S$ from scratch using the same random seed as $S_1$. By contrast, if $S_1$ and $S_2$ were generated in row-major format, then $[S_1, S_2]$ would differ from a $d \times (m_1 + m_2)$ operator generated with the same seed as $S_1$.

2.2. Incorporating new columns when sketching from the right.

Let's start with an $m \times n_1$ matrix $A_1$, an $n_1 \times d$ sketching operator $S_1$, and an $m \times d$ sketch $B_1 = A_1 S_1$. Suppose we come across an $m \times n_2$ matrix $A_2$ and that we want an $m \times d$ sketch of the $m \times (n_1 + n_2)$ matrix

$$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$$

obtained by adding columns to $A_1$.

As with the case of adding rows considered above, there are two ways to do this. The more efficient way is to generate only a new $n_2 \times d$ sketching operator $S_2$, and then to implicitly compute

$$B = \begin{bmatrix} A_1 & A_2 \end{bmatrix}\begin{bmatrix} S_1 \\ S_2 \end{bmatrix}$$

by accumulating $B = B_1 + A_2 S_2$.

In this situation, it is easy to see that generating $S_1$ and $S_2$ in row-major format makes it possible for the augmented matrix $[S_1; S_2]$ to be the same as if it had been generated from scratch using the same seed as $S_1$.

2.3. A unified perspective.

A sketching operator that is applied from the left can only enact dimension reduction if it is wide. A wide sketching operator `S` is in column-major format if `S.dist.major_axis == MajorAxis::Short`.

A sketching operator that is applied from the right can only enact dimension reduction if it is tall. A tall sketching operator `S` is in row-major format if `S.dist.major_axis == MajorAxis::Short`.

Taken together: if one's goal is to incorporate new data without increasing the size of the sketch, then making the sketching operator's distribution short-axis major ensures reproducible results even when the data arrives over time in blocks of varying sizes.

3. Long-axis major layout: easily increase the size of a sketch

Let's say we have a fixed data matrix $A$ and an initial sketching operator $S_1$.

Suppose we want to enlarge the sketch $S_1 A$ by adding rows to $S_1$. Naturally, if $S_1$ is generated in row-major format, then it is possible to augment $S_1$ with row-major $S_2$ in a way that is equivalent to generating $[S_1; S_2]$ from scratch using the same seed as $S_1$.

Suppose instead that we want to enlarge $A S_1$ by adding columns to $S_1$. In this case, generating $S_1$ in column-major format makes it so that we can generate column-major $S_2$ in a way that is equivalent to generating $[S_1, S_2]$ from scratch.

Taken together, we see that if one wants to leave room to extend the sketch later -- regardless of whether the extension happens incrementally or all at once -- then it is preferable for the sketching operator to be long-axis major.