BallisticLA / RandBLAS

A header-only C++ library for sketching in randomized linear algebra
https://randblas.readthedocs.io/en/latest/

Memory layout of buffers behind DenseSkOp objects #50

Closed rileyjmurray closed 1 year ago

rileyjmurray commented 1 year ago

What's in this PR?

This PR extends the capabilities of DenseSkOp objects. It also sets new semantics for DenseDist and DenseSkOp objects.

1. New capabilities.

The layout (row-major or column-major) of DenseSkOp.buff can now be different from the layout of the input data matrix and the output sketch.

2. Breaking API changes.

DenseDist structs now have a field called major_axis. For a given distribution dist, this field determines whether sketching operators that follow dist are represented by buffers in row-major or column-major order. The options for this field are MajorAxis::Long and MajorAxis::Short.

The constructors for DenseSkOp objects no longer allow a layout parameter. Instead, the layout is determined automatically from the dist argument, as per the table below.

| Wide or tall sketching operator? | Short- or long-axis major? | Implied storage order |
| --- | --- | --- |
| wide | MajorAxis::Long | row-major |
| wide | MajorAxis::Short | column-major |
| tall | MajorAxis::Long | column-major |
| tall | MajorAxis::Short | row-major |

3. Incidental changes

I restructured our test code to make things more modular.

Why bother?

I claim that framing DenseSkOp objects as being "long-axis major" or "short-axis major" is more useful than framing the discussion in terms of row-major or column-major. The high-level reason is that it lets us specify RandBLAS' semantics without constantly treating wide and tall sketching operators as separate cases. This philosophy is stated in the RandLAPACK book; it is also reflected in the RandBLAS API for sparse sketching operators (which adopts a taxonomy of short-axis-sparse and long-axis-sparse operators).

1. Swapping a distribution's row and column dimensions should be equivalent to transposition.

Suppose we fix a MajorAxis variable ma and a DenseDistName variable distname. For positive integers (n, k), define distributions

`DenseDist D1{n, k, distname, ma}`
`DenseDist D2{k, n, distname, ma}`

and consider a fixed RNGState, state. This PR makes it so that if we sample

`DenseSkOp S1(D1, state)`
`DenseSkOp S2(D2, state)`

then S1 == S2.T holds in a mathematical sense. Tests are included to verify this functionality.

2. Short-axis major layout: easily incorporate more data into a sketch of fixed size

2.1. Incorporating new rows when sketching from the left.

Suppose we start with an $m_1 \times n$ matrix $A_1$, a $d \times m_1$ sketching operator $S_1$, and a $d \times n$ sketch $B_1 = S_1 A_1$. Suppose also that we come across a new $m_2 \times n$ matrix $A_2$, and we want a $d \times n$ sketch of the $(m_1 + m_2) \times n$ matrix

$$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$

obtained by adding rows to $A_1$.

There are two ways to get such a sketch. The naive approach is to throw out $B_1$, generate a new $d \times (m_1+m_2)$ sketching operator $S$, and compute $B = SA$. A preferred approach is to generate only a new $d \times m_2$ sketching operator $S_2$, independent of $S_1$, implicitly define $S = [S_1, S_2]$, and then implicitly compute $B = B_1 + S_2 A_2$.

It is easy to see that if $S_1$ was column-major, then it should be possible to generate column-major $S_2$ so that $S = [S_1, S_2]$ is the same as if we generated a $d \times (m_1 + m_2)$ column-major sketching operator $S$ from scratch using the same random seed as $S_1$. By contrast, if $S_1$ and $S_2$ were generated in row-major format, then $[S_1, S_2]$ would differ from a $d \times (m_1 + m_2)$ operator generated with the same seed as $S_1$.

2.2. Incorporating new columns when sketching from the right.

Let's start with an $m \times n_1$ matrix $A_1$, an $n_1 \times d$ sketching operator $S_1$, and an $m \times d$ sketch $B_1 = A_1 S_1$. Suppose we come across an $m \times n_2$ matrix $A_2$ and that we want an $m \times d$ sketch of the $m \times (n_1 + n_2)$ matrix

$$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$$

obtained by adding columns to $A_1$.

As with the case of adding rows considered above, there are two ways to do this. The more efficient way is to generate only a new $n_2 \times d$ sketching operator $S_2$, and then to implicitly compute

$$B = \begin{bmatrix} A_1 & A_2 \end{bmatrix}\begin{bmatrix} S_1 \\ S_2 \end{bmatrix}$$

by accumulating $B = B_1 + A_2 S_2$.

In this situation, it is easy to see that generating $S_1$ and $S_2$ in row-major format makes it possible for the augmented matrix $[S_1; S_2]$ to be the same as if it had been generated from scratch using the same seed as $S_1$.

2.3. A unified perspective.

A sketching operator that is applied from the left can only enact dimension reduction if it is wide. A wide sketching operator `S` is in column-major format if `S.dist.major_axis == MajorAxis::Short`.

A sketching operator that is applied from the right can only enact dimension reduction if it is tall. A tall sketching operator `S` is in row-major format if `S.dist.major_axis == MajorAxis::Short`.

Taken together: if one's goal is to incorporate new data without increasing the size of the sketch, then making the sketching operator's distribution short-axis major ensures reproducible results even when the data arrives over time in blocks of varying sizes.

3. Long-axis major layout: easily increase the size of a sketch

Let's say we have a fixed data matrix $A$ and an initial sketching operator $S_1$.

Suppose we want to enlarge the sketch $S_1 A$ by adding rows to $S_1$. Naturally, if $S_1$ is generated in row-major format, then it is possible to augment $S_1$ with row-major $S_2$ in a way that is equivalent to generating $[S_1; S_2]$ from scratch using the same seed as $S_1$.

Suppose instead that we want to enlarge $A S_1$ by adding columns to $S_1$. In this case, generating $S_1$ in column-major format makes it so that we can generate column-major $S_2$ in a way that is equivalent to generating $[S_1, S_2]$ from scratch.

Taken together, we see that if one wants to leave room to extend the sketch later -- regardless of whether the extension happens incrementally or all at once -- then it is preferable for the sketching operator to be long-axis major.