What's in this PR?
This PR extends the capabilities of DenseSkOp objects. It also sets new semantics for DenseDist and DenseSkOp objects.
1. New capabilities.
The layout (row-major or column-major) of DenseSkOp.buff can now be different from the layout of the input data matrix and the output sketch.
2. Breaking API changes.
DenseDist structs now have a new field, major_axis. This field determines whether sketching operators sampled from a distribution dist are represented by buffers in row-major or column-major order. The options for this field are MajorAxis::Long and MajorAxis::Short.
The constructors for DenseSkOp objects no longer allow a layout parameter. Instead, the layout is determined automatically from the dist argument, as per the table below.
| Is the sketching operator wide or tall? | Short- or long-axis major? | Implied storage order |
| --- | --- | --- |
| wide | MajorAxis::Long | row-major |
| wide | MajorAxis::Short | column-major |
| tall | MajorAxis::Long | column-major |
| tall | MajorAxis::Short | row-major |
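The rule in this table can be sketched as a small standalone helper. Note that the enums and the function below are purely illustrative stand-ins (RandBLAS derives the layout internally from the dist argument; these names just mirror BLAS-style conventions):

```cpp
// Hypothetical mirror of the layout table; not part of the RandBLAS API.
enum class MajorAxis { Short, Long };
enum class Layout { ColMajor, RowMajor };

// Given the operator's shape and its distribution's major_axis,
// return the implied storage order per the table above.
Layout implied_layout(long n_rows, long n_cols, MajorAxis ma) {
    bool wide = n_cols > n_rows;
    if (ma == MajorAxis::Long)
        return wide ? Layout::RowMajor : Layout::ColMajor;
    // MajorAxis::Short is the opposite of the long-axis rule.
    return wide ? Layout::ColMajor : Layout::RowMajor;
}
```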
3. Incidental changes
I restructured our test code to make things more modular.
Why bother?
I claim that framing DenseSkOp objects as being "long axis-major" or "short axis-major" is more useful than framing discussion in terms of row or column-major. The high-level reason for this is to specify RandBLAS' semantics without constantly considering different cases for wide sketching operators or tall sketching operators. This philosophy is stated in the RandLAPACK book; it's also reflected in the RandBLAS API for sparse sketching operators (which adopts a taxonomy of being short-axis-sparse or long-axis-sparse).
1. Swapping a distribution's row and column dimensions should be equivalent to transposition.
Suppose we fix a MajorAxis variable ma and a DenseDistName variable distname. For positive integers (n, k), define the distributions

```cpp
DenseDist D1{n, k, distname, ma};
DenseDist D2{k, n, distname, ma};
```
and consider a fixed RNGState, state. This PR makes it so that if we sample

```cpp
DenseSkOp S1(D1, state);
DenseSkOp S2(D2, state);
```
then S1 == S2.T holds in a mathematical sense. Tests are included to verify this functionality.
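To see why this can hold at the buffer level, consider the long-axis-major case with a made-up deterministic stream standing in for the counter-based RNG (the formula below is not RandBLAS's generator). A wide d-by-m operator stores the stream in row-major order, while the tall m-by-d operator from the swapped distribution stores the same stream in column-major order, so the two interpretations are entrywise transposes:

```cpp
#include <cstdint>

// Toy deterministic stand-in for a counter-based RNG stream.
double fake_stream(int t) {
    return static_cast<double>((static_cast<std::uint64_t>(t) * 2654435761ull) % 97);
}

const int d = 3, m = 5;  // wide operator is d x m; swapped one is m x d

// S1 ~ D1{d, m, distname, MajorAxis::Long}: wide => row-major buffer.
double S1(int i, int j) { return fake_stream(i * m + j); }

// S2 ~ D2{m, d, distname, MajorAxis::Long}: tall => column-major buffer
// filled from the same stream; entry (j, i) reads the same buffer slot.
double S2(int j, int i) { return fake_stream(i * m + j); }
```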
2. Short-axis major layout: easily incorporate more data into a sketch of fixed size
2.1. Incorporating new rows when sketching from the left.
Suppose we start with an $m_1 \times n$ matrix $A_1$, a $d \times m_1$ sketching operator $S_1$, and a $d \times n$ sketch $B_1 = S_1 A_1$. Suppose also that we come across a new $m_2 \times n$ matrix $A_2$, and we want a $d \times n$ sketch of the $(m_1 + m_2) \times n$ matrix
$$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$
obtained by adding rows to $A_1$.
There are two ways to get such a sketch. The naive approach is to throw out $B_1$ and generate a new $d \times (m_1+m_2)$ sketching operator $S$ before computing $B = SA$. A preferred approach is to generate only a new $d \times m_2$ sketching operator $S_2$ independently of $S_1$, implicitly define $S = [S_1, S_2]$, and then implicitly compute $B = B_1 + S_2 A_2$.
It is easy to see that if $S_1$ was column-major, then it should be possible to generate column-major $S_2$ so that $S = [S_1, S_2]$ is the same as if we generated a $d \times (m_1 + m_2)$ column-major sketching operator $S$ from scratch using the same random seed as $S_1$. By contrast, if $S_1$ and $S_2$ were generated in row-major format, then $[S_1, S_2]$ would differ from a $d \times (m_1 + m_2)$ operator generated with the same seed as $S_1$.
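At the buffer level, the column-major claim is just a prefix property: filling a column-major buffer in storage order means the $d \times m_1$ operator occupies exactly the first $d \cdot m_1$ entries of the $d \times (m_1 + m_2)$ operator drawn from the same seed. A minimal sketch, again with a made-up deterministic stream in place of the real RNG:

```cpp
#include <cstdint>
#include <vector>

// Fill a column-major d x m buffer in storage order from a toy
// deterministic stream (a stand-in for a counter-based RNG, not
// RandBLAS's own generator).
std::vector<double> sample_colmajor(int d, int m) {
    std::vector<double> buff(d * m);
    for (int t = 0; t < d * m; ++t)
        buff[t] = static_cast<double>((static_cast<std::uint64_t>(t) * 2654435761ull) % 97);
    return buff;
}
```

Appending $m_2$ columns only appends $d \cdot m_2$ entries to the flat buffer; a row-major fill would interleave old and new entries instead.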
2.2. Incorporating new columns when sketching from the right.
Let's start with an $m \times n_1$ matrix $A_1$, an $n_1 \times d$ sketching operator $S_1$, and an $m \times d$ sketch $B_1 = A_1 S_1$. Suppose we come across an $m \times n_2$ matrix $A_2$ and that we want an $m \times d$ sketch of the $m \times (n_1 + n_2)$ matrix
$$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}$$
obtained by adding columns to $A_1$.
As with the case of adding rows considered above, there are two ways to do this. The more efficient way is to generate only a new $n_2 \times d$ sketching operator $S_2$, and then to implicitly compute $B = B_1 + A_2 S_2$.
In this situation, it is easy to see that generating $S_1$ and $S_2$ in row-major format makes it possible for the augmented matrix $[S_1 ; S_2]$ to be the same as if it was generated from scratch using the same seed as $S_1$.
2.3. A unified perspective.
A sketching operator that is applied from the left can only enact dimension reduction if it is wide. A wide sketching operator S is in column-major format if S.dist.major_axis == MajorAxis::Short.
A sketching operator that is applied from the right can only enact dimension reduction if it is tall. A tall sketching operator S is in row-major format if S.dist.major_axis == MajorAxis::Short.
Taken together, these cases show that if one's goal is to incorporate new data without increasing the size of the sketch, then making the sketching operator's distribution short-axis major provides reproducibility: the final sketch is the same whether the data arrives all at once or over time in blocks of varying sizes.
3. Long-axis major layout: easily increase the size of a sketch
Let's say we have a fixed data matrix $A$ and an initial sketching operator $S_1$.
Suppose we want to enlarge the sketch $S_1 A$ by adding rows to $S_1$. Naturally, if $S_1$ is generated in row-major format, then it is possible to augment $S_1$ with row-major $S_2$ in a way that is equivalent to generating $[S_1; S_2]$ from scratch using the same seed as $S_1$.
Suppose instead that we want to enlarge $A S_1$ by adding columns to $S_1$. In this case, generating $S_1$ in column-major format makes it so that we can generate column-major $S_2$ in a way that is equivalent to generating $[S_1, S_2]$ from scratch.
Taken together, we see that if one wants to allow for extending the sketch later -- without depending on whether the extension happens incrementally or all at once -- then it is preferable to have the sketching operator be long-axis major.