This PR significantly reduces the wall time and memory utilisation of bottom-up reconciliation for both dense and sparse "summing" matrices, $S$. The functionality provided by this package is helpful, but I found the reconcilers impractical to use for the large hierarchical and grouped structures I encounter in my job, so I plan to open a few PRs that improve performance and functionality for this use case where possible.
Note: $P$ refers to the matrix that maps the base forecasts to the bottom level, such that $\tilde{y}_{h} = SP\hat{y}_{h}$. I was originally a bit confused by the notation in this package: since these are linear methods, $P$ would usually denote the projection matrix, which contains the leverages and maps the base forecasts onto the coherent subspace, and here that matrix is actually $SP$. I usually see the mapping matrix referred to as $G$ in the literature, but I know Hyndman is doing some work on reconciliation notation, so perhaps this is something to be standardised in the future.
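As a toy example (mine, not from the package) of the bottom-up case with one aggregate and two bottom-level series, i.e. $n_a = 1$ and $n_b = 2$:

$$
S = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
SP = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
$$

so $P$ picks out the bottom-level base forecasts, $PS = I_{n_{b}}$, and $SP$ is the idempotent projection that maps $\hat{y}_{h}$ onto the coherent subspace.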
This improvement is achieved through the following modifications to the `_get_PW_matrices` method for `BottomUp` and `BottomUpSparse`:
- Removing the slicing with `idx_bottom` used to get the elements of $S$ and set the elements of $P$, and instead instantiating and allocating $P$ more efficiently via the upper-diagonal (offset) functionality in `np.eye` and `scipy.sparse.eye`, setting $P$ directly as the concatenation of a null matrix, $0_{n_{b}, \, n_{a}}$, and an identity matrix, $I_{n_{b}}$ (see the sketch after this list). This also has the added benefit of avoiding an albeit minimally expensive transpose, as we no longer need to change the shape and strides of $P$, but it adds the constraint that the indices corresponding to the bottom-level nodes in the hierarchical or grouped structure must be ordered such that $S$ equals $\begin{bsmallmatrix} A \\ I_{n_{b}} \end{bsmallmatrix}$, which is not an issue as this is the standard formulation anyway.
- As the "weight" matrix, $W$, is not needed for point reconciliation with single-level approaches and is only required for probabilistic reconciliation in this context, $W$ is no longer instantiated and allocated unless an `intervals_method` is passed to the `fit` method, which reduces the wall time and memory utilisation of point reconciliation.
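A minimal sketch of the construction (my own illustration with stand-in sizes, not the package's exact code; the `intervals_method` check at the end is just a schematic of the lazy $W$ behaviour):

```python
import numpy as np
import scipy.sparse as sparse

# Stand-in dimensions: n_a aggregate series, n_b bottom-level series.
n_a, n_b = 3, 4
n = n_a + n_b

# P = [0_{n_b x n_a} | I_{n_b}] falls straight out of the diagonal offset
# k = n_a, with no slicing by idx_bottom and no transpose.
P_dense = np.eye(n_b, n, k=n_a)                      # BottomUp
P_sparse = sparse.eye(n_b, n, k=n_a, format="csr")   # BottomUpSparse

# Sanity check against a standard S = [A; I_{n_b}] with an arbitrary A:
# P S = I_{n_b}, so S P y_hat is exactly the bottom-up reconciliation.
A = np.random.default_rng(0).random((n_a, n_b))
S = np.vstack([A, np.eye(n_b)])
assert np.allclose(P_dense @ S, np.eye(n_b))
assert np.allclose(P_sparse @ S, np.eye(n_b))

# W (identity weights here, purely as a placeholder) is only materialised
# when an intervals_method is requested.
intervals_method = None
W = np.eye(n) if intervals_method is not None else None
```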
These changes effectively deprecate the `idx_bottom` parameter for `BottomUp` and `BottomUpSparse`; although it is used in the following situations, removing it does not change the output assuming $S$ is standard:
- A slice of $S$ and $\hat{y}_{h}$ is used for the bottom-up reconciliation from the anchor level in middle-out reconciliation using `MiddleOut`. The slice of $S$ is currently corrected to account for the new bottom level using `np.unique`, but now requires an additional left-right flip to keep $S$ equal to $\begin{bsmallmatrix} A \\ I_{n_{b}} \end{bsmallmatrix}$, as `np.unique` does not preserve the order. This operation is extremely fast and takes constant time as it returns a view of $S$, so there is no performance degradation (see the illustration after this list).
- As non-negative min-trace reconciliation using `MinTrace` uses the whole bottom level, $P$ is identical regardless.
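The flip mentioned above is essentially free because NumPy implements it as a strided view; a quick illustration with a toy matrix (not the package's code):

```python
import numpy as np

# np.unique returns the bottom-level column indices in sorted order, so the
# sliced S may need its columns reversed; np.fliplr does this in O(1) by
# returning a view with a negative column stride rather than copying.
S_slice = np.array([[1., 1., 0.],
                    [1., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 1.]])
S_flipped = np.fliplr(S_slice)

assert np.shares_memory(S_slice, S_flipped)   # no copy was made
assert S_flipped.base is S_slice              # just a view of the same buffer
```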
The benchmarks for `_get_PW_matrices` across a few popular hierarchical datasets of different sizes are below, measured on a machine with a 6-core/12-thread CPU at 2.6 GHz (dynamic frequency scaling up to 4.5 GHz) and 16 GB of RAM.
| Reconciler | $\textup{dim} \left ( P \right )$ | Before | After | Reduction |
|---|---|---|---|---|
| `BottomUp` | $\left( 32 \times 57 \right)$ | $10.7\, \mu s$ | $2.18\, \mu s$ | $79.6 \%$ |
| `BottomUp` | $\left( 150 \times 199 \right)$ | $47.3\, \mu s$ | $6.57\, \mu s$ | $86.1 \%$ |
| `BottomUp` | $\left( 304 \times 555 \right)$ | $171\, \mu s$ | $19.6\, \mu s$ | $88.5 \%$ |
| `BottomUp` | $\left( 30\,490 \times 42\,840 \right)$ | $12.1\, s$ | $66\, ms$ | $99.5 \%$ |
| `BottomUpSparse` | $\left( 32 \times 57 \right)$ | $601\, \mu s$ | $189\, \mu s$ | $68.6 \%$ |
| `BottomUpSparse` | $\left( 150 \times 199 \right)$ | $1.68\, ms$ | $192\, \mu s$ | $88.6 \%$ |
| `BottomUpSparse` | $\left( 304 \times 555 \right)$ | $4.83\, ms$ | $198\, \mu s$ | $95.9 \%$ |
| `BottomUpSparse` | $\left( 30\,490 \times 42\,840 \right)$ | $41\, s$ | $578\, \mu s$ | $100 \%$ |
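Timings of this kind can be reproduced with a simple micro-benchmark along the following lines (a sketch using stand-in matrices roughly matching the $\left( 304 \times 555 \right)$ row; the "old" construction here is only my approximation of the previous slicing approach, not the exact removed code):

```python
import timeit
import numpy as np

# Stand-in summing matrix with the identity block in the last n_b rows.
n_a, n_b = 251, 304
n = n_a + n_b
S = np.vstack([np.random.default_rng(0).random((n_a, n_b)), np.eye(n_b)])
idx_bottom = np.arange(n_a, n)

def p_old(S, idx_bottom):
    # Approximation of the previous approach: allocate a zero matrix shaped
    # like S, scatter the bottom rows of S into it via idx_bottom, transpose.
    P = np.zeros_like(S)
    P[idx_bottom] = S[idx_bottom]
    return P.T

def p_new(S):
    # New approach: build [0 | I] directly with a diagonal offset.
    n, n_b = S.shape
    return np.eye(n_b, n, k=n - n_b)

print(timeit.timeit(lambda: p_old(S, idx_bottom), number=1_000))
print(timeit.timeit(lambda: p_new(S), number=1_000))
```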