Nixtla / hierarchicalforecast

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.
https://nixtlaverse.nixtla.io/hierarchicalforecast
Apache License 2.0

ENH: Add sparse top-down reconciliation via ``TopDownSparse`` #277

Closed christophertitchen closed 3 months ago

christophertitchen commented 4 months ago

This PR adds a top-down reconciliation class, `TopDownSparse`, for sparse "summing" matrices, $S$, which significantly reduces the wall time for reconciling large hierarchical structures with disaggregation proportions: reconciliation now takes 98% to 100% less time.
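For context, a minimal usage sketch, assuming `TopDownSparse` lives in `hierarchicalforecast.methods` and shares `TopDown`'s constructor signature (the dataframes are the usual placeholders produced by `hierarchicalforecast.utils.aggregate`):

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import TopDown, TopDownSparse

# Reconcile the same base forecasts with the dense and sparse top-down classes.
# Y_hat_df holds the base forecasts, Y_train_df the in-sample history, S_df the
# summing matrix, and tags the hierarchy levels.
hrec = HierarchicalReconciliation(
    reconcilers=[
        TopDown(method="average_proportions"),
        TopDownSparse(method="average_proportions"),
    ]
)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df, S=S_df, tags=tags)
```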

`TopDownSparse` uses the `_get_PW_matrices` method to efficiently instantiate and allocate $P$ as a COO matrix and then convert it to a CSR matrix, which is much faster than building a large sparse matrix in LIL format. Although the sparsity structure of $P$ for top-down reconciliation means that a CSC matrix would outperform a CSR matrix when computing $SP\hat{y}_{h}$, converting $S$ to a CSC matrix is not worth the overhead. As in #276, the "weight" matrix, $W$, is only instantiated and allocated when performing probabilistic reconciliation, not when only point reconciliation is required.
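Roughly, the construction looks like the following sketch, not the actual `_get_PW_matrices` code; it assumes the top node occupies the first row of $S$ and that `p` already holds the bottom-level disaggregation proportions:

```python
import numpy as np
from scipy import sparse

def _topdown_P(p: np.ndarray, n_total: int) -> sparse.csr_matrix:
    # For top-down reconciliation only the top node's column of P is nonzero,
    # so P has exactly one entry per bottom-level series.
    n_bottom = p.size
    rows = np.arange(n_bottom)
    cols = np.zeros(n_bottom, dtype=np.int64)  # everything sits in column 0
    # Assemble in COO form, which is cheap for a known sparsity pattern, then
    # convert once to CSR for the downstream S @ P @ y_hat products.
    return sparse.coo_matrix((p, (rows, cols)), shape=(n_bottom, n_total)).tocsr()
```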

The same tests for `TopDown` in `nbs/methods.ipynb` are now also evaluated on `TopDownSparse`, and the class supports all of the disaggregation methods of `TopDown`. This required a small change in `_get_child_nodes` to check for the presence of a sparse matrix when disaggregating using forecast proportions.
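For reference, a hypothetical sketch of the kind of branch `_get_child_nodes` now needs, not the PR's exact code: dense and sparse matrices expose different indexing, so the child-node lookup must detect which one it received.

```python
from scipy import sparse

def _children_of(S, col: int):
    # Hypothetical helper: return the row indices of the nonzero entries in a
    # column of S, branching on whether S is a scipy sparse matrix (the path
    # needed by TopDownSparse) or a dense numpy array (the TopDown path).
    if sparse.issparse(S):
        return S.getcol(col).nonzero()[0]
    return S[:, col].nonzero()[0]
```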

| Reference | Method | `method` | Formula |
|---|---|---|---|
| Gross and Sohl, 1990 | A | `average_proportions` | $p_j=\frac{1}{T} \sum_{t=1}^{T} \frac{y_{j, t}}{y_t}$ |
| Gross and Sohl, 1990 | F | `proportion_averages` | $p_j=\sum_{t=1}^{T} \frac{y_{j, t}}{T} \Big/ \sum_{t=1}^{T} \frac{y_t}{T}$ |
| Athanasopoulos, Ahmed, and Hyndman, 2009 | FP | `forecast_proportions` | $p_j=\prod_{\ell=0}^{K-1} \frac{\hat{y}_{j, h}^{(\ell)}}{\hat{S}_{j, h}^{(\ell+1)}}$ |
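In code, methods A and F reduce to simple averages of in-sample proportions. A minimal numpy sketch with hypothetical helper names, where `y_bottom` is an $n_b \times T$ array of bottom-level history and `y_top` is the length-$T$ top-level series:

```python
import numpy as np

def average_proportions(y_bottom: np.ndarray, y_top: np.ndarray) -> np.ndarray:
    # Method A: average the per-period proportions y_{j,t} / y_t over t = 1..T.
    return (y_bottom / y_top).mean(axis=1)

def proportion_averages(y_bottom: np.ndarray, y_top: np.ndarray) -> np.ndarray:
    # Method F: divide each series' historical average by the top level's.
    return y_bottom.mean(axis=1) / y_top.mean()
```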

The benchmarks for `fit_predict` across a few large hierarchical datasets are below, measured on a machine with 6 cores/12 threads at 2.6 GHz (dynamic frequency scaling up to 4.5 GHz) and 16 GB of RAM.

Note: $P$ is only used for methods A and F, not FP, so there is no performance gain from `TopDownSparse` when disaggregating with forecast proportions, only with historical proportions.

Although the private and M5 datasets are not strictly hierarchical, this is not an issue when disaggregating with methods A and F, because the proportions do not depend on the disaggregation pathways from the top node to the bottom nodes. However, this is not the case with method FP, which is invalid for these structures as there are multiple $p_j$ for each $j$, depending on the respective disaggregation pathway. There are practical approaches to deal with this while accepting further information loss, such as creating a synthetic disaggregation pathway directly from the top node to the bottom nodes, so that $p_j=\hat{y}_{j, h} / \hat{y}_{h}$ (sketched after the benchmark table below); selecting a single disaggregation pathway for each $j$; or averaging them. That is beyond the scope of this PR.

Also, note that the private dataset is quite interesting in the sense that the number of aggregated time series is greater than the number of bottom-level time series, which is unusual for practical problems. This is not an issue here, but it becomes more consequential in min trace reconciliation, as inverting $S'W^{-1}S$ in the structural representation becomes more efficient than inverting $CWC'$ in the zero-constrained representation when $n_{a} > n_{b}$. Not that min trace reconciliation scales particularly well on large hierarchical structures anyway, apart from perhaps structural scaling.

| Dataset | $\textup{dim}\left( P \right)$ | $K$ | $T$ | $h$ | Method | `TopDown` | `TopDownSparse` | Reduction |
|---|---|---|---|---|---|---|---|---|
| Private | $22\,705 \times 96\,249$ | $45$ | $36$ | $6$ | A | 59.5 s | 14.3 ms | 100% |
| Private | $22\,705 \times 96\,249$ | $45$ | $36$ | $6$ | F | 59.1 s | 12.6 ms | 100% |
| M5 | $30\,490 \times 42\,840$ | $12$ | $1\,969$ | $28$ | A | 34.1 s | 590 ms | 98.3% |
| M5 | $30\,490 \times 42\,840$ | $12$ | $1\,969$ | $28$ | F | 33.9 s | 307 ms | 99.1% |
| M5 | $30\,490 \times 42\,840$ | $12$ | $1\,969$ | $7$ | A | 33.8 s | 582 ms | 98.3% |
| M5 | $30\,490 \times 42\,840$ | $12$ | $1\,969$ | $7$ | F | 33.7 s | 298 ms | 99.1% |
| Tourism | $304 \times 555$ | $8$ | $228$ | $12$ | A | 2.09 ms | 1.91 ms | 8.6% |
| Tourism | $304 \times 555$ | $8$ | $228$ | $12$ | F | 1.41 ms | 823 µs | 41.6% |
| Tourism | n/a | $8$ | $228$ | $12$ | FP | 172 ms | 173 ms | -0.6% |
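As flagged in the note above, here is a tiny sketch of the synthetic-pathway workaround for non-strictly-hierarchical structures, purely illustrative and outside the scope of this PR (names are hypothetical):

```python
import numpy as np

def synthetic_pathway_proportions(y_hat_bottom: np.ndarray, y_hat_top: float) -> np.ndarray:
    # Hypothetical workaround: disaggregate straight from the top node, so each
    # bottom series j gets a single proportion p_j = y_hat_{j,h} / y_hat_h,
    # sidestepping the multiple pathways that make method FP ill-defined here.
    # Any incoherence between the base forecasts is absorbed into p, which is
    # the "further information loss" this shortcut accepts.
    return y_hat_bottom / y_hat_top
```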
review-notebook-app[bot] commented 4 months ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.

elephaint commented 4 months ago

Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

christophertitchen commented 3 months ago

> Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

Thanks, Olivier! I just added a test to check for equivalence between `TopDown` and `TopDownSparse` to within the machine epsilon of a double, $2^{-52}$.
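For reference, the added check is in the spirit of this sketch (array names are placeholders for the reconciled point forecasts from the two classes):

```python
import numpy as np

# The dense and sparse implementations should agree to within the machine
# epsilon of a double, 2 ** -52 ≈ 2.22e-16.
np.testing.assert_allclose(y_tilde_dense, y_tilde_sparse, rtol=0.0, atol=2.0**-52)
```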

christophertitchen commented 3 months ago

> Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

Hi, @elephaint. Can you think of any more changes or is it OK to be merged now?

elephaint commented 3 months ago

> > Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?
>
> Hi, @elephaint. Can you think of any more changes or is it OK to be merged now?

Thanks - sorry for the late reply. Great work! I have a small comment; other than that, I think it's good to go.

Thanks also for adding the output equivalence test!