Closed christophertitchen closed 3 months ago
Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?
Thanks, Olivier! I just added a test to check for equivalence between `TopDown` and `TopDownSparse` to within the machine epsilon of a double, $2^{-52}$.
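The check is along these lines. This is a minimal sketch rather than the committed test, the tiny hierarchy is illustrative, and the exact `fit_predict` parameter names are assumed:

```python
import numpy as np
from hierarchicalforecast.methods import TopDown, TopDownSparse

def test_top_down_sparse_equivalence():
    # Illustrative hierarchy: total -> two bottom series.
    S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    y_insample = np.array([[10.0, 12.0], [4.0, 5.0], [6.0, 7.0]])
    y_hat = np.array([[11.0], [5.0], [6.0]])  # base forecasts, h=1

    # Assumed signature: keyword names may differ from the module's.
    kwargs = dict(S=S, y_hat=y_hat, y_insample=y_insample, idx_bottom=np.array([1, 2]))
    dense = TopDown(method="average_proportions").fit_predict(**kwargs)
    sparse = TopDownSparse(method="average_proportions").fit_predict(**kwargs)

    # Outputs should agree to within the machine epsilon of a double, 2**-52.
    np.testing.assert_allclose(dense["mean"], sparse["mean"], atol=np.finfo(np.float64).eps)
```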
Hi, @elephaint. Can you think of any more changes or is it OK to be merged now?
Thanks - sorry for the late reply. Great work! I have one small comment; other than that I think it's good to go. Thanks also for adding the output equivalence test!
This PR adds a top-down reconciliation class called `TopDownSparse` for sparse "summing" matrices, $S$, to significantly reduce the wall time for reconciling large hierarchical structures using disaggregation proportions, which now take $98\%$ to $100\%$ less time to reconcile.

`TopDownSparse` uses the `_get_PW_matrices` method to efficiently instantiate and allocate $P$ as a COO matrix and then convert it to a CSR matrix, which is much faster than using a LIL matrix to construct a large sparse matrix. Although the sparsity structure of $P$ for top-down reconciliation means that a CSC matrix would outperform a CSR matrix for $SP\hat{y}_{h}$, it is not worth converting $S$ to a CSC matrix. As with #276, the "weight" matrix, $W$, is only instantiated and allocated if performing probabilistic reconciliation, not for scenarios where only point reconciliation is required.
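As an illustration of that construction path, here is a minimal sketch (not the PR's actual code) of building a top-down $P$ in COO form and converting it to CSR with `scipy.sparse`; the `proportions` vector and `n_total` argument are hypothetical inputs:

```python
import numpy as np
import scipy.sparse as sp

def top_down_P(proportions: np.ndarray, n_total: int) -> sp.csr_matrix:
    """Build the top-down projection matrix P = [p | 0] as CSR.

    `proportions` holds one disaggregation proportion per bottom series,
    so P has a single nonzero column (the top node) and shape
    (n_bottom, n_total).
    """
    n_bottom = proportions.size
    rows = np.arange(n_bottom)            # one entry per bottom series
    cols = np.zeros(n_bottom, dtype=int)  # all mass comes from the top node
    P = sp.coo_matrix((proportions, (rows, cols)), shape=(n_bottom, n_total))
    # COO -> CSR is a cheap, vectorised conversion; assembling a matrix of
    # this size entry by entry in LIL format would be far slower.
    return P.tocsr()
```

Because $P$ has a single nonzero column, the COO triplet arrays can be filled in one vectorised pass, which is where this route wins over incremental LIL assembly.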
The same tests for `TopDown` in `nbs/methods.ipynb` are now evaluated on `TopDownSparse` as well, and the class supports all of the disaggregation methods of `TopDown`. This required a small change in `_get_child_nodes` to check for the presence of a sparse matrix if disaggregating using forecast proportions.

| method | $p_j$ |
| --- | --- |
| `average_proportions` | $p_j=\frac{1}{T} \sum_{t=1}^{T} \frac{y_{j, t}}{y_{t}}$ |
| `proportion_averages` | $p_j=\sum_{t=1}^{T} \frac{y_{j, t}}{T} \Bigg/ \sum_{t=1}^{T} \frac{y_{t}}{T}$ |
| `forecast_proportions` | $p_j=\prod_{\ell=0}^{K-1} \frac{\hat{y}_{j, h}^{(\ell)}}{\hat{S}_{j, h}^{(\ell+1)}}$ |
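For context, here is a usage sketch assuming `TopDownSparse` mirrors the `TopDown` constructor and plugs into `HierarchicalReconciliation` like the other methods; the frame arguments follow the library's usual `aggregate()` workflow:

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import TopDownSparse

def reconcile_top_down(Y_hat_df, Y_df, S_df, tags):
    """Reconcile base forecasts with each TopDownSparse disaggregation method.

    The inputs follow hierarchicalforecast's usual layout: the output of
    `aggregate()` plus a base-forecast frame from any forecasting model.
    """
    hrec = HierarchicalReconciliation(
        reconcilers=[
            TopDownSparse(method="average_proportions"),
            TopDownSparse(method="proportion_averages"),
            TopDownSparse(method="forecast_proportions"),
        ]
    )
    return hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_df, S=S_df, tags=tags)
```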
The benchmarks for `fit_predict` across a few large hierarchical datasets are below, measured on a machine with 6/12 x 2.6 GHz (dynamic frequency scaling to 4.5 GHz) cores and 16 GB RAM.

Note: $P$ is only used for methods A and F, not FP, so there will be no performance gain with `TopDownSparse` when disaggregating using forecast proportions, only historical proportions. Although the private and M5 datasets are not strictly hierarchical, this is not an issue when disaggregating using methods A and F, because the proportions do not depend on the disaggregation pathways from the top node to the bottom nodes. However, this is not the case with method FP, which is invalid for these structures, as there are multiple $p_j$ for each $j$ depending on the respective disaggregation pathway. There are practical approaches to deal with this while accepting further information loss, such as creating a synthetic disaggregation pathway directly from the top node to the bottom nodes so that $p_j=\hat{y}_{j, h} / \hat{y}_{h}$, selecting a single disaggregation pathway for each $j$, or averaging them, but that is beyond the scope of this PR. Also note that the private dataset is quite interesting in the sense that the number of aggregated time series is greater than the number of bottom-level time series, which is not usually the case for practical problems. This is not an issue here, but it becomes more consequential in min trace reconciliation, as inverting $S'W^{-1}S$ in the structural representation becomes more efficient than inverting $CWC'$ in the zero-constrained representation when $n_{a} > n_{b}$, not that min trace reconciliation scales particularly well on large hierarchical structures anyway, apart from perhaps structural scaling.
| $\textup{dim} \left ( P \right )$ |  | `TopDown` | `TopDownSparse` | reduction |
| --- | --- | --- | --- | --- |
| $\small{\left( 22\,705 \times 96\,249 \right)}$ |  | $59.5\, s$ | $14.3\, ms$ | $100 \%$ |
| $\small{\left( 22\,705 \times 96\,249 \right)}$ |  | $59.1\, s$ | $12.6\, ms$ | $100 \%$ |
| $\small{\left( 30\,490 \times 42\,840 \right)}$ | $1\,969$ | $34.1\, s$ | $590\, ms$ | $98.3 \%$ |
| $\small{\left( 30\,490 \times 42\,840 \right)}$ | $1\,969$ | $33.9\, s$ | $307\, ms$ | $99.1 \%$ |
| $\small{\left( 30\,490 \times 42\,840 \right)}$ | $1\,969$ | $33.8\, s$ | $582\, ms$ | $98.3 \%$ |
| $\small{\left( 30\,490 \times 42\,840 \right)}$ | $1\,969$ | $33.7\, s$ | $298\, ms$ | $99.1 \%$ |
| $\small{\left( 304 \times 555 \right)}$ |  | $2.09\, ms$ | $1.91\, ms$ | $8.6 \%$ |
| $\small{\left( 304 \times 555 \right)}$ |  | $1.41\, ms$ | $823\, \mu s$ | $41.6 \%$ |
|  |  | $172\, ms$ | $173\, ms$ | $-0.6 \%$ |
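For anyone who wants to reproduce numbers like these, a rough timing harness along the following lines works; the `fit_predict` parameter names are assumed, and `S`, `y_hat`, `y_insample`, and `idx_bottom` come from whichever dataset is being benchmarked:

```python
import timeit
from hierarchicalforecast.methods import TopDown, TopDownSparse

def benchmark(S, y_hat, y_insample, idx_bottom, method="average_proportions", repeat=5):
    """Time fit_predict for the dense and sparse top-down classes."""
    def run(cls):
        # Best of `repeat` single runs, to reduce noise from other processes.
        return min(
            timeit.repeat(
                lambda: cls(method=method).fit_predict(
                    S=S, y_hat=y_hat, y_insample=y_insample, idx_bottom=idx_bottom
                ),
                number=1,
                repeat=repeat,
            )
        )

    dense, sparse = run(TopDown), run(TopDownSparse)
    print(f"TopDown: {dense:.3f}s  TopDownSparse: {sparse:.3f}s  "
          f"reduction: {1 - sparse / dense:.1%}")
```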