christophertitchen closed this pull request 1 month ago.
Great work again - I took the liberty to review it. Minor comment in the code.

Apart from that, I'm sure you checked it, but it may be worth checking the types of the sparse matrices in the various matrix multiplications (some may be `csc`, others `csr`) and whether it's better to transpose operations or not. This can often lead to easy speedups. Other than that, I think mostly what is needed are a few unit tests, comparable to the non-sparse MinT tests.
Hi, Olivier! Apologies for the late reply. I saw a post doing the rounds on LinkedIn from Tim Januschowski about your PhD thesis—a big congratulations on completing it! 🎉
Regarding the sparse array types, I believe sparse matrix multiplication internally converts to CSR to use its multiplication handlers, so we would be introducing a small additional overhead by converting a transposed CSR matrix, which is a CSC matrix, back to CSR, for example. An alternative would be to investigate BSR matrices, which are amazing, but unfortunately in hierarchical forecasting we tend not to have the dense horizontal substructures that we would be able to exploit, because of the nature of our summing matrices. And thinking of the sparsity structure in our problems, if we get around to implementing cross-temporal reconciliation, it will probably be even less suited to BSR matrices.
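A quick sketch of what I mean about the format flipping under transposition, using SciPy directly (the matrix here is just a random placeholder, not one of our summing matrices):

```python
from scipy import sparse

# Transposing a CSR matrix yields a CSC matrix essentially for free, while
# converting it back to CSR allocates and copies, which is the small
# additional overhead mentioned above.
S = sparse.random(1000, 200, density=0.01, format="csr")
print(type(S.T).__name__)          # csc_matrix
print(type(S.T.tocsr()).__name__)  # csr_matrix (new copy of the data)
```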
I noticed the lack of tests for sparse MinT too! We should definitely improve them, but I think this work can fall under a different PR. I added some checks for this non-negative piece to what we already have as a stopgap.
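The stopgap checks are along these lines, as a minimal sketch (the helper name is illustrative, not the actual test code):

```python
import numpy as np

def check_nonnegative_coherent(y_tilde, S):
    """Reconciled forecasts should be non-negative and coherent, i.e. equal
    to the aggregation of their bottom-level rows. Assumes the bottom-level
    series occupy the last n_b rows, as in our summing matrices."""
    n_b = S.shape[1]
    assert np.all(y_tilde >= 0.0)
    np.testing.assert_allclose(y_tilde, S @ y_tilde[-n_b:])
```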
Yes, the "legacy" heuristic for determining the tolerance to use in BiCGSTAB has been deprecated, which is why I removed it.
Thanks - also for the kind words. Makes sense!
We can merge this one after #296 has been approved.
This PR adds a non-negative reconciliation heuristic for sparse minimum trace reconciliation, which is currently not supported in the `fit` method for `MinTraceSparse`. I have implemented something similar in my job, as the time series we deal with are inherently non-negative in nature, as they are across many industries, and thought it would be useful. It is fast and guarantees non-negative coherence at the expense of producing an approximate solution with potentially ill-defined mathematical properties. However, the goal of such a transformation is to align forecasts with their use in practice for the users of our forecasts, not necessarily to improve accuracy, which is a nice additional benefit that in my experience actually happens rather frequently.

We can expand this further in the future by implementing a sparse non-negative least squares solver; there seem to be a few nice examples of this in the signal processing space.
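For context, a usage sketch, assuming the heuristic is exposed through a `nonnegative` flag mirroring the dense `MinTrace` reconciler (the exact keyword and the input dataframes `Y_hat_df`, `Y_df`, `S_df`, and `tags` are placeholders here):

```python
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import MinTraceSparse

# Assumed interface: `nonnegative=True` enables the heuristic described below.
hrec = HierarchicalReconciliation(
    reconcilers=[MinTraceSparse(method="wls_struct", nonnegative=True)]
)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, S=S_df, tags=tags, Y_df=Y_df)
```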
Step 1: Clip the base forecasts to remove any negative point forecasts, which of course does not guarantee non-negativity in the minimum trace method, but nonetheless provides a good starting point.
$$
\alpha_{h} = \begin{cases} \hat{y}_{i,h} & \text{if } \hat{y}_{i,h} \gt 0 \\ 0 & \text{otherwise} \end{cases}
$$

Step 2: Although it would now be sufficient to ensure that all of the entries in $P$ are positive, $P$ is implemented as a linear operator, so we solve the sparse linear system iteratively as usual using BiCGSTAB. We could also try changing this to MINRES in the future to exploit the symmetry of $\left( S'W^{-1}_{h}S \right)^{-1}$, which has been more performant in my experience, but it can be finicky to find the relative tolerance that best balances speed with maximum error.
$$
\beta_{h} = S\left( S'W^{-1}_{h}S \right)^{-1}S'W^{-1}_{h}\alpha_{h}
$$

Step 3: We now have reconciled point forecasts which minimise the error variances of the coherent forecasts, but which may be negative, so we need to check if any of the coherent bottom-level point forecasts are negative. If so, the negative bottom-level forecasts are clipped.
$$
b_{h} = \begin{cases} \beta_{j,h} & \text{if } \beta_{j,h} \gt 0 \\ 0 & \text{otherwise} \end{cases}
$$

$$
j \in \left\{ n_{a} + 1, n_{a} + 2, \cdots, n_{a} + n_{b} \right\}
$$

Step 4: Restore coherence by aggregating the non-negative bottom-level reconciled forecasts across the data structure, which leads to an approximate solution, as we used the leverages of the negative-containing $SP$ in the minimum variance unbiased estimator to generate $b_{h}$.
$$
\tilde{y}_{h} = Sb_{h}
$$

Step 5: This directly forces projection onto the non-negative coherent subspace, which can be equivalently and subsequently achieved by using the bottom-up reconciliation projection matrix. We also set $W$ appropriately for probabilistic reconciliation if required.
$$
\tilde{y}_{h} = S\begin{bmatrix} 0_{n_{b}, n_{a}} & I_{n_{b}} \end{bmatrix} \tilde{y}_{h}
$$

$$
W = I_{n}
$$
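To make the five steps concrete, here is a minimal NumPy/SciPy sketch of the heuristic for a single horizon $h$. The function name, the explicit `W_inv` argument, and the solver tolerance are illustrative placeholders, not the PR's actual implementation:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, bicgstab

def nonnegative_mint_sparse(S, W_inv, y_hat):
    """Steps 1-5 for a single horizon h.

    S     : (n, n_b) sparse summing matrix.
    W_inv : (n, n) sparse approximation of W_h^{-1}.
    y_hat : (n,) vector of base forecasts for horizon h.
    """
    n, n_b = S.shape
    n_a = n - n_b

    # Step 1: clip negative base forecasts.
    alpha = np.clip(y_hat, 0.0, None)

    # Step 2: solve (S' W^{-1} S) x = S' W^{-1} alpha with BiCGSTAB, keeping
    # the left-hand side as a matrix-free linear operator.
    A = LinearOperator(
        (n_b, n_b),
        matvec=lambda v: S.T @ (W_inv @ (S @ v)),
        dtype=np.float64,
    )
    x, info = bicgstab(A, S.T @ (W_inv @ alpha), atol=1e-8)
    if info != 0:
        raise RuntimeError("BiCGSTAB failed to converge")
    beta = S @ x  # reconciled, but possibly negative

    # Step 3: clip any negative bottom-level reconciled forecasts.
    b = np.clip(beta[n_a:], 0.0, None)

    # Steps 4-5: aggregate the non-negative bottom-level forecasts back up
    # the structure, i.e. the bottom-up projection S [0 | I] applied to
    # beta_h. For probabilistic reconciliation, W would then be set to I_n.
    return S @ b

# Toy two-level hierarchy: total = a + b.
S = sparse.csr_matrix(np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]))
W_inv = sparse.identity(3, format="csr")
y_hat = np.array([10.0, -2.0, 11.0])
print(nonnegative_mint_sparse(S, W_inv, y_hat))  # non-negative and coherent
```

In the toy example the MinT solution has a negative bottom-level entry, so Step 3 clips it and Steps 4-5 re-aggregate, yielding approximately `[10.67, 0.0, 10.67]`: non-negative, coherent, and close to, but no longer exactly, the unconstrained minimum trace solution.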