Closed: elephaint closed this pull request 2 months ago.
@christophertitchen Again, thanks for the remarks. I made a few other micro-optimizations, further reducing computation time by ~30% in total.
I am happy the comments were helpful!
The only further change I thought of would be to potentially modify the naïve approach when computing covariance and correlation to a pairwise approach for each pair of time series, like what is done in `stats::cov` in R. This could better describe the relationship between pairs of time series, but honestly, it would slow down the generation of the shrunk W and go against the spirit of what you were trying to achieve here by implementing a fast method, so I think it is perfectly fine as you have done it.

As a quick example to demonstrate, imagine monthly time series representing the units sold of products over the past six months. We take two of those products, $X$ and $Y$, with $Y$ being an NPI (new product introduction) which was introduced only four months prior. The residuals of a naïve forecast would be:
$$X = \left[(x_{t-6} - x_{t-7}),\ (x_{t-5} - x_{t-6}),\ (x_{t-4} - x_{t-5}),\ (x_{t-3} - x_{t-4}),\ (x_{t-2} - x_{t-3}),\ (x_{t-1} - x_{t-2})\right]$$

$$Y = \left[\varnothing,\ \varnothing,\ \varnothing,\ (y_{t-3} - y_{t-4}),\ (y_{t-2} - y_{t-3}),\ (y_{t-1} - y_{t-2})\right]$$
This approach would use an `n_samples` of 6, and the mean of $X$ would include all six observations. However, a pairwise approach would only use pairs of temporally aligned observations, so an `n_samples` of 3, and the mean of $X$ would be $\bar{X} = \left((x_{t-3} - x_{t-4}) + (x_{t-2} - x_{t-3}) + (x_{t-1} - x_{t-2})\right) / 3$, which would obviously lead to a different $\mathrm{cov}(X, Y)$, $\mathrm{corr}(X, Y)$, $\sigma_X$ for standardisation, and so on.

```python
not_mask = ~np.isnan(residuals)
...
for i in prange(n_timeseries):
    x = residuals[i]
    for j in range(i + 1):
        y = residuals[j]
        # Get the temporally aligned pairs of observations.
        mask = not_mask[i] & not_mask[j]
        n_samples = mask.sum()
        # Check for insufficient observations for the unbiased estimator.
        if n_samples <= 1:
            ...
        x_aligned = x[mask]
        y_aligned = y[mask]
        # Calculate the pairwise sample means.
        mean_x = np.mean(x_aligned)
        mean_y = np.mean(y_aligned)
        ...
```
Anyway, I am not familiar with any literature on this topic, nor aware of any evidence to suggest it is significantly worse, and the legacy version also takes the same approach, so as I said, I favour the fast approach you have implemented here. Good job!
Didn't think about this, but it's a good point. I'll have a try and play around a bit with it :)
@christophertitchen Thanks for the suggestion about temporal alignment. I included two versions: one that can handle NaNs in the way you describe, and a faster one that doesn't have to deal with NaNs. The NaN version is a bit slower (among other things, due to the masking involved), but it enlarges the scope of MinT significantly, as it can handle many more cases without leading to ill-conditioned $W$ matrices (see the example in the top post). Thanks!
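As a small illustration of why the NaN-aware path matters (a made-up example, not code from this PR): without any special handling, NaNs in the residuals propagate straight into $W$, which then cannot be inverted for reconciliation.

```python
import numpy as np

# Naive-forecast residuals for two series; the second is an NPI with only
# three months of history, so its early residuals are NaN.
residuals = np.array([
    [0.3, -0.1, 0.2, -0.4, 0.1, 0.5],
    [np.nan, np.nan, np.nan, 0.2, -0.3, 0.1],
])

# The plain sample covariance propagates the NaNs: every entry involving
# the second series becomes NaN, so W is unusable for MinT.
W = np.cov(residuals)
print(W)
```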
@christophertitchen Again thanks for the thoughtful comments.
I think I'll be deprecating the legacy version now that I get equivalent results across datasets.
@elephaint great job!
Also, regarding Hyndman's implementation, you were right.
https://github.com/earowang/hts/blob/1408ab2c5c40f1022e6957bcf8438aaefc8464bf/R/MinT.R#L24
```r
covm <- crossprod(x) / n
v <- (1/(n * (n - 1))) * (crossprod(xs^2) - 1/n * (crossprod(xs))^2)
```
It uses the biased estimator, i.e. normalisation by $N$. We could have changed `ddof` in `np.ma.cov` to $0$ to copy this approach, but deprecating the legacy approach in favour of your superior one makes sense.
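As a minimal sketch of what that `ddof` change would have looked like (the arrays below are made up for illustration, not taken from the PR):

```python
import numpy as np

# Naive-forecast residuals for two series; the second has missing history.
residuals = np.ma.masked_invalid(np.array([
    [0.3, -0.1, 0.2, -0.4, 0.1, 0.5],
    [np.nan, np.nan, np.nan, 0.2, -0.3, 0.1],
]))

# Legacy behaviour: unbiased estimator, normalisation by N - 1.
W_unbiased = np.ma.cov(residuals, ddof=1)

# Hyndman-style behaviour, as in MinT.R above: biased estimator,
# normalisation by N, obtained by setting ddof to 0.
W_biased = np.ma.cov(residuals, ddof=0)
```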
This PR implements an optimized version of the Schäfer-Strimmer shrunk empirical covariance algorithm for MinT.
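For context, the shrinkage estimator in question (as typically formulated for MinT) shrinks the sample covariance of the in-sample residuals towards its diagonal:

$$\hat{W}_{\mathrm{shr}} = \hat{\lambda}\, \hat{W}_{D} + (1 - \hat{\lambda})\, \hat{W}_{1}, \qquad \hat{\lambda} = \frac{\sum_{i \neq j} \widehat{\mathrm{Var}}(\hat{r}_{ij})}{\sum_{i \neq j} \hat{r}_{ij}^{2}},$$

where $\hat{W}_{1}$ is the sample covariance of the residuals, $\hat{W}_{D}$ its diagonal, and $\hat{r}_{ij}$ the sample correlations.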
1. Up to 60x faster
First, the result is that we can perform MinT reconciliation much faster, as demonstrated in the table below, where `mint_legacy` is the old version and `mint` is the new version, with the values in brackets denoting whether the forecasts contain NaNs (True or False) and the number of bottom-level time series reconciled (i.e., 20, 200, or 2000). An improvement of up to 60x (on my machine, an i7-12700K) is seen (no NaNs, 2000 time series).

Note that the NaN version is about 3x slower than the version that doesn't have to deal with NaNs, which is due to all the masking involved. However, it is still significantly faster than the legacy version (which doesn't handle NaNs properly).
2. Forecasting performance
Forecasting performance is similar:
3. Improved NaN handling
This PR improves NaN handling (thanks to @christophertitchen for the suggestions!), leading to more cases that MinT can handle. For example, the old version could not handle 20% NaN values introduced into the Australian Tourism forecasts, whereas the new version gives results nearly identical to the no-NaN case:
Code for introducing NaNs in the Australian Tourism example:
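A rough sketch of one way to do this (the DataFrame `Y_df` and column names below are assumptions based on the standard Tourism example, not code taken from this PR):

```python
import numpy as np
import pandas as pd

# Rough sketch only: `Y_df` stands in for the long-format DataFrame used in
# the Australian Tourism example (columns: unique_id, ds, y).
Y_df = pd.DataFrame({
    "unique_id": ["Australia"] * 4 + ["Australia/NSW"] * 4,
    "ds": list(pd.date_range("2016-01-01", periods=4, freq="QS")) * 2,
    "y": [25.0, 27.0, 24.0, 26.0, 7.0, 8.0, 6.5, 7.5],
})

# Randomly replace ~20% of the observations with NaN.
rng = np.random.default_rng(0)
nan_mask = rng.random(len(Y_df)) < 0.2
Y_df.loc[nan_mask, "y"] = np.nan
```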