Closed — nulam closed this 3 months ago
Eventually I managed to achieve respectable performance with the following:

```python
def _get_PW_matrices(self, S, idx_bottom):
    # S: sparse summing matrix of shape (n_hiers, n_bottom);
    # idx_bottom: row indices of S corresponding to the bottom-level series
    n_hiers, n_bottom = S.shape
    rows = np.array(idx_bottom)
    cols = np.arange(n_bottom)
    data = np.ones_like(rows, dtype=np.float32)
    # Build P directly in CSR form: no indexed assignment, no dense intermediate
    P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T
    W = sparse.identity(n_hiers, dtype=np.float32)
    return P, W
```
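To illustrate what the matrices above do, here is a self-contained sketch on a hypothetical toy hierarchy (the `S` matrix, `idx_bottom`, and forecast values below are made up for illustration): `P` picks the bottom rows out of the base forecasts, and `S @ P` aggregates them back up, which is exactly bottom-up reconciliation.

```python
import numpy as np
from scipy import sparse

# Toy hierarchy: one total, two mid-level, three bottom series (assumed example)
S = sparse.csr_matrix(np.array([
    [1, 1, 1],   # total
    [1, 1, 0],   # mid A = bottom 1 + bottom 2
    [0, 0, 1],   # mid B = bottom 3
    [1, 0, 0],   # bottom 1
    [0, 1, 0],   # bottom 2
    [0, 0, 1],   # bottom 3
], dtype=np.float32))
idx_bottom = [3, 4, 5]

n_hiers, n_bottom = S.shape
rows = np.array(idx_bottom)
cols = np.arange(n_bottom)
data = np.ones_like(rows, dtype=np.float32)
P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T

y_hat = np.arange(n_hiers, dtype=np.float32)  # fake base forecasts [0..5]
y_bottom = P @ y_hat                          # selects y_hat[idx_bottom]
assert np.allclose(y_bottom, y_hat[idx_bottom])
reconciled = S @ (P @ y_hat)                  # bottom-up reconciled forecasts
```

Everything stays sparse until the final matrix-vector products, so memory scales with the number of nonzeros in `S` rather than `n_hiers * n_bottom`.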
However, I noticed a note about possibly deprecating the `idx_bottom` argument in the future — should I open a PR?
@nulam Thanks for using this! Why don't you open a PR? I haven't worked on this recently, but I'll have a look (I now have a ~200k-series dataset as well), and hopefully someone from Nixtla finds the time to check whether they have any problem with idx_bottom or anything else.
Hey @nulam and @mcsqr! Currently, the efforts to remove `idx_bottom` are paused. Would you like to open a PR with your improvements, @nulam? ❤️
Solved by #276
What happened + What you expected to happen
The current implementation of the `BottomUpSparse` reconciliation method causes an undesired conversion to a dense matrix, potentially leading to an OOM on large datasets. The lack of speedup was already observed before, as noted in makoren's comment in the class definition. The culprit is the `idx_bottom` indexed assignment: https://github.com/Nixtla/hierarchicalforecast/blob/2296c259542dbd906cfba4b8345c3b72148dad79/hierarchicalforecast/methods.py#L249

Locally I managed to fix it by replacing this line with the snippet shown above in this thread.
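To make the problematic pattern concrete, here is a small hedged sketch (toy shapes and names are assumptions, not the library's actual code): indexed assignment into a SciPy sparse matrix is the slow path the issue describes, while constructing CSR directly from `(data, (rows, cols))` builds the same matrix without it.

```python
import numpy as np
from scipy import sparse

# Hypothetical small example: 5 series total, 3 of them bottom-level
n_hiers, n_bottom = 5, 3
idx_bottom = [2, 3, 4]  # rows of S corresponding to the bottom series

# Problematic pattern: indexed assignment into a sparse matrix.
# On large inputs this is slow and can materialize dense intermediates.
P_slow = sparse.lil_matrix((n_hiers, n_bottom), dtype=np.float32)
P_slow[idx_bottom, :] = np.eye(n_bottom, dtype=np.float32)

# Direct CSR construction from coordinates avoids indexed assignment entirely
rows = np.array(idx_bottom)
cols = np.arange(n_bottom)
data = np.ones_like(rows, dtype=np.float32)
P_fast = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom))

# Both routes produce the same matrix
assert (P_slow.tocsr() != P_fast).nnz == 0
```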
The performance is still somewhat suboptimal, but it's sufficient for my use case, hence I would rather raise this issue than create a PR :)
Versions / Dependencies
hierarchicalforecast==0.4.1
Reproduction script
Using an S matrix with n > 200k leads to OOM with BottomUpSparse reconciliation.
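A back-of-envelope check (the exact shapes are assumptions; only n > 200k is from the report) shows why any accidental densification of S at this scale exhausts memory:

```python
# Hypothetical summing-matrix shape with n > 200k total series
n_hiers, n_bottom = 200_000, 150_000

# A dense float32 S would need n_hiers * n_bottom * 4 bytes
dense_bytes = n_hiers * n_bottom * 4
print(f"dense S would need ~{dense_bytes / 1e9:.0f} GB")  # ~120 GB
```

A sparse S with a handful of nonzeros per column, by contrast, fits in a few megabytes, which is why keeping the whole pipeline sparse matters here.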
Issue Severity
Medium: It is a significant difficulty but I can work around it.