Nixtla / hierarchicalforecast

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.
https://nixtlaverse.nixtla.io/hierarchicalforecast
Apache License 2.0
588 stars 76 forks source link

[Methods] Undesired conversion to a dense matrix #261

Closed nulam closed 3 months ago

nulam commented 9 months ago

What happened + What you expected to happen

Current implementation of BottomUpSparse reconciliation method causes an undesired conversion to a dense matrix, potentially leading to an OOM on large datasets. The lack of speedup was already observed before as noted in makoren's comment in the class definition.

The culprit is the idx_bottom indexed assignment: https://github.com/Nixtla/hierarchicalforecast/blob/2296c259542dbd906cfba4b8345c3b72148dad79/hierarchicalforecast/methods.py#L249

Locally I managed to fix it by replacing this line with the following

for idx in idx_bottom:
    P[idx] = S.getrow(idx)

However, the performance is quite suboptimal, but it's sufficient for my use case, hence I would rather raise this issue than create a PR :)

Versions / Dependencies

hierarchicalforecast==0.4.1

Reproduction script

using an S matrix with n>200k leads to OOM with BottomUpSparse reconciliation

Issue Severity

Medium: It is a significant difficulty but I can work around it.

nulam commented 9 months ago

Eventually I managed to achieve a respectable performance through

    def _get_PW_matrices(self, S, idx_bottom):
        n_hiers, n_bottom = S.shape
        P = sparse.lil_matrix(S.shape, dtype=np.float32)
        rows = np.array(idx_bottom)
        cols = np.arange(n_bottom)
        data = np.ones_like(rows, dtype=np.float32)
        P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T
        W = sparse.identity(n_hiers, dtype=np.float32)
        return P, W

However I noticed a note about possibly deprecating the idx_bottom argument in the future, should I open a PR?

mcsqr commented 9 months ago

@nulam Thanks for using this! Why don't you open a PR, I haven't worked on this recently but I'll have a look (I now got my hands on 200k dataset as well), and then hopefully someone from nixtla finds the time to check if they have any problem with idx_bottom or anything else.

AzulGarza commented 9 months ago

hey @nulam and @mcsqr! currently, the efforts to remove the idx_bottom are paused. would you like to open a PR with your improvements, @nulam?❤️

nulam commented 3 months ago

Solved by #276