Nixtla / hierarchicalforecast

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.
https://nixtlaverse.nixtla.io/hierarchicalforecast
Apache License 2.0
541 stars, 65 forks

TopDown method (proportion_averages, average_proportions) broken in 0.3.0, 0.4.0 and 0.4.1 #253

Open jmberutich opened 7 months ago

jmberutich commented 7 months ago

What happened + What you expected to happen

The TopDown methods (proportion_averages, average_proportions) are broken after version 0.2.1.

Error output:

KeyError                                  Traceback (most recent call last)
File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'AutoARIMA'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 40
     33 reconcilers = [
     34     BottomUp(),
     35     TopDown(method='average_proportions'),
     36     MiddleOut(middle_level='Country/Purpose/State',
     37               top_down_method='forecast_proportions')
     38 ]
     39 hrec = HierarchicalReconciliation(reconcilers=reconcilers)
---> 40 Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
     41                           S=S, tags=tags)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/hierarchicalforecast/core.py:280, in HierarchicalReconciliation.reconcile(self, Y_hat_df, S, tags, Y_df, level, intervals_method, num_samples, seed, sort_df, is_balanced)
    278         y_hat_insample = Y_df[model_name].values.reshape(len(S_df), -1).astype(np.float32)
    279     else:
--> 280         y_hat_insample = Y_df.pivot(columns='ds', values=model_name).loc[S_df.index].values.astype(np.float32)
    281     reconciler_args['y_hat_insample'] = y_hat_insample
    283 if has_level and (level is not None):

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:9025, in DataFrame.pivot(self, columns, index, values)
   9018 @Substitution("")
   9019 @Appender(_shared_docs["pivot"])
   9020 def pivot(
   9021     self, *, columns, index=lib.no_default, values=lib.no_default
   9022 ) -> DataFrame:
   9023     from pandas.core.reshape.pivot import pivot
-> 9025     return pivot(self, index=index, columns=columns, values=values)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/reshape/pivot.py:549, in pivot(data, columns, index, values)
    545         indexed = data._constructor(
    546             data[values]._values, index=multiindex, columns=values
    547         )
    548     else:
--> 549         indexed = data._constructor_sliced(data[values]._values, index=multiindex)
    550 # error: Argument 1 to "unstack" of "DataFrame" has incompatible type "Union
    551 # [List[Any], ExtensionArray, ndarray[Any, Any], Index, Series]"; expected
    552 # "Hashable"
    553 result = indexed.unstack(columns_listlike)  # type: ignore[arg-type]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
   3891 if self.columns.nlevels > 1:
   3892     return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
   3894 if is_integer(indexer):
   3895     indexer = [indexer]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'AutoARIMA'

Versions / Dependencies

v0.3.0, v0.4.0, v0.4.1

Reproduction script

# !pip install -U numba statsforecast datasetsforecast
import numpy as np
import pandas as pd

# Obtain hierarchical dataset
from datasetsforecast.hierarchical import HierarchicalData

# Compute base forecasts (not yet coherent)
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive

# Obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut

# Load TourismSmall dataset
Y_df, S, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# Split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(4)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ARIMA predictions
fcst = StatsForecast(df=Y_train_df,
                     models=[AutoARIMA(season_length=4), Naive()],
                     freq='Q', n_jobs=-1)
Y_hat_df = fcst.forecast(h=4)

# Reconcile the base predictions
reconcilers = [
    BottomUp(),
    TopDown(method='average_proportions'),
    MiddleOut(middle_level='Country/Purpose/State',
              top_down_method='forecast_proportions')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S, tags=tags)

Issue Severity

None

mjsandoval04 commented 7 months ago

I have the same issue, but in my case it returns NaN for both TopDown_method-average_proportions and TopDown_method-proportion_averages. It returns values for bottom-up and MinTrace_method. I have tried previous releases starting from 0.2.0 and got the same results for my data. Please advise!

jmoralez commented 7 months ago

Hey. The TopDown method requires the in-sample predictions of the models to be provided in Y_df, so if you add the following to your example it should work:

Y_hat_df = fcst.forecast(h=4, fitted=True)  # added fitted=True here
insample_df = fcst.forecast_fitted_values()  # get in-sample predictions
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S, tags=tags)  # provide insample_df through Y_df

jmberutich commented 7 months ago

Thanks for the quick response. I followed your suggestion and adding the model insample predictions worked.
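
A quick pre-flight check can surface the problem before reconcile raises the KeyError shown in the traceback. The sketch below is only an illustration: missing_insample_columns is a hypothetical helper, not part of hierarchicalforecast, and it assumes that every column of Y_hat_df other than unique_id and ds is a model forecast column. The toy frames stand in for the real data.

```python
import pandas as pd


def missing_insample_columns(Y_hat_df: pd.DataFrame, Y_df: pd.DataFrame) -> list:
    """Return model columns present in Y_hat_df but absent from Y_df.

    Hypothetical helper: assumes every column of Y_hat_df other than
    'unique_id' and 'ds' holds the forecasts of one model.
    """
    id_cols = {"unique_id", "ds"}
    model_cols = [c for c in Y_hat_df.columns if c not in id_cols]
    return [c for c in model_cols if c not in Y_df.columns]


# Toy frames standing in for the forecasts and the training data.
Y_hat_toy = pd.DataFrame({
    "ds": pd.to_datetime(["2020-03-31"]),
    "AutoARIMA": [1.0],
    "Naive": [2.0],
})
Y_train_toy = pd.DataFrame({
    "ds": pd.to_datetime(["2019-12-31"]),
    "y": [1.5],
})

# The training frame only has 'y', so both model columns are reported missing.
missing = missing_insample_columns(Y_hat_toy, Y_train_toy)
print(missing)
```

An empty list means Y_df already carries one in-sample column per model, which is what passing the output of forecast_fitted_values() provides.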

I have some questions about the TopDown methods (average_proportions and proportion_averages) and how they are calculated.

For forecast_proportions, would it not make sense to use the out-of-sample predictions?
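
For reference, the two historical-proportion schemes can be sketched from their textbook definitions. This is a toy NumPy illustration, not the library's code: the array names and numbers are made up, and it assumes average_proportions means the average of the per-period ratios while proportion_averages means the ratio of the per-series averages.

```python
import numpy as np

# Toy history: 2 bottom-level series over 3 periods (rows = series, columns = time).
y_bottom = np.array([[1.0, 2.0, 6.0],
                     [3.0, 2.0, 2.0]])
y_top = y_bottom.sum(axis=0)  # top-level series is the sum of the bottom series

# Average of historical proportions: mean over time of y_bottom[:, t] / y_top[t].
average_proportions = (y_bottom / y_top).mean(axis=1)

# Proportions of historical averages: mean of each bottom series / mean of the top series.
proportion_averages = y_bottom.mean(axis=1) / y_top.mean()

# Either set of proportions splits a top-level forecast across the bottom series.
y_top_forecast = 10.0
print(average_proportions * y_top_forecast)
print(proportion_averages * y_top_forecast)
```

Both variants are computed from historical values only, and each set of proportions sums to one, so the disaggregated forecasts add back up to the top-level forecast.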

mjsandoval04 commented 7 months ago

@jmoralez thanks for the response. In my case the code was correct: I provided the in-sample predictions in Y_df, but the TopDown results are still "NaN". It works for BottomUp and the other methods I tested, like "OptimalCombination" and "MinTrace", so it is strange that it returns NaN for the TopDown method.

Any recommendations, please? Below is a snippet of my code:

[image: code snippet]
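
One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis for this case: ratio-based proportions are undefined wherever the top-level series is zero or missing, and a single such period is enough to turn an averaged proportion into NaN. The hypothetical helper below (not part of hierarchicalforecast) flags those periods in a long-format frame with unique_id, ds and y as regular columns (call reset_index() first if unique_id is the index). The top-level id 'total' and the toy data are made up.

```python
import numpy as np
import pandas as pd


def problematic_periods(Y_df: pd.DataFrame, top_id: str) -> pd.Series:
    """Return the top-level observations that are NaN or exactly zero.

    Hypothetical helper: expects 'unique_id', 'ds' and 'y' columns.
    """
    top = Y_df.loc[Y_df["unique_id"] == top_id].set_index("ds")["y"]
    return top[top.isna() | (top == 0)]


# Toy data: the top-level series has one zero and one missing value.
dates = list(pd.date_range("2020-01-01", periods=3, freq="D"))
Y_toy = pd.DataFrame({
    "unique_id": ["total"] * 3 + ["a"] * 3,
    "ds": dates * 2,
    "y": [4.0, 0.0, np.nan, 4.0, 0.0, np.nan],
})

bad = problematic_periods(Y_toy, "total")
print(bad)

# A zero top-level value makes the per-period ratio 0/0 = NaN,
# and that NaN propagates through the mean over time.
with np.errstate(invalid="ignore"):
    ratios = np.array([4.0, 0.0]) / np.array([4.0, 0.0])
print(ratios.mean())
```

If the helper returns no rows, this particular cause can be set aside and the NaN values have to come from somewhere else.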

AzulGarza commented 7 months ago

hey @jmberutich, regarding your questions on the methods:

nelsoncardenas commented 6 months ago

So, the solution would be option A

Y_hat_df = fcst.forecast(h=group.horizon, fitted=True)
insample_df = fcst.forecast_fitted_values()
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)

option B

insample_df = Y_train_df.copy()
insample_df["AutoARIMA"] = insample_df["y"]
insample_df["Naive"] = insample_df["y"]
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)