TopDown method (proportion_averages, average_proportions) broken in 0.3.0, 0.4.0 and 0.4.1

jmberutich commented 1 year ago

What happened + What you expected to happen

The TopDown methods:

proportion_averages
average_proportions

are broken after version 0.2.1 (I reproduced this on 0.3.0, 0.4.0 and 0.4.1).

Error output:

KeyError                                  Traceback (most recent call last)
File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'AutoARIMA'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 40
     33 reconcilers = [
     34     BottomUp(),
     35     TopDown(method='average_proportions'),
     36     MiddleOut(middle_level='Country/Purpose/State',
     37               top_down_method='forecast_proportions')
     38 ]
     39 hrec = HierarchicalReconciliation(reconcilers=reconcilers)
---> 40 Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
     41                           S=S, tags=tags)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/hierarchicalforecast/core.py:280, in HierarchicalReconciliation.reconcile(self, Y_hat_df, S, tags, Y_df, level, intervals_method, num_samples, seed, sort_df, is_balanced)
    278         y_hat_insample = Y_df[model_name].values.reshape(len(S_df), -1).astype(np.float32)
    279     else:
--> 280         y_hat_insample = Y_df.pivot(columns='ds', values=model_name).loc[S_df.index].values.astype(np.float32)
    281     reconciler_args['y_hat_insample'] = y_hat_insample
    283 if has_level and (level is not None):

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:9025, in DataFrame.pivot(self, columns, index, values)
   9018 @Substitution("")
   9019 @Appender(_shared_docs["pivot"])
   9020 def pivot(
   9021     self, *, columns, index=lib.no_default, values=lib.no_default
   9022 ) -> DataFrame:
   9023     from pandas.core.reshape.pivot import pivot
-> 9025     return pivot(self, index=index, columns=columns, values=values)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/reshape/pivot.py:549, in pivot(data, columns, index, values)
    545         indexed = data._constructor(
    546             data[values]._values, index=multiindex, columns=values
    547         )
    548     else:
--> 549         indexed = data._constructor_sliced(data[values]._values, index=multiindex)
    550 # error: Argument 1 to "unstack" of "DataFrame" has incompatible type "Union
    551 # [List[Any], ExtensionArray, ndarray[Any, Any], Index, Series]"; expected
    552 # "Hashable"
    553 result = indexed.unstack(columns_listlike)  # type: ignore[arg-type]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
   3891 if self.columns.nlevels > 1:
   3892     return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
   3894 if is_integer(indexer):
   3895     indexer = [indexer]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'AutoARIMA'

Versions / Dependencies

v0.3.0, v0.4.0, v0.4.1

Reproduction script

# !pip install -U numba statsforecast datasetsforecast hierarchicalforecast
import numpy as np
import pandas as pd

# Obtain hierarchical dataset
from datasetsforecast.hierarchical import HierarchicalData

# Compute base forecasts (not yet coherent)
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive

# Obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut

# Load TourismSmall dataset
Y_df, S, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# Split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(4)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ARIMA predictions
fcst = StatsForecast(df=Y_train_df,
                     models=[AutoARIMA(season_length=4), Naive()],
                     freq='Q', n_jobs=-1)
Y_hat_df = fcst.forecast(h=4)

# Reconcile the base predictions
reconcilers = [
    BottomUp(),
    TopDown(method='average_proportions'),
    MiddleOut(middle_level='Country/Purpose/State',
              top_down_method='forecast_proportions')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S, tags=tags)

Issue Severity

None

mjsandoval04 commented 12 months ago

I have the same issue, but in my case it returns NaN for both TopDown_method-average_proportions and TopDown_method-proportion_averages, while it returns values for bottom-up and MinTrace_method. I have tried previous releases starting from 0.2.0 and got the same results with my data. Please advise!

jmoralez commented 12 months ago

Hey. The TopDown method requires the in-sample predictions of the models to be provided in Y_df, so if you add the following to your example it should work:

Y_hat_df = fcst.forecast(h=4, fitted=True)  # added fitted=True here
insample_df = fcst.forecast_fitted_values()  # get in-sample predictions
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S, tags=tags)  # provide insample_df through Y_df
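Before calling reconcile, a quick check like the one below can show which model columns are missing from Y_df. This is only a sketch: missing_insample_columns is a hypothetical helper, not part of hierarchicalforecast, and it assumes every column of Y_hat_df other than unique_id, ds and y is a model name (prediction-interval columns, if present, would also be listed). The demo frames are toy data.

```python
import pandas as pd


def missing_insample_columns(Y_hat_df, Y_df, reserved=("unique_id", "ds", "y")):
    """Return the model columns of Y_hat_df that have no counterpart in Y_df.

    Hypothetical helper: treats every non-reserved column of Y_hat_df
    as a model name.
    """
    model_cols = [c for c in Y_hat_df.columns if c not in reserved]
    return [c for c in model_cols if c not in Y_df.columns]


# Toy frames that mimic the shapes in the reproduction script.
Y_hat_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2006-03-31"]),
    "AutoARIMA": [1.0],
    "Naive": [2.0],
})
Y_train_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2005-12-31"]),
    "y": [1.5],
})

missing = missing_insample_columns(Y_hat_demo, Y_train_demo)
print(missing)  # ['AutoARIMA', 'Naive']
```

A non-empty result here corresponds to the KeyError: 'AutoARIMA' in the traceback, which is raised when Y_df is pivoted on a model column it does not contain.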

jmberutich commented 12 months ago

Thanks for the quick response. I followed your suggestion, and adding the models' in-sample predictions worked.

I have some questions about how the TopDown methods (average_proportions and proportion_averages) are calculated.

Why are they estimated from the fitted values when we can do it with lower error from the actual data?
If the in-sample predictions are not available for a model we want to reconcile, what would be the effect of duplicating the "y" column as the in-sample predictions? (This assumes the model has zero error on the training data.)

For forecast_proportions, would it not make sense to use the out-of-sample predictions?

mjsandoval04 commented 12 months ago

@jmoralez thanks for the response. In my case the code was correct: I provided the in-sample predictions in Y_df, but the TopDown results are still NaN. It works for BottomUp and the other methods I tested, such as "OptimalCombination" and "MinTrace", so it is strange that only the TopDown method returns NaN.

Any recommendations, please? Below is a snippet of my code:

AzulGarza commented 12 months ago

Hey @jmberutich, regarding your questions on the methods:

The average_proportions and proportion_averages approaches use the actual data (the target values used for training). The insample_df is required because it contains the historical target values (an insample_df with just the columns unique_id, ds, and y should work for TopDown).
It follows that you don't need the in-sample predictions to use these methods; you only need to pass an insample_df with the historical target variable.
The forecast_proportions method uses the out-of-sample predictions.
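To make the difference between the two historical-proportion variants concrete, here is a small NumPy sketch based on the textbook definitions (average of the per-period shares versus ratio of the averages). It is an illustration on toy numbers, not the library's implementation; the arrays and variable names are made up for the example.

```python
import numpy as np

# Toy history: rows are bottom-level series, columns are time steps.
y_bottom = np.array([[10.0, 30.0],
                     [10.0, 70.0]])
y_top = y_bottom.sum(axis=0)  # total per time step: [20, 100]

# average_proportions: compute each series' share of the total in every
# period, then average those shares over time.
avg_prop = (y_bottom / y_top).mean(axis=1)  # approximately [0.4, 0.6]

# proportion_averages: average each series over time, then divide by the
# average of the total.
prop_avg = y_bottom.mean(axis=1) / y_top.mean()  # approximately [0.333, 0.667]

# Either set of proportions splits a top-level forecast into
# bottom-level forecasts that add up to it.
top_forecast = 120.0
bottom_from_avg_prop = avg_prop * top_forecast
bottom_from_prop_avg = prop_avg * top_forecast
print(bottom_from_avg_prop, bottom_from_prop_avg)
```

Both variants depend only on the historical target values, and both sets of proportions sum to one, so the disaggregated forecasts are coherent with the top-level forecast.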

nelsoncardenas commented 11 months ago

So, the solution would be option A:

Y_hat_df = fcst.forecast(h=group.horizon, fitted=True)
insample_df = fcst.forecast_fitted_values()
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)

or option B:

insample_df = Y_train_df.copy()
insample_df["AutoARIMA"] = insample_df["y"]
insample_df["Naive"] = insample_df["y"]
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)
