Nixtla / hierarchicalforecast

Probabilistic Hierarchical forecasting 👑 with statistical and econometric methods.
https://nixtlaverse.nixtla.io/hierarchicalforecast
Apache License 2.0
588 stars 76 forks

TopDown Reconciliation Error Without All Forecasts #290

Open breadwall opened 1 month ago

breadwall commented 1 month ago

What happened + What you expected to happen

The TopDown reconcile method with average_proportions appears to require forecasts for all hierarchy levels, even though top-down reconciliation should only need the forecasts at the top level and the historical values for all combinations. I tried filling in all the missing levels in Y_hat_df with dummy values such as 1, but the top-down forecasts are affected by those values.

Am I missing something?

Versions / Dependencies

hierarchical_forecast ~ 0.4.2

Reproduction script

import numpy as np
import pandas as pd

from statsforecast.core import StatsForecast
from statsforecast.models import AutoETS

from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import TopDown

from hierarchicalforecast import utils

# Parameters for dataset creation
n_rows = 1000
date_range = pd.date_range(start='2020-01-01', periods=n_rows, freq='MS')
group_col_one_values = ['A', 'B', 'C']
group_col_two_values = ['X', 'Y', 'Z']

# Create the dataset
data = pd.DataFrame({
    'group_col_one': np.random.choice(group_col_one_values, size=n_rows),
    'group_col_two': np.random.choice(group_col_two_values, size=n_rows),
    'ds': date_range,
    'y': np.random.randint(1, 100, size=n_rows)
})

# Create Top Level to Generate Forecasts
top_data = data.groupby(by=['group_col_one', 'ds'])['y'].sum().reset_index()
Y_top_df, S_top_df, tags_top = utils.aggregate(top_data, [['group_col_one']])

# Create Historical Values of Both 'Top' & 'Bottom'
Y_hist_df, S_hist_df, tags_hist = utils.aggregate(data, [['group_col_one'], ['group_col_one', 'group_col_two']])

# Produce Top Level Forecasts to use for Disaggregation
fcst = StatsForecast(models=[AutoETS(season_length=12)],
                     freq='MS')
Y_hat_df = fcst.forecast(h=12, df=Y_top_df)
reconcilers = [
    TopDown(method='proportion_averages'),
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df, S_hist_df, tags_hist, Y_hist_df)

Issue Severity

None

elephaint commented 1 month ago

Thanks for raising the issue; I can reproduce.

I'll have to think about this a bit. I agree with your point that forecasts should only be required for the Top, but the implemented checks prevent that. Before simply bypassing these checks in this case, I need to test a bit further to make sure nothing else breaks.

christophertitchen commented 1 month ago

It is an interesting quirk of the design. I noticed it too but did not think too much about it because in my current use cases, which do not use this library yet, I generate forecasts for all levels. Of course, I can appreciate that in production, if you and your practitioner(s) decide on a particular single-level approach, there is no need to exhaustively forecast at every level or mess about with reshaping Y_hat_df yourself, so thanks Olivier for looking into it! 👍

The reconciled forecasts for average_proportions should be the same regardless of the lower-level forecasts, though, so it is a worry if filling them with $1$ gives strange results. Actually, come to think of it, we do not even need the in-sample values of the "middle" levels for this scenario, just the top and bottom levels.
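
To make that concrete, here is a minimal NumPy sketch of top-down disaggregation with historical proportions. It is not the library's implementation: the function names and the toy numbers are hypothetical, and it assumes the textbook definitions of the two proportion schemes (the mean of the bottom/top ratios, and the ratio of the bottom/top means). The only inputs are the in-sample values of the top and bottom series and the top-level forecast, so no lower-level forecasts enter the calculation.

```python
import numpy as np


def average_proportions(y_bottom, y_top):
    """Mean over time of each bottom series' share of the top series."""
    return (y_bottom / y_top).mean(axis=1)


def proportion_averages(y_bottom, y_top):
    """Each bottom series' historical mean divided by the top series' mean."""
    return y_bottom.mean(axis=1) / y_top.mean()


def top_down(y_hat_top, proportions):
    """Split a top-level forecast into bottom-level forecasts."""
    # Result has one row per bottom series and one column per horizon step.
    return np.outer(proportions, y_hat_top)


# Toy history (hypothetical): 3 bottom series observed over 3 periods.
y_bottom = np.array([
    [10.0, 20.0, 30.0],
    [30.0, 20.0, 10.0],
    [60.0, 60.0, 60.0],
])
y_top = y_bottom.sum(axis=0)          # top series is the sum of the bottoms
y_hat_top = np.array([110.0, 120.0])  # top-level forecast for 2 steps ahead

p_avg = average_proportions(y_bottom, y_top)
p_prop = proportion_averages(y_bottom, y_top)
y_rec = top_down(y_hat_top, p_avg)
```

Because the proportions sum to one, the disaggregated bottom forecasts add back up to the top forecast at every horizon step, whatever values the other levels of Y_hat_df might have held.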