Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

Fcst.predict does not accept X_df with dynamic exogenous variables #307

Closed: melissafeeney closed this issue 5 months ago

melissafeeney commented 5 months ago

I am training a simple time series model, and my dataset includes one dynamic exogenous variable (holiday) and no static exogenous variables. The training set contains 131 weeks of data, and the test set contains 24 weeks of data. Each week starts on a Monday.

The training dataset looks like this:

[Screenshot: training dataset]

The test dataset looks like this:

[Screenshot: test dataset]
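Since the screenshots are not reproduced here, the layout described above can be approximated with a toy frame. This is a minimal sketch assuming a single series; the series name, the start date and the y/holiday values are placeholders, not the reporter's data. It also defines the `data` frame that the reproduction script below slices.

```python
import pandas as pd

# Hypothetical stand-in for the issue's `data`: one series, 131 + 24 weekly
# rows, every ds a Monday. The y and holiday values are placeholders.
n_weeks = 131 + 24
data = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2021-01-04", periods=n_weeks, freq="W-MON"),  # 2021-01-04 is a Monday
    "y": [float(i) for i in range(n_weeks)],
    "holiday": [1 if i % 13 == 0 else 0 for i in range(n_weeks)],
})

# .loc slicing is label-based and inclusive, so 0:130 selects 131 rows.
train = data.loc[0:130]
test = data.loc[131:]
```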

I first train a forecasting model on the training data and then want to evaluate it on the test data. Even though the test data contains unique_id, ds, and holiday (the dynamic exogenous variable), fcst.predict raises an error:

[Screenshot: error raised by fcst.predict]

Following the error message's suggestion, I tried building X_df with fcst.make_future_dataframe(h), but the ds dates come out as Sundays instead of Mondays (as they are in my train and test datasets):

[Screenshot: output of fcst.make_future_dataframe(h), with ds dates on Sundays]
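Why the dates land on Sundays: in pandas, the alias 'W' means 'W-SUN' (weekly, anchored on Sunday), so stepping forward from a Monday rolls to the next Sunday. The sketch below reproduces that with plain pandas, assuming future dates are built by adding multiples of the frequency offset to the last training date; mlforecast's internals may differ in detail. `future_dates` is a hypothetical helper, not part of mlforecast.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def future_dates(last_date: pd.Timestamp, h: int, freq: str) -> list:
    """Hypothetical helper: step forward h times from last_date by the freq offset."""
    offset = to_offset(freq)  # 'W' -> Week(weekday=6), i.e. anchored on Sunday
    return [last_date + step * offset for step in range(1, h + 1)]


last_train_date = pd.Timestamp("2024-01-01")  # a Monday

with_w = future_dates(last_train_date, h=3, freq="W")
# 2024-01-07, 2024-01-14, 2024-01-21 (Sundays)

with_w_mon = future_dates(last_train_date, h=3, freq="W-MON")
# 2024-01-08, 2024-01-15, 2024-01-22 (Mondays)
```

Note that the first 'W' date is only six days after the last training Monday, so none of the generated dates can line up with the Monday dates in the test set.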

Following the other suggestion, fcst.get_missing_future(h, X_df) gives the output below, and again the dates have shifted from Mondays to Sundays:

[Screenshot: output of fcst.get_missing_future(h, X_df)]
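The get_missing_future output follows from the same mismatch: if the expected future dates are Sundays, none of the Monday rows in X_df match. A rough approximation of such a check, written with pandas only, is sketched below. `missing_future_rows` is hypothetical and is not mlforecast's implementation; it builds the expected (unique_id, ds) pairs from each series' last training date and freq, then reports the pairs absent from X_df.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def missing_future_rows(last_dates: dict, X_df: pd.DataFrame, h: int, freq: str) -> pd.DataFrame:
    """Hypothetical check: expected (unique_id, ds) pairs that X_df lacks."""
    offset = to_offset(freq)
    # Expected future pairs: h steps of `freq` after each series' last date.
    expected = pd.DataFrame(
        [(uid, last + step * offset)
         for uid, last in last_dates.items()
         for step in range(1, h + 1)],
        columns=["unique_id", "ds"],
    )
    # Align datetime resolutions so the merge keys compare cleanly.
    expected["ds"] = expected["ds"].astype("datetime64[ns]")
    provided = X_df[["unique_id", "ds"]].copy()
    provided["ds"] = provided["ds"].astype("datetime64[ns]")
    merged = expected.merge(provided, on=["unique_id", "ds"], how="left", indicator=True)
    missing = merged.loc[merged["_merge"] == "left_only", ["unique_id", "ds"]]
    return missing.reset_index(drop=True)


last_dates = {"series_1": pd.Timestamp("2024-01-01")}  # last training week, a Monday
X_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2024-01-08", periods=3, freq="W-MON"),  # future Mondays
    "holiday": [0, 1, 0],  # placeholder values
})

missing_w = missing_future_rows(last_dates, X_df, h=3, freq="W")          # expects Sundays
missing_w_mon = missing_future_rows(last_dates, X_df, h=3, freq="W-MON")  # expects Mondays
```

With freq='W' every expected row is reported missing, while with freq='W-MON' the same X_df covers the horizon.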

Versions / Dependencies

I am using Google Colab and installed mlforecast both from PyPI with pip and from source, as suggested in another issue (#298): ! pip install git+https://github.com/Nixtla/mlforecast

version: mlforecast-0.11.6

Reproduction script

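from mlforecast import MLForecast
from sklearn.ensemble import RandomForestRegressor

# `data` holds the full dataset (unique_id, ds, target, holiday), one row per week;
# every ds is a Monday.
train = data.loc[0:130]  # first 131 weeks (.loc slicing is inclusive)
test = data.loc[131:]    # last 24 weeks

models = RandomForestRegressor(random_state=0, n_estimators=100)

# 'W' is pandas' alias for 'W-SUN' (weeks anchored on Sunday)
fcst = MLForecast(models=models, freq='W')

fcst.fit(train, fitted=True, static_features=[])
preds = fcst.predict(h=24, X_df=test[['unique_id', 'ds', 'holiday']])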

Issue Severity

High: It blocks me from completing my task.

melissafeeney commented 5 months ago

Interestingly, setting the freq parameter to 'W-MON' when creating the fcst object seems to have solved the issue! This runs without errors:

from mlforecast import MLForecast
from sklearn.ensemble import RandomForestRegressor

train = data.loc[0:130]  # first 131 weeks (.loc slicing is inclusive)
test = data.loc[131:]    # last 24 weeks

models = RandomForestRegressor(random_state=0, n_estimators=100)

# 'W-MON' anchors weeks on Monday, matching the ds values in train and test
fcst = MLForecast(models=models, freq='W-MON')

fcst.fit(train, fitted=True, static_features=[])
preds = fcst.predict(h=24, X_df=test[['unique_id', 'ds', 'holiday']])

jmoralez commented 5 months ago

Hey @melissafeeney, thanks for using mlforecast. We use the freq argument to build the future dates, so it must match the frequency of your series ('W-MON' in this case is the correct one). We're glad the debugging methods helped you figure out the problem.
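Following this explanation, one way to catch the problem before fitting is to confirm that the freq passed to MLForecast matches the spacing pandas infers from each series' ds column. This is a minimal sketch assuming pandas only; `check_freq` is a hypothetical helper, not part of mlforecast, and pd.infer_freq needs at least three dates per series.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def check_freq(df: pd.DataFrame, freq: str) -> None:
    """Hypothetical guard: raise ValueError if freq differs from any series' inferred frequency."""
    expected = to_offset(freq)
    for uid, group in df.groupby("unique_id"):
        inferred = pd.infer_freq(group["ds"].sort_values())
        if inferred is None:
            raise ValueError(f"could not infer a frequency for series {uid!r}")
        # Strict comparison: 'W' and 'W-SUN' compare equal, but an alias such
        # as '7D' is rejected even though it also steps by one week.
        if to_offset(inferred) != expected:
            raise ValueError(
                f"series {uid!r} looks like {inferred!r}, but freq={freq!r} was given"
            )


weekly_mondays = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2024-01-01", periods=6, freq="W-MON"),
    "y": [float(i) for i in range(6)],  # placeholder target values
})

check_freq(weekly_mondays, "W-MON")  # passes silently
```

Calling check_freq(weekly_mondays, "W") on the same frame raises a ValueError instead of silently producing Sunday dates later.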