model.predict encounters an error

kkckk1110 commented 8 months ago

What happened + What you expected to happen

I am using model.predict to get forecasts. However, the following error came up:

before_predict_callback, after_predict_callback, X_df, ids)
    718 if X_df.shape[0] != len(self._uids) * horizon:
    719     msg = (
    720         "Found missing inputs in X_df. "
    721         "It should have one row per id and time for the complete forecasting horizon.\n"
    722         "You can get the expected structure by running MLForecast.make_future_dataframe(h) "
    723         "or get the missing combinatins in your current X_df by running MLForecast.get_missing_future(h, X_df)."
    724     )
--> 725     raise ValueError(msg)
    726 drop_cols = [self.id_col, self.time_col, "_start", "_end"]
    727 X_df = ufp.sort(X_df, [self.id_col, self.time_col]).drop(columns=drop_cols)

ValueError: Found missing inputs in X_df. It should have one row per id and time for the complete forecasting horizon. You can get the expected structure by running MLForecast.make_future_dataframe(h) or get the missing combinatins in your current X_df by running MLForecast.get_missing_future(h, X_df).

However, I believe my data format is correct: the valid dataset has 12 (horizon) * 4 (number of time series) = 48 records, so I don't understand why this error occurs.
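Having 48 rows appears to be necessary but not sufficient. In the traceback, `_start` and `_end` columns already exist in X_df when the row count is checked, which suggests X_df is first narrowed to each series' expected forecast window. Assuming the expected dates are each series' last training timestamp stepped forward by `freq`, rows stamped on any other dates would not count toward the 48. The pandas sketch below emulates that idea with a hypothetical `missing_future` helper; it is not mlforecast's implementation (the error message points to the real `MLForecast.get_missing_future(h, X_df)`), and the toy data are made up.

```python
import pandas as pd


def missing_future(last_dates, X_df, offset, h, id_col="unique_id", time_col="ds"):
    """Hypothetical helper, not mlforecast's code: list the (id, date) pairs
    expected over the horizon that X_df does not contain.

    Expected dates for each series are last_date + 1*offset, ..., last_date + h*offset.
    """
    rows = [
        (uid, pd.Timestamp(last) + i * offset)
        for uid, last in last_dates.items()
        for i in range(1, h + 1)
    ]
    expected = pd.DataFrame(rows, columns=[id_col, time_col])
    provided = X_df[[id_col, time_col]].copy()
    # Normalize both date columns to the same resolution so the merge keys match.
    expected[time_col] = pd.to_datetime(expected[time_col]).astype("datetime64[ns]")
    provided[time_col] = pd.to_datetime(provided[time_col]).astype("datetime64[ns]")
    # Left anti-join: keep expected pairs with no partner in X_df.
    merged = expected.merge(provided, on=[id_col, time_col], how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", [id_col, time_col]].reset_index(drop=True)


# Assumed toy setup mirroring the report: 4 monthly series stamped at month starts,
# training ends on 2023-01-01, and X_df holds the next 12 month starts (48 rows).
h = 12
last_dates = {f"id_{k}": "2023-01-01" for k in range(4)}
future_ds = pd.date_range("2023-02-01", periods=h, freq=pd.offsets.MonthBegin())
X_df = pd.DataFrame(
    [(uid, ds, 0.0) for uid in last_dates for ds in future_ds],
    columns=["unique_id", "ds", "promo"],  # "promo" is a made-up exogenous feature
)

missing_m = missing_future(last_dates, X_df, pd.offsets.MonthEnd(), h)     # like freq='M'
missing_ms = missing_future(last_dates, X_df, pd.offsets.MonthBegin(), h)  # like freq='MS'
print(len(X_df), len(missing_m), len(missing_ms))  # → 48 48 0
```

With month-end stepping, every expected date is a month end, so none of the 48 month-start rows line up even though the row count looks right.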

Also, when I predict, I pass unique_id, ds, and the other exogenous features, without the target variable. Am I doing this correctly?
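Passing the id, time, and exogenous columns without the target matches the structure the error message describes (one row per id and time over the horizon). Selecting the target by name is sturdier than `valid.iloc[:, :-1]`, which drops whatever column happens to be last. A minimal pandas sketch; `build_x_df` and the sample frame are hypothetical:

```python
import pandas as pd


def build_x_df(valid, target_col, id_col="unique_id", time_col="ds"):
    """Hypothetical helper: drop the target column by name and sanity-check the result."""
    X_df = valid.drop(columns=[target_col])
    # Each (id, timestamp) pair should appear exactly once.
    n_dupes = int(X_df.duplicated(subset=[id_col, time_col]).sum())
    if n_dupes:
        raise ValueError(f"X_df has {n_dupes} duplicated ({id_col}, {time_col}) rows")
    return X_df.sort_values([id_col, time_col]).reset_index(drop=True)


# Made-up validation frame where the target is NOT the last column,
# so valid.iloc[:, :-1] would drop the exogenous feature instead of the target.
valid = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2023-02-01", "2023-03-01"] * 2),
    "sales": [10.0, 12.0, 7.0, 9.0],  # target, unknown at prediction time
    "price": [1.0, 1.1, 2.0, 2.1],    # made-up dynamic exogenous feature
})

X_df = build_x_df(valid, target_col="sales")
print(list(X_df.columns))  # → ['unique_id', 'ds', 'price']
```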

Versions / Dependencies

macOS, Python 3.9.12, mlforecast 0.11.5

Reproduction script

import lightgbm as lgb
from mlforecast import MLForecast

models = lgb.LGBMRegressor(n_jobs=1, random_state=0, verbosity=-1)
model = MLForecast(models=models, freq='M')

model.fit(train, id_col='unique_id', time_col='ds', target_col='sales', static_features=[])
p = model.predict(h=12, X_df=valid.iloc[:, :-1])  # I use :-1 to exclude the target variable

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 8 months ago

Hey @kkckk1110, thanks for using mlforecast. You can run either of the methods suggested in the error message to figure out what's wrong; it could be a wrong frequency (e.g. 'MS' instead of 'M'), missing ids, etc.
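A wrong frequency can also be spotted with plain pandas before touching the model: `pd.infer_freq` returns the pandas alias for evenly spaced timestamps (it needs at least three points and returns None when no regular frequency is found), which can then be compared with the `freq` passed to `MLForecast`. A sketch that runs it per series; the helper name and data are made up, and this is a pandas check, not an mlforecast feature:

```python
import pandas as pd


def infer_freq_per_series(df, id_col="unique_id", time_col="ds"):
    """Return pandas' inferred frequency alias for each series (None if irregular)."""
    return (
        df.sort_values([id_col, time_col])
        .groupby(id_col)[time_col]
        .apply(lambda s: pd.infer_freq(pd.DatetimeIndex(s)))
    )


# Made-up training data: one month-start series and one Monday-weekly series.
train = pd.DataFrame({
    "unique_id": ["monthly"] * 4 + ["weekly"] * 4,
    "ds": pd.to_datetime([
        "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01",
        "2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22",
    ]),
})

freqs = infer_freq_per_series(train)
print(freqs.to_dict())  # → {'monthly': 'MS', 'weekly': 'W-MON'}
```

If the inferred alias differs from the one given to the model (for example 'MS' versus 'M'), that mismatch is a likely reason for the error.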

kkckk1110 commented 8 months ago

Thank you! I checked the list of frequency aliases and realized that 'MS' (month start) was the right one.

melissafeeney commented 8 months ago

I am also having this issue. My X_df contains the unique_id, ds, and the single exogenous variable from my test set. I have one dynamic exogenous variable and no static exogenous variables. My analysis is weekly (each week starts on Monday) rather than monthly like the OP's. Here is my example:
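The thread does not show which `freq` this weekly model was built with. If it was 'W', note that pandas anchors that alias to weeks ending on Sunday ('W-SUN'), so Monday-stamped test dates would never fall on the expected horizon, while 'W-MON' would; this mirrors the 'M' versus 'MS' fix above. A sketch of how the two aliases step forward from a Monday; the dates are made up, and whether mlforecast generates its horizon exactly this way is an assumption:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def future_dates(last_date, alias, h):
    """Step h periods forward from last_date using a pandas frequency alias."""
    offset = to_offset(alias)
    return [pd.Timestamp(last_date) + i * offset for i in range(1, h + 1)]


last_train_week = "2024-01-01"  # assumed last training date; a Monday
for alias in ["W", "W-MON"]:
    dates = future_dates(last_train_week, alias, 3)
    print(alias, [f"{d.date()} {d.day_name()}" for d in dates])
# → W ['2024-01-07 Sunday', '2024-01-14 Sunday', '2024-01-21 Sunday']
# → W-MON ['2024-01-08 Monday', '2024-01-15 Monday', '2024-01-22 Monday']
```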

ValueError Traceback (most recent call last)
in <cell line: 10>()
      8
      9 model.fit(train, id_col='unique_id', time_col='ds', target_col='y', static_features=[])
---> 10 p = model.predict(h=12, X_df = valid.iloc[:,:-1]) #I use :-1 to exclude the target variable

/usr/local/lib/python3.10/dist-packages/mlforecast/core.py in predict(self, models, horizon, before_predict_callback, after_predict_callback, X_df, ids)
    727         "or get the missing combinatins in your current X_df by running MLForecast.get_missing_future(h, X_df)."
    728     )
--> 729     raise ValueError(msg)
    730 drop_cols = [self.id_col, self.time_col, "_start", "_end"]
    731 X_df = ufp.sort(X_df, [self.id_col, self.time_col]).drop(columns=drop_cols)

ValueError: Found missing inputs in X_df. It should have one row per id and time for the complete forecasting horizon. You can get the expected structure by running MLForecast.make_future_dataframe(h) or get the missing combinatins in your current X_df by running MLForecast.get_missing_future(h, X_df).
