Closed: denwolff closed this issue 4 months ago
Hey @denwolff, thanks for using mlforecast. The new_df is like the new "training set"; it will be used to extract the lags, times, etc. If you have exogenous features you also have to provide X_df with the future values of the exogenous features (for the times after new_df).
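To make "the times after new_df" concrete, here is a minimal sketch in plain Python (not mlforecast code). It assumes the default column names unique_id and ds and integer timestamps as with freq=1; the helper name future_index and the sample values are made up for illustration. It lists the (series, timestamp) pairs for which X_df would need one row of exogenous values.

```python
def future_index(history, h, step=1):
    """Return the (unique_id, ds) pairs covering the h steps after each series' last timestamp."""
    last_seen = {}
    for row in history:
        uid, ds = row["unique_id"], row["ds"]
        # Track the latest timestamp observed for every series.
        if uid not in last_seen or ds > last_seen[uid]:
            last_seen[uid] = ds
    return [
        (uid, last + step * k)
        for uid, last in sorted(last_seen.items())
        for k in range(1, h + 1)
    ]


# Hypothetical history playing the role of new_df: target y plus one exogenous column.
history = [
    {"unique_id": "s1", "ds": 1, "y": 0.5, "Ain": 1.0},
    {"unique_id": "s1", "ds": 2, "y": 0.7, "Ain": 1.1},
    {"unique_id": "s1", "ds": 3, "y": 0.6, "Ain": 1.2},
    {"unique_id": "s2", "ds": 1, "y": 2.0, "Ain": 0.3},
    {"unique_id": "s2", "ds": 2, "y": 2.1, "Ain": 0.4},
]

# Rows that X_df would have to provide for a horizon of h=2.
needed = future_index(history, h=2)
print(needed)
```

In other words, new_df carries the observed history (target and exogenous columns), while X_df carries only the id, the timestamp and the exogenous columns for the rows listed in needed.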
Hi, thank you very much for your response.
Sorry, there are several points of confusion for me. First of all, when I try with the M3 or M5 dataset, calling '.predict' with the new_df parameter works without providing an argument for X_df. I don't understand why it is then necessary to provide X_df for my data.
Second, concerning the phrase "If you have exogenous features you also have to provide X_df with the future values of the exogenous features (for the times after new_df)":
So, what I would like to do is use the model that has already been trained to predict an entirely new time series, new_df, that the model has not seen before. So no further training with any new dataset should be necessary? Why do I need to train the model again? Or is it not possible to use the pretrained model on an entirely new time series without having trained it on its first few samples? (But then I wouldn't understand why it works for the M3/M5 data.)
Then, the future values of the exogenous features after new_df would mean the future values of the future values that I want to predict? I must be misunderstanding something.
(I was expecting the issue to be somehow related to the way my dataframes are structured, though I could find no difference from the M3/M5 data, for which it worked.)
Are you setting static_features in those cases? If you're not setting anything, your features are being interpreted as static and thus you don't need to provide the future values.
Thank you, I realized that in the M3/M5 dataset examples I had not set static_features=[] and therefore the exogenous features had actually not been used.
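The static versus dynamic distinction can be sketched as follows. This is only a mental model of the behaviour described above, not mlforecast's actual implementation; the helper name split_features is hypothetical, and the default column names unique_id, ds and y are assumed.

```python
def split_features(columns, static_features=None,
                   id_col="unique_id", time_col="ds", target_col="y"):
    """Split the extra columns into (static, dynamic) following the rule described in this thread."""
    feature_cols = [c for c in columns if c not in (id_col, time_col, target_col)]
    if static_features is None:
        # Nothing set: every extra column is interpreted as static.
        static = feature_cols
    else:
        static = [c for c in feature_cols if c in static_features]
    # Whatever is not static is dynamic and needs future values at predict time.
    dynamic = [c for c in feature_cols if c not in static]
    return static, dynamic


columns = ["unique_id", "ds", "y", "Ain", "Bin", "Cin"]

default_split = split_features(columns)                       # static_features not set
explicit_split = split_features(columns, static_features=[])  # as in the reproduction script
print(default_split)
print(explicit_split)
```

With static_features=[] all three columns end up in the dynamic list, which matches the columns named in the KeyError when predict is called without X_df.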
I understand now: in case I have exogenous features, for new_df I need to give some samples from the beginning of the series (target + exogenous features), and for X_df the exogenous features for the future samples I want to predict.
Thank you very much for your help!
What happened + What you expected to happen
I'm trying to make new predictions for data the model hasn't seen yet, using 'fcst.predict' with the 'new_df' parameter. This works for me with the M5 example dataset, but not with my own data, where I get the error: KeyError: "['Ain', 'Bin', 'Cin'] not in index" (Ain, Bin and Cin are my three features). The data I trained the model with and my new data have exactly the same structure and dtypes, and the data don't have any missing values.
Yet 'predict' complains that some features are missing (it is unclear to me whether in the training set or in the new data):
predictions_new_data = fcst.predict(h=FORECAST_HORIZON_TEST, new_df=X_NEW_TIMESERIES)
Versions / Dependencies
Python 3.10.11 mlforecast 0.13.0
Reproduction script
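import lightgbm as lgb
from mlforecast import MLForecast
from mlforecast.lag_transforms import RollingMean
from mlforecast.target_transforms import Differences
# Assumption: the original script omits its imports; expanding_mean is taken here from window_ops
from window_ops.expanding import expanding_mean

# X_TRAIN, X_NEW_TIMESERIES and FORECAST_HORIZON_TEST are the reporter's own data and settings (not shown)

lags = [1, 2, 3, 4, 5, 10, 50, 100, 129]
fcst = MLForecast(
    models=lgb.LGBMRegressor(random_state=0, verbosity=-1),
    freq=1,
    lags=lags,
    lag_transforms={
        1: [expanding_mean],
        100: [RollingMean(window_size=100)],
    },
    target_transforms=[Differences([24])],
)
# static_features=[] means no column is static, so Ain, Bin and Cin are treated as exogenous
fcst.fit(X_TRAIN, static_features=[])

# Raises the KeyError reported above: no X_df with future exogenous values is passed
predictions_new_data = fcst.predict(h=FORECAST_HORIZON_TEST, new_df=X_NEW_TIMESERIES)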
Issue Severity
High: It blocks me from completing my task.