Nixtla / neuralforecast

Scalable and user friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Rolling window in the predict() method [NeuralForecast] #1054

Closed evandrocardozo closed 2 months ago

evandrocardozo commented 3 months ago

What happened + What you expected to happen

Is it possible in Nixtla to use the predict method to extend the forecast over the whole length of the test set? For example, train a NeuralForecast model to predict, let's say, h=7 days ahead, then predict on a test set using future exogenous predictors over a total horizon of 30 days (kind of like a rolling window method).

What I usually see in the documentation examples, and in what other people do, is a split where the test set matches the forecasting horizon exactly. My data consists of a training set with 64,320 rows (columns: id, ds, multiple future predictors, y) and a test set with 5,360 rows that holds only the future predictors. So basically I want to roll a window through these 5,360 rows, using the shorter horizon of the fitted model.

PS: In my experiments, a one-shot prediction over the entire test set (that is, setting the model's h to the full test set length) was impractical, since I ran out of RAM!

Versions / Dependencies

Python version = 3.10

Reproduction script

import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import HuberMQLoss
from neuralforecast.models import NHITS

# `train` and `test` are DataFrames loaded beforehand (loading code not shown in the issue)

# train df
train.rename(columns={'id': 'unique_id', 'date': 'ds', 'Temperature': 'y'}, inplace=True)
train['ds'] = pd.to_datetime(train['ds'])
train['unique_id'] = '1'

# test df (future predictors only, no target column)
test.rename(columns={'id': 'unique_id', 'date': 'ds'}, inplace=True)
test['ds'] = pd.to_datetime(test['ds'])
test['unique_id'] = '1'

# levels passed to the multi-quantile loss
levels = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

model = NHITS(
    h=96,
    input_size=672,
    loss=HuberMQLoss(level=levels),
    futr_exog_list=['feature_AA', 'feature_AB', 'feature_BA', 'feature_BB', 'feature_CA', 'feature_CB'],
    max_steps=10,
    batch_size=64,
)

probabilistic_nhits = NeuralForecast(models=[model], freq='15min')
probabilistic_nhits.fit(df=train)
forecastings = probabilistic_nhits.predict(futr_df=test)
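For scale, the number of predict calls a rolling approach would need follows from the test set length and h. The sketch below is not part of the original report: it assumes the 5,360-row test set described above and h=96 from the script, and `rolling_windows` is a hypothetical helper name, not a neuralforecast function.

```python
import math


def rolling_windows(n_future_rows: int, h: int) -> list[tuple[int, int]]:
    """Return (start, stop) row offsets, one pair per predict call."""
    if h <= 0:
        raise ValueError("h must be positive")
    n_predicts = math.ceil(n_future_rows / h)
    # Each window covers h rows; the last one is clipped to the data length.
    return [(i * h, min((i + 1) * h, n_future_rows)) for i in range(n_predicts)]


windows = rolling_windows(n_future_rows=5360, h=96)
print(len(windows))  # 56
print(windows[-1])   # (5280, 5360)
```

The last window holds only 80 rows, fewer than h. Whether to pad the future predictors for that window or drop the tail is a choice this sketch leaves open.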

Issue Severity

Low: It annoys or frustrates me.

jmoralez commented 3 months ago

You can use a recursive approach where your predictions are fed to the model as the targets for the next step with something like the following:

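import math

import pandas as pd

full_horizon = 200
# number of predict calls needed to cover the full horizon, h steps at a time
n_predicts = math.ceil(full_horizon / model.h)
combined_train = train
forecasts = []
for _ in range(n_predicts):
    # forecast the h steps that follow the last timestamp in combined_train
    step_forecast = probabilistic_nhits.predict(df=combined_train, futr_df=test)
    forecasts.append(step_forecast)
    # your_model_name is a placeholder for the forecast column to feed back as the target
    step_forecast = step_forecast.rename(columns={your_model_name: 'y'})
    combined_train = pd.concat([combined_train, step_forecast])
pd.concat(forecasts)

samuel-gerstein commented 2 months ago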

You can use a recursive approach where your predictions are fed to the model as the targets for the next step with something like the following:

full_horizon = 200
n_predicts = math.ceil(full_horizon / model.h)
combined_train = train
forecasts = []
for _ in range(n_predicts):
    step_forecast = probabilistic_nhits.predict(df=combined_train, futr_df=test)
    forecasts.append(step_forecast)
    combined_train = pd.concat([combined_train, step_forecast])
pd.concat(forecasts)

How can this code be extended for a NeuralForecast object containing multiple models? There doesn't seem to be an intuitive way of doing it for that case without having to instantiate each model separately.

jmoralez commented 2 months ago

The model's predictions are used as the future target, so this has to be done per model. If you don't want to create separate instances you can iterate over the models attribute and assign one at a time in a loop.
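A minimal sketch of that per-model loop, using a stand-in object rather than the real library so it runs on its own. `ToyModel`, `StubForecaster`, `recursive_forecast` and `forecast_each_model` are hypothetical names, and the toy `predict` takes a plain list. Whether a fitted NeuralForecast object tolerates reassigning its `models` attribute like this is an assumption taken from the suggestion above, not something verified here.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyModel:
    name: str
    drift: float  # toy rule: each forecast step adds `drift` to the last value


class StubForecaster:
    """Hypothetical stand-in for a fitted NeuralForecast object (NOT the real API)."""

    def __init__(self, models, h):
        self.models = models
        self.h = h

    def predict(self, history):
        (model,) = self.models  # this toy expects exactly one active model
        last = history[-1]
        return [last + model.drift * (k + 1) for k in range(self.h)]


def recursive_forecast(nf, history, full_horizon):
    """Cover full_horizon in chunks of nf.h, feeding predictions back as targets."""
    combined = list(history)
    forecasts = []
    for _ in range(math.ceil(full_horizon / nf.h)):
        step = nf.predict(combined)
        forecasts.extend(step)
        combined.extend(step)  # predictions become the history for the next window
    return forecasts[:full_horizon]


def forecast_each_model(nf, history, full_horizon):
    """Run the recursive loop once per model, with one model assigned at a time."""
    original = nf.models
    results = {}
    try:
        for model in original:
            nf.models = [model]
            results[model.name] = recursive_forecast(nf, history, full_horizon)
    finally:
        nf.models = original  # restore the full model list
    return results


nf = StubForecaster(models=[ToyModel("up", 1.0), ToyModel("flat", 0.0)], h=3)
results = forecast_each_model(nf, history=[10.0], full_horizon=7)
print(results["up"])    # [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(results["flat"])  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```

Each model gets its own copy of the history, so one model's predictions never become another model's targets.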

evandrocardozo commented 2 months ago

Hey guys, thanks for replying. Just to give you some context: I was trying to apply some probabilistic models for time series forecasting. The training data consists of 64,320 rows, spanning from 2016-07-01 to 2018-05-01. The test data comprises the exogenous predictors over a horizon of 5,360 data points, spanning from 2018-05-02 to 2018-06-26. The data frequency is 15 minutes.

jmoralez commented 2 months ago

@evandrocardozo do you need further help? I believe the example in https://github.com/Nixtla/neuralforecast/issues/1054#issuecomment-2203866737 achieves what you want.

evandrocardozo commented 2 months ago

I believe that solves the problem for now. Thanks for the assistance!