Nixtla / neuralforecast

Scalable and user friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

NeuralForecast.predict ignores the requested ds in futr_df #1003

Closed: fsaad closed this issue 1 month ago

fsaad commented 1 month ago

What happened + What you expected to happen

  1. The Bug. I have a training data frame of the following form:
       unique_id  ds  y
    0         H1   1  1
    1         H1   2  2
    2         H1   3  3
    3         H1   8  4
    4         H1   9  5
    5         H1  10  6
    6         H2   5  1
    7         H2   6  2
    8         H2   7  3
    9         H2   8  4
    10        H2   9  5
    11        H2  10  6

The goal is to generate 4-step predictions for each series: ds 4, 5, 6, 7 for H1 and ds 11, 12, 13, 14 for H2.

Toward this end, I create futr_df which contains ds with all the requested time points.

However, the data frame returned by NeuralForecast.predict(futr_df=futr_df) does not contain any predictions for the requested H1 time points 4, 5, 6, 7.

  2. Expected Behavior. The returned data frame should contain predictions for all the requested ds.

  3. Useful Information. Please see the minimal reproduction script.

Versions / Dependencies

Reproduction script

from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM
import pandas as pd

# Dummy training data.
Y_train = pd.DataFrame({
    'unique_id': ['H1'] * 6          + ['H2'] * 6,
    'ds':        [1, 2, 3, 8, 9, 10] + [5, 6, 7, 8, 9, 10],
    'y':         [1, 2, 3, 4, 5, 6]  + [1, 2, 3, 4, 5, 6]
})

# Fit LSTM.
horizon = 4
models = [LSTM(input_size=horizon, h=horizon,  max_steps=1)]
nf = NeuralForecast(models=models, freq=1)
nf.fit(Y_train)

# Generate predictions.
futr_df = pd.DataFrame({
    'unique_id': ['H1'] * 4   + ['H2'] * 4,
    'ds':        [4, 5, 6, 7] + [11, 12, 13, 14]
})
futr_df = pd.concat([futr_df, nf.get_missing_future(futr_df)])
# Print the returned timestamps so the result is visible when run as a script.
print(nf.predict(futr_df=futr_df).ds)
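
A quick way to see why the H1 request is problematic is to compare each requested ds with the last training timestamp of its series. The following is a minimal pandas-only sketch, assuming that forecasts are generated for the steps after each series' last observed ds; the names `last_ds` and `not_future` are illustrative and not part of neuralforecast.

```python
import pandas as pd

# Same dummy data as in the reproduction script.
Y_train = pd.DataFrame({
    'unique_id': ['H1'] * 6 + ['H2'] * 6,
    'ds': [1, 2, 3, 8, 9, 10] + [5, 6, 7, 8, 9, 10],
    'y': [1, 2, 3, 4, 5, 6] + [1, 2, 3, 4, 5, 6],
})
futr_df = pd.DataFrame({
    'unique_id': ['H1'] * 4 + ['H2'] * 4,
    'ds': [4, 5, 6, 7] + [11, 12, 13, 14],
})

# Last observed timestamp per series (illustrative helper frame).
last_ds = (
    Y_train.groupby('unique_id')['ds']
    .max()
    .rename('last_ds')
    .reset_index()
)

# Requested rows that are not strictly after the end of their series.
checked = futr_df.merge(last_ds, on='unique_id', how='left')
not_future = checked[checked['ds'] <= checked['last_ds']]
print(not_future)
```

Under that assumption, every row listed in `not_future` lies inside the training range of its series rather than in its future.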

Issue Severity

High: It blocks me from completing my task.

elephaint commented 1 month ago

Hi - thanks for using neuralforecast.

To solve your problem, you could remove the data from timestamps 8, 9 and 10 from H1.

Code:

from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM
import pandas as pd

# Dummy training data.
Y_train = pd.DataFrame({
    'unique_id': ['H1'] * 3 + ['H2'] * 6,
    'ds':        [1, 2, 3]  + [5, 6, 7, 8, 9, 10],
    'y':         [1, 2, 3]  + [1, 2, 3, 4, 5, 6]
})

# Fit LSTM.
horizon = 4
models = [LSTM(input_size=horizon, h=horizon,  max_steps=1)]
nf = NeuralForecast(models=models, freq=1)
nf.fit(Y_train)

# Generate predictions.
futr_df = pd.DataFrame({
    'unique_id': ['H1'] * 4   + ['H2'] * 4,
    'ds':        [4, 5, 6, 7] + [11, 12, 13, 14]
})
# Print the returned timestamps so the result is visible when run as a script.
print(nf.predict(futr_df=futr_df).ds)
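
Rather than editing the training lists by hand, the same truncation can be applied to the original data frame. This is a small pandas-only sketch; `drop_after` is a hypothetical helper written for this example, not a neuralforecast function, and the cutoff of 3 for H1 is taken from the data above.

```python
import pandas as pd

def drop_after(df, unique_id, cutoff):
    """Return a copy of df without the rows of `unique_id` whose ds exceeds `cutoff`."""
    mask = (df['unique_id'] == unique_id) & (df['ds'] > cutoff)
    return df.loc[~mask].reset_index(drop=True)

# Original training data from the issue, including H1's later observations.
Y_train = pd.DataFrame({
    'unique_id': ['H1'] * 6 + ['H2'] * 6,
    'ds': [1, 2, 3, 8, 9, 10] + [5, 6, 7, 8, 9, 10],
    'y': [1, 2, 3, 4, 5, 6] + [1, 2, 3, 4, 5, 6],
})

# Keep only H1's history up to ds=3, so that ds 4..7 become its next four steps.
Y_trimmed = drop_after(Y_train, 'H1', cutoff=3)
print(Y_trimmed)
```

`Y_trimmed` can then be passed to `nf.fit` in place of the hand-edited frame.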

If you want to explicitly use information from the future to predict the past, note that this becomes an interpolation exercise, not a forecasting task. Because ground truths are available for timestamps 8, 9 and 10, you could add those future ground truths as a separate exogenous variable.
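
One possible reading of this suggestion, sketched with pandas only: split each series at a cutoff, then attach the first known observation after the gap as extra columns on both the history and futr_df. Everything here is an assumption made for illustration: the `cutoffs` mapping, the column names `future_y` and `has_future_y`, and the choice of the first post-gap value as the feature. The wiring into a model is not shown; in neuralforecast, future exogenous columns are declared through model parameters such as `futr_exog_list`, provided the chosen model supports them.

```python
import pandas as pd

Y_train = pd.DataFrame({
    'unique_id': ['H1'] * 6 + ['H2'] * 6,
    'ds': [1, 2, 3, 8, 9, 10] + [5, 6, 7, 8, 9, 10],
    'y': [1, 2, 3, 4, 5, 6] + [1, 2, 3, 4, 5, 6],
})
futr_df = pd.DataFrame({
    'unique_id': ['H1'] * 4 + ['H2'] * 4,
    'ds': [4, 5, 6, 7] + [11, 12, 13, 14],
})

# Assumed per-series cutoffs: the last ds treated as history for each series.
cutoffs = {'H1': 3, 'H2': 10}
cutoff = Y_train['unique_id'].map(cutoffs)
history = Y_train[Y_train['ds'] <= cutoff]
held_out = Y_train[Y_train['ds'] > cutoff]  # known values after the gap

# First known observation after the gap, per series (only H1 has one here).
first_known = held_out.sort_values('ds').groupby('unique_id')['y'].first()

def add_exog(df):
    """Attach the illustrative exogenous columns to a copy of df."""
    out = df.copy()
    out['has_future_y'] = out['unique_id'].isin(first_known.index).astype(int)
    out['future_y'] = out['unique_id'].map(first_known).fillna(0.0)
    return out

history_x = add_exog(history)
futr_x = add_exog(futr_df)
print(futr_x)
```

The indicator column lets a model distinguish a series with no known future value from one whose known value happens to be zero.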

Hope this helps, let me know.

fsaad commented 1 month ago

Thank you for this workaround, @elephaint.