awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0
4.62k stars 753 forks source link

Incorrect output of prediction start date #751

Closed sayonsom closed 4 years ago

sayonsom commented 4 years ago

Description

I am not getting the anticipated prediction start date. My dataset ends at 07-10-2019, and I am trying to predict for the next couple of days. However, my prediction starts at a much later date, specifically: 10-26-2021

To Reproduce

Please try executing the following stand-alone code as a minimal working example of my problem:

import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.trainer import Trainer
from pathlib import Path

# CSV File is available here: "https://gist.github.com/sayonsom/af6604bb38d50e366e53d1ccc994b269"
csv_url = 'Text_example.csv'
df = pd.read_csv(csv_url, index_col=None, names=['Timestamp', 'measured_value'], parse_dates=False)

prediction_length = 96
freq = "15min"
start = pd.Timestamp(df['Timestamp'][0], freq=freq)
target = df['measured_value'].to_numpy()

# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
train_ds = ListDataset([{'target': target[:-prediction_length], 'start':start}], freq=freq)
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset([{'target': target[-prediction_length:], 'start': start}], freq=freq)

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    context_length=96*3, # input window length, 3 days data
    prediction_length=prediction_length,
    freq=freq,
    trainer=Trainer(ctx="cpu", epochs=4, learning_rate=1E-3, hybridize=True, num_batches_per_epoch=200),
)

predictor = estimator.train(train_ds)

target = df['measured_value'].to_numpy()
test_ds = ListDataset([{'target': target, 'start': '2019-04-11 00:00:00'}], freq=freq)

predictions = list(predictor.predict(test_ds))

known_data = ListDataset([{'target': target, 'start': '2017-04-27 00:00:00'}], freq=freq)
print(f"Dimension of samples: {predictions[0].samples.shape}")
print(f"Start date of the forecast window: {predictions[0].start_date}")

Error message or code output

... 
INFO:root:Using CPU
INFO:root:Using CPU
Dimension of samples: (100, 96)
Start date of the forecast window: 2021-10-26 15:00:00 

(Expecting 07-11-2019, instead)

Environment

lostella commented 4 years ago

(Expecting 07-11-2019, instead)

@sayonsom why are you expecting that as forecast start date? Running the following

import pandas as pd
csv_url = "https://gist.githubusercontent.com/sayonsom/af6604bb38d50e366e53d1ccc994b269/raw/7fa603a4618286aa4376556043627e9383d27117/Test_example.csv"
df = pd.read_csv(csv_url, index_col=None, names=['Timestamp', 'measured_value'], parse_dates=False)
freq = "15min"
# This is the data you fed into the predictor
target = df['measured_value'].to_numpy()
# This is the start date of the data you fed into the predictor
start = pd.Timestamp('2019-04-11 00:00:00', freq=freq) 

print(start + len(target) * start.freq)

yields 2020-03-31 07:00:00

sayonsom commented 4 years ago

@lostella Thanks for clarifying. I was not able to comprehend how to define a "begin forecast from this date onwards" from the documentation. I will close this issue.

lostella commented 4 years ago

I see, happy if that helps. Just to clarify: is that the forecast start date that you get? If not, there might still be an issue... I couldn’t yet run your full example, will do that to double check though 👍🏻