awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

How to save/load a DeepAR model and get predictions? #1034

Closed Harry-N closed 3 years ago

Harry-N commented 3 years ago

Hello everyone, I am new to GluonTS.

I am using DeepAR and followed the tutorial, which gave me a good predictor according to the results from the "make_evaluation_predictions" function. Now I would like to save trained models and load them again later. I saw that there is a "serialize" function, but I do not understand how to save several models in the same folder so that I can use them later.

I also need to run predictions from the saved models, but I do not know how to do that either.

If anyone can help me, I will be very grateful!

Thank you very much in advance.

jaheba commented 3 years ago

Hello @Harry-N,

You can have a look at the gluonts.shell module. It essentially implements a SageMaker-compatible container and should be a good starting point for understanding how to train and use a model.

Here is where we serialise the model: https://github.com/awslabs/gluon-ts/blob/acfd7e14c4ef6eaa62fea6d6233a9e336f6366e4/src/gluonts/shell/train.py#L88

In serve/app.py, we load a model and then use it for each request: https://github.com/awslabs/gluon-ts/blob/acfd7e14c4ef6eaa62fea6d6233a9e336f6366e4/src/gluonts/shell/serve/app.py#L97-L111
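As a rough illustration of the same pattern applied to several models, here is a minimal sketch that gives each model its own subfolder under one root directory. The helper names save_models and load_models are hypothetical and not part of GluonTS. The sketch only assumes that each model exposes serialize(path), as GluonTS predictors do, and that a matching loader such as Predictor.deserialize is passed in.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def save_models(predictors: Dict[str, Any], root: Path) -> None:
    """Serialize each predictor into its own subdirectory of `root`.

    Hypothetical helper: assumes every predictor has a
    `serialize(path: Path)` method, as GluonTS predictors do.
    """
    for name, predictor in predictors.items():
        model_dir = root / name
        # Create the folder first so that serialize() can write into it.
        model_dir.mkdir(parents=True, exist_ok=True)
        predictor.serialize(model_dir)


def load_models(
    root: Path, deserialize: Callable[[Path], Any]
) -> Dict[str, Any]:
    """Load one model per subdirectory of `root`, keyed by folder name.

    `deserialize` is whatever loader matches the saved models,
    for example `Predictor.deserialize` from gluonts.model.predictor.
    """
    return {
        model_dir.name: deserialize(model_dir)
        for model_dir in sorted(root.iterdir())
        if model_dir.is_dir()
    }
```

With GluonTS this might be used as save_models({"deepar": predictor_a, "feedforward": predictor_b}, Path("models")) and later load_models(Path("models"), Predictor.deserialize), where predictor_a and predictor_b stand for already trained predictors. Each loaded predictor can then be given a dataset through its predict method, as the serving code linked above does.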

pratikgehlott commented 2 years ago

Hello @jaheba,

My code is below:

%matplotlib inline
import mxnet as mx
from mxnet import gluon
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json
import os

from gluonts.dataset.repository.datasets import get_dataset, dataset_recipes
from gluonts.dataset.util import to_pandas

print(f"Available datasets: {list(dataset_recipes.keys())}")

dataset = get_dataset("m4_hourly", regenerate=True)

entry = next(iter(dataset.train))
train_series = to_pandas(entry)
train_series.plot()
plt.grid(which="both")
plt.legend(["train series"], loc="upper left")
plt.show()

entry = next(iter(dataset.test))
test_series = to_pandas(entry)
test_series.plot()
plt.axvline(train_series.index[-1], color='r') # end of train dataset
plt.grid(which="both")
plt.legend(["test series", "end of train series"], loc="upper left")
plt.show()

print(f"Length of forecasting window in test dataset: {len(test_series) - len(train_series)}")
print(f"Recommended prediction horizon: {dataset.metadata.prediction_length}")
print(f"Frequency of the time series: {dataset.metadata.freq}")

N = 10  # number of time series
T = 100  # number of timesteps
prediction_length = 24
freq = "1H"
custom_dataset = np.random.normal(size=(N, T))
start = pd.Timestamp("01-01-2019", freq=freq)  # can be different for each time series

from gluonts.dataset.common import ListDataset

# train dataset: cut the last window of length "prediction_length", add "target" and "start" fields
train_ds = ListDataset(
    [{'target': x, 'start': start} for x in custom_dataset[:, :-prediction_length]],
    freq=freq
)
# test dataset: use the whole dataset, add "target" and "start" fields
test_ds = ListDataset(
    [{'target': x, 'start': start} for x in custom_dataset],
    freq=freq
)

from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(
        ctx="cpu",
        epochs=5,
        learning_rate=1e-3,
        num_batches_per_epoch=100
    )
)

predictor = estimator.train(dataset.train)

from gluonts.evaluation import make_evaluation_predictions

forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,  # test dataset
    predictor=predictor,  # predictor
    num_samples=100,  # number of sample paths we want for evaluation
)

forecasts = list(forecast_it)
tss = list(ts_it)

# first entry of the time series list
ts_entry = tss[0]

# first 5 values of the time series (convert from pandas to numpy)
np.array(ts_entry[:5]).reshape(-1,)

# first entry of dataset.test
dataset_test_entry = next(iter(dataset.test))

dataset_test_entry['target'][:5]

# first entry of the forecast list
forecast_entry = forecasts[0]

print(f"Number of sample paths: {forecast_entry.num_samples}")
print(f"Dimension of samples: {forecast_entry.samples.shape}")
print(f"Start date of the forecast window: {forecast_entry.start_date}")
print(f"Frequency of the time series: {forecast_entry.freq}")

print(f"Mean of the future window:\n {forecast_entry.mean}")
print(f"0.5-quantile (median) of the future window:\n {forecast_entry.quantile(0.5)}")

def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = 150
    prediction_intervals = (50.0, 90.0)
    legend = ["observations", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='g')
    plt.grid(which="both")
    plt.legend(legend, loc="upper left")
    plt.show()

plot_prob_forecasts(ts_entry, forecast_entry)

from gluonts.evaluation import Evaluator

evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(dataset.test))

print(json.dumps(agg_metrics, indent=4))

item_metrics.plot(x='MSIS', y='MASE', kind='scatter')
plt.grid(which="both")
plt.show()

from pathlib import Path
predictor.serialize(Path(os.getcwd()))

from gluonts.model.predictor import Predictor
predictor_deserialized = Predictor.deserialize(Path(os.getcwd()))

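# Copied from gluonts/shell/serve/app.py: ThrougputIter and log_throughput are
# helpers defined in that module and are not imported in this script.
def handle_predictions(predictor, instances, configuration):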
    # create the forecasts
    forecasts = ThrougputIter(
        predictor.predict(
            ListDataset(instances, predictor.freq),
            num_samples=configuration.num_samples,
        )
    )
    predictions = [
        forecast.as_json_dict(configuration) for forecast in forecasts
    ]
    log_throughput(instances, forecasts.timings)
    return predictions

Now I cannot figure out how I should call my predict function to make predictions. Any idea which parameters I should pass? None of this is covered in the documentation, which is why I am stuck.

Thank you.

@lostella maybe you can also help.
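For reference, calling a deserialized predictor directly should not need the serving helpers. The following is a minimal sketch, not the library's own code: summarize_forecasts is a hypothetical helper, and it assumes only what the script above already uses, namely that predict(dataset, num_samples=...) yields forecast objects with start_date, mean, and quantile(q).

```python
from typing import Any, Dict, Iterable, List


def summarize_forecasts(
    predictor: Any, dataset: Iterable, num_samples: int = 100
) -> List[Dict[str, Any]]:
    """Run `predictor` on `dataset` and return plain-Python summaries.

    Hypothetical helper: relies only on predict(), start_date, mean
    and quantile(), which the script above already uses.
    """
    summaries = []
    # predict() yields one forecast per entry in the dataset.
    for forecast in predictor.predict(dataset, num_samples=num_samples):
        summaries.append(
            {
                "start": str(forecast.start_date),
                "mean": [float(v) for v in forecast.mean],
                "median": [float(v) for v in forecast.quantile(0.5)],
            }
        )
    return summaries
```

With the objects from the script above, this would be called as summarize_forecasts(predictor_deserialized, dataset.test). Any dataset whose entries carry "target" and "start" fields, such as a ListDataset, should work the same way.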