Open khabalghoul opened 10 months ago
Hi @tomyjara!
You can use something like this to build a custom dataset.
Create a JSON Lines file with your time series data. Every line contains one time series as a JSON object with two keys: `start` (the start timestamp) and `target` (the actual time series values). I have attached an example file. Note that the time series are not required to have the same start or length.
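For illustration, such a file could be created like this (the file name, start dates, and values below are made up):

```python
import json

# Two illustrative series; note they have different starts and lengths.
series = [
    {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0, 3.0, 4.0]},
    {"start": "2021-03-15 00:00:00", "target": [10.0, 9.5, 9.0]},
]

# JSON Lines: one complete JSON object per line, no enclosing array.
with open("dummy_custom_data.jsonl", "w") as f:
    for ts in series:
        f.write(json.dumps(ts) + "\n")
```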
Use this function to load the file as a GluonTS dataset.
```python
from pathlib import Path
from typing import Optional

from gluonts.dataset.split import split
from gluonts.dataset.common import (
    MetaData,
    TrainDatasets,
    FileDataset,
)


def get_custom_dataset(
    jsonl_path: Path,
    freq: str,
    prediction_length: int,
    split_offset: Optional[int] = None,
):
    """Creates a custom GluonTS dataset from a JSONLines file and
    the given parameters.

    Parameters
    ----------
    jsonl_path
        Path to a JSONLines file with time series
    freq
        Frequency in pandas format
        (e.g., `H` for hourly, `D` for daily)
    prediction_length
        Prediction length
    split_offset, optional
        Offset to split data into train and test sets, by default None

    Returns
    -------
    A GluonTS dataset
    """
    if split_offset is None:
        split_offset = -prediction_length

    metadata = MetaData(freq=freq, prediction_length=prediction_length)
    test_ts = FileDataset(jsonl_path, freq)
    train_ts, _ = split(test_ts, offset=split_offset)
    dataset = TrainDatasets(metadata=metadata, train=train_ts, test=test_ts)
    return dataset
```
`get_custom_dataset` can be used as a replacement for https://github.com/amazon-science/unconditional-time-series-diffusion/blob/50f52da1c583d2eece4da8e933f34b73dc249a75/bin/train_model.py#L135

Thanks @marcelkollovieh for helping with the response!
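To make the default `split_offset` concrete: with `split_offset = -prediction_length`, training uses each series up to the last `prediction_length` points, while the full series is kept for testing. A plain-Python sketch of that slicing (the series values are illustrative, no GluonTS required):

```python
prediction_length = 24
split_offset = -prediction_length

target = list(range(100))           # a dummy series with 100 observations
train_part = target[:split_offset]  # everything except the last 24 points
test_part = target                  # the full series, used for evaluation

print(len(train_part), len(test_part))  # → 76 100
```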
```
(tsdiff) rrr@rr:~/unconditional-time-series-diffusion$ python bin/train_model.py -c configs/train_fdr.yaml
DEBUG:root:Before importing pykeops...
DEBUG:root:After importing pykeops!
INFO:uncond_ts_diff.arch.s4:Pykeops installation found.
WARNING: Skipping key sampler_params!
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/dummy_custom_data.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/train - 副本.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/dummy_custom_data.json:Zone.Identifier.
WARNING:root:Cannot infer loader for /home/h/unconditional-time-series-diffusion/data/fdr/CAS/train - 副本.json:Zone.Identifier.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
INFO:bin/train_model.py:Logging to ./lightning_logs/version_44
/home/h/anaconda3/envs/tsdiff/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py:108: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn(
You are using a CUDA device ('NVIDIA GeForce RTX 4060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name     ┃ Type            ┃ Params ┃
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ scaler   │ MeanScaler      │      0 │
│ 1 │ embedder │ FeatureEmbedder │      1 │
│ 2 │ backbone │ BackboneModel   │  193 K │
└───┴──────────┴─────────────────┴────────┘
Trainable params: 193 K
Non-trainable params: 0
Total params: 193 K
Total estimated model params size (MB): 0
DEBUG:fsspec.local:open file: /home/h/unconditional-time-series-diffusion/lightning_logs/version_44/hparams.yaml
DEBUG:fsspec.local:open file: /home/h/unconditional-time-series-diffusion/lightning_logs/version_44/hparams.yaml
Epoch 0/99 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/-- 0:00:00 • -:--:-- 0.00it/s
Traceback (most recent call last):
  File "/home/h/anaconda3/envs/tsdiff/lib/python3.8/site-packages/gluonts/dataset/jsonl.py", line 127, in __iter__
    yield json.loads(line)
orjson.JSONDecodeError: unexpected end of data: line 2 column 1 (char 3)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/train_model.py", line 286, in
```
I used a dummy JSON file as `train.json`, but the error above comes up.
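One likely cause of this `orjson.JSONDecodeError` (a guess, since the file itself isn't shown): the file is a pretty-printed JSON document, with one object or array spread over several lines, while the JSON Lines loader expects every line to be a complete JSON object. A sketch of converting such a file, with illustrative file names and values:

```python
import json

# Illustrative: a pretty-printed JSON array fails when parsed line by line,
# because line 2 ('  {...') is not a complete JSON document on its own.
pretty = """[
  {"start": "2021-01-01 00:00:00", "target": [1.0, 2.0]}
]"""

series = json.loads(pretty)  # parse the whole document at once instead

# Rewrite as JSON Lines: one complete object per line.
with open("train_fixed.jsonl", "w") as f:
    for ts in series:
        f.write(json.dumps(ts) + "\n")
```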
Hi! How are you?
I found that tsdiff could be a great tool for generating EEG data. I have a dataset containing the channel measurements from an EEG obtained in an experiment, and I would like to train your model on this data. What should I do to train your model on a custom dataset?
Thanks!