amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

How to implement KernelSynth #62

Closed: ForestsKing closed this issue 4 months ago

ForestsKing commented 4 months ago

I am interested in the implementation of KernelSynth. I wonder if you will provide the code. Thanks!

abdulfatir commented 4 months ago

Yes, we're planning to release the training code (including KernelSynth) as soon as we can. The current plan is to release it within the next two weeks. Stay tuned!

cc @lostella

ForestsKing commented 4 months ago

I am excited to hear about this open source plan. However, two weeks is still a bit too long for me to wait. I wonder if it would be convenient for you to provide the implementation of KernelSynth first?

abdulfatir commented 4 months ago

Sure, we used this script to generate synthetic data (KernelSynth) for the paper:

import argparse
import functools
from pathlib import Path

import numpy as np
from joblib import delayed, Parallel
from tqdm.auto import tqdm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    DotProduct,
    ExpSineSquared,
    RBF,
    RationalQuadratic,
    WhiteKernel,
    ConstantKernel,
)
from gluonts.dataset.arrow import ArrowWriter

LENGTH = 1024
KERNEL_BANK = [
    ExpSineSquared(periodicity=24 / LENGTH),  # H
    ExpSineSquared(periodicity=48 / LENGTH),  # 0.5H
    ExpSineSquared(periodicity=96 / LENGTH),  # 0.25H
    ExpSineSquared(periodicity=24 * 7 / LENGTH),  # H
    ExpSineSquared(periodicity=48 * 7 / LENGTH),  # 0.5H
    ExpSineSquared(periodicity=96 * 7 / LENGTH),  # 0.25H
    ExpSineSquared(periodicity=7 / LENGTH),  # D
    ExpSineSquared(periodicity=14 / LENGTH),  # 0.5D
    ExpSineSquared(periodicity=30 / LENGTH),  # D
    ExpSineSquared(periodicity=60 / LENGTH),  # 0.5D
    ExpSineSquared(periodicity=365 / LENGTH),  # D
    ExpSineSquared(periodicity=365 * 2 / LENGTH),  # 0.5D
    ExpSineSquared(periodicity=4 / LENGTH),  # W
    ExpSineSquared(periodicity=26 / LENGTH),  # W
    ExpSineSquared(periodicity=52 / LENGTH),  # W
    ExpSineSquared(periodicity=4 / LENGTH),  # M
    ExpSineSquared(periodicity=6 / LENGTH),  # M
    ExpSineSquared(periodicity=12 / LENGTH),  # M
    ExpSineSquared(periodicity=4 / LENGTH),  # Q
    ExpSineSquared(periodicity=4 * 10 / LENGTH),  # Q
    ExpSineSquared(periodicity=10 / LENGTH),  # Q
    DotProduct(sigma_0=0.0),
    DotProduct(sigma_0=1.0),
    DotProduct(sigma_0=10.0),
    RBF(length_scale=0.1),
    RBF(length_scale=1.0),
    RBF(length_scale=10.0),
    RationalQuadratic(alpha=0.1),
    RationalQuadratic(alpha=1.0),
    RationalQuadratic(alpha=10.0),
    WhiteKernel(noise_level=0.1),
    WhiteKernel(noise_level=1.0),
    ConstantKernel(),
]

def random_binary_map(a, b):
    """Combine two kernels with a randomly chosen binary operation (+ or *)."""
    binary_maps = [lambda x, y: x + y, lambda x, y: x * y]
    return np.random.choice(binary_maps)(a, b)

def generate_time_series(max_kernels=5):
    """Generate a synthetic series by sampling from a GP prior whose kernel
    is a random composition of kernels from KERNEL_BANK."""
    while True:
        X = np.linspace(0, 1, LENGTH)

        # Randomly select up to max_kernels kernels from the bank (with replacement).
        selected_kernels = np.random.choice(
            KERNEL_BANK, np.random.randint(1, max_kernels + 1), replace=True
        )

        # Combine the selected kernels using random binary operations (+ or *).
        kernel = functools.reduce(random_binary_map, selected_kernels)

        # Sample one series from the GP prior; retry if the covariance is degenerate.
        gpr = GaussianProcessRegressor(kernel=kernel)
        try:
            ts = gpr.sample_y(X[:, None], n_samples=1, random_state=None)
        except np.linalg.LinAlgError:
            continue

        # The timestamp is arbitrary.
        return {"start": np.datetime64("2000-01-01 00:00", "s"), "target": ts.squeeze()}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-N", "--num_series", type=int, default=1_000_000)
    args = parser.parse_args()
    path = Path(__file__).parent / "kernel-synth.arrow"

    # Generate the series in parallel across all available cores.
    generated_dataset = Parallel(n_jobs=-1)(
        delayed(generate_time_series)() for _ in tqdm(range(args.num_series))
    )

    # Write the dataset in the GluonTS Arrow format.
    ArrowWriter(compression="lz4").write_to_file(
        generated_dataset,
        path=path,
    )

You'll need to install the dependencies first: pip install "gluonts[pro]" joblib scikit-learn
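
To sanity-check the output, you can load the Arrow file back with GluonTS. A rough example (assuming a GluonTS version whose FileDataset can read Arrow files; the frequency string is arbitrary, since KernelSynth series have no real calendar frequency):

from pathlib import Path

import numpy as np
from gluonts.dataset.common import FileDataset

# Point FileDataset at the file written by the script above.
dataset = FileDataset(path=Path("kernel-synth.arrow"), freq="h")

# Inspect the first generated series.
entry = next(iter(dataset))
print(entry["start"], np.asarray(entry["target"]).shape)  # expect a series of length 1024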

Do note that this version may be somewhat inefficient for sampling, because GaussianProcessRegressor.sample_y relies on np.random.multivariate_normal, which decomposes the covariance matrix with SVD. If generation is too slow, you might want to sample from the GP prior yourself using an eigh- or cholesky-based factorization instead.
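
For example, a rough (untested) sketch of Cholesky-based prior sampling could look like this; the helper name and the jitter value are just illustrative:

import numpy as np

def sample_from_gp_prior_chol(kernel, X, jitter=1e-6):
    # Evaluate the kernel (covariance) matrix on the input grid.
    K = kernel(X[:, None])
    # Add a small diagonal jitter so the Cholesky factorization succeeds.
    K[np.diag_indices_from(K)] += jitter
    # L @ z with z ~ N(0, I) gives a sample from N(0, K).
    L = np.linalg.cholesky(K)
    z = np.random.standard_normal(K.shape[0])
    return L @ z

# Usage inside generate_time_series, in place of gpr.sample_y:
# ts = sample_from_gp_prior_chol(kernel, X)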

ForestsKing commented 4 months ago

Thanks very much!