SalesforceAIResearch / uni2ts

[ICML2024] Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0

Using covariates for inference #25

Closed chmanoj closed 1 month ago

chmanoj commented 2 months ago

Hi. Thanks for open-sourcing the model. We've been trying to use the model on datasets with covariates and would like to clarify the following:

  1. Are we using the model correctly, especially the covariate specification? (We add covariates using feat_dynamic_real in gluonts.dataset.pandas.PandasDataset.)
  2. Is the result expected? (We also tried other examples, such as y = x1 + x2 where x1 and x2 are random Gaussian or sinusoidal functions, and saw similar results.)
  3. Any suggestions on how we can improve the predictions of the Moirai model?

An example with synthetic data is below. This is a very simple case where the target is the same as the covariate, i.e. y = i1, where i1 is a known variable. We pass i1 via feat_dynamic_real in the PandasDataset used in the repo's examples.

From the predicted vs. actual plot and the MAE (= 0.5), we can see that the prediction is very similar to predicting 0 all the time.

[Images: moirai_pred_with_features, moirai_pred_MAE_with_covriates]

When we repeat the same experiment without covariates, the prediction is worse, with an MAE of 0.66. The model appears to predict a mean-reverting signal, where y_{t+1} lies between y_t and 0.

So we can see that adding a covariate helps lower the MAE, but it does not do better than predicting the mean within the context.
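As a rough sanity check on the MAE figures above, the reference values can be computed directly from the synthetic series with NumPy alone. This is a minimal sketch, not part of the original experiment: the helper names `mae` and `mae_shrink` and the shrink factor `a` are hypothetical, introduced only to illustrate what "between y_t and 0" would imply for this alternating signal.

```python
import numpy as np

# Rebuild the synthetic target from the issue: it alternates +0.5 / -0.5.
syn_len = 252 * 10
TEST = 252
y = np.array([[1, 0]] * 5000).flatten()[-syn_len:] - 0.5

y_test = y[-TEST:]        # values in the test window
y_prev = y[-TEST - 1:-1]  # value observed one step before each test value


def mae(pred, actual):
    """Mean absolute error between two equally shaped arrays."""
    return float(np.mean(np.abs(pred - actual)))


def mae_shrink(a):
    """MAE of predicting a * y_t for y_{t+1} (hypothetical mean-reverting guess)."""
    return mae(a * y_prev, y_test)


mae_zero = mae(np.zeros(TEST), y_test)  # always predict 0
mae_last = mae(y_prev, y_test)          # repeat the last observed value

print(f"predict 0:          {mae_zero:.2f}")
print(f"repeat last value:  {mae_last:.2f}")
print(f"shrink with a=0.32: {mae_shrink(0.32):.2f}")
```

Because y_{t+1} = -y_t for this series, any guess of the form a * y_t with 0 < a < 1 has an absolute error of 0.5 * (1 + a), which falls between the error of predicting 0 and the error of repeating the last value. Under that assumption, an MAE of 0.66 would correspond to a of roughly 0.32; this is an interpretation of the reported number, not a measured property of the model.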


# Code to generate the plots and MAE statistics in Jupyter notebook
import torch
import matplotlib.pyplot as plt
import pandas as pd
from huggingface_hub import hf_hub_download
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from uni2ts.model.moirai import MoiraiForecast
import numpy as np
from tqdm import tqdm
from IPython.display import display, HTML

# Create synthetic dataset
syn_len = 252*10
train_len = 252*9
i1 = np.array([[1]*1 + [0]*1]*5000).flatten()[-syn_len:] - 0.5
syn_df = pd.DataFrame(
    np.c_[
        i1,
        i1
    ],
    columns=["i1", "y"]
)
syn_df.index = pd.date_range(end="2024-03-31", freq="D", periods=syn_len)
syn_df.index.name="date"

# Set MOIRAI parameters
SIZE = "small"  # model size: choose from {'small', 'base', 'large'}
PDT = 1  # prediction length: any positive integer
CTX = 252  # context length: any positive integer
PSZ = "auto"  # patch size: choose from {"auto", 8, 16, 32, 64, 128}
BSZ = 32  # batch size: any positive integer
TEST = 252  # test set length: any positive integer
window_distance = 1 # PDT for non-overlapping windows
n_windows = (TEST - PDT) // window_distance # TEST // PDT if window

# Run inference using two setups
#    - with 1 covariate
#    - without any covariates

target_cols = ["y"]

for CTX in [252]:
    display(HTML(f"<h3>CTX={CTX}</h3>"))
    for feature_cols in [["i1"], []]:
        # Convert into GluonTS dataset
        ds = PandasDataset(syn_df, target=target_cols, feat_dynamic_real=feature_cols)

        # Split into train/test set
        train, test_template = split(
            ds,
            offset=-TEST
        )  # assign last TEST time steps as test set

        # Construct rolling window evaluation
        test_data = test_template.generate_instances(
            prediction_length=PDT,  # number of time steps for each prediction
            windows=n_windows,  # number of windows in rolling window evaluation
            distance=window_distance,  # number of time steps between each window - distance=PDT for non-overlapping windows
        )

        moirai_syn_preds = {}

        for MOIRAI_SIZE in ["small"]:
            model = MoiraiForecast.load_from_checkpoint(
                checkpoint_path=hf_hub_download(
                    repo_id=f"Salesforce/moirai-R-{MOIRAI_SIZE}", filename="model.ckpt"
                ),
                prediction_length=PDT,
                context_length=CTX,
                patch_size=PSZ,
                num_samples=500,
                target_dim=2,#len(ds),
                feat_dynamic_real_dim=ds.num_feat_dynamic_real,
                past_feat_dynamic_real_dim=ds.num_past_feat_dynamic_real,
                map_location="cuda:0" if torch.cuda.is_available() else "cpu",
            )

            predictor = model.create_predictor(batch_size=BSZ)
            forecasts = predictor.predict(test_data.input)

            input_it = iter(test_data.input)
            label_it = iter(test_data.label)
            forecast_it = iter(forecasts)

            forecast_out = []
            forecast_vals = []
            forecast_dates = []
            for _ in tqdm(range(test_data.windows)):
                # Make predictions
                inp = next(input_it)
                label = next(label_it)
                forecast = next(forecast_it)
                forecast_out.append(forecast)

            tmp_moirai_preds = pd.DataFrame([[x.quantile(0.5)[0], x.start_date.start_time.date()] for x in forecast_out], 
                                            columns=["y_pred_moirai", "date"]).set_index("date")
            moirai_syn_preds[MOIRAI_SIZE] = tmp_moirai_preds.copy()

        plot_df = syn_df.copy()

        for MOIRAI_SIZE in ["small"]:
            tag = "_" + MOIRAI_SIZE[0].upper()
            plot_df = pd.merge(plot_df, moirai_syn_preds[MOIRAI_SIZE].add_suffix(tag), left_index=True, right_index=True, how="left")
            plot_df[f"err_moirai{tag}"] = plot_df[f"y_pred_moirai{tag}"] - plot_df["y"]
            plot_df[f"err_moirai{tag}_abs"] = plot_df[f"err_moirai{tag}"].abs()

        display(HTML(f"<h3>feature_cols={feature_cols}</h3>"))
        plot_df[["y", "y_pred_moirai_S"]].dropna().plot(figsize=(14, 4), lw=1.0)
        plt.show()

        display(plot_df.dropna(subset=["y_pred_moirai_S"], how="any")[["err_moirai_S_abs"]].describe())

gorold commented 2 months ago

I believe it should be ds = PandasDataset(syn_df, target="y", feat_dynamic_real=feature_cols) and target_dim=1. You could also try the large model and a longer context length.