QData / spacetimeformer

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."
https://arxiv.org/abs/2109.12218
MIT License

Spacetimeformer Multivariate Forecasting

This repository contains the code for the paper, "Long-Range Transformers for Dynamic Spatiotemporal Forecasting", Grigsby et al., 2021. (arXiv).

Spacetimeformer is a Transformer that learns temporal patterns like a time series model and spatial patterns like a Graph Neural Network.

Below we give a brief explanation of the problem and method with installation instructions. We provide training commands for high-performance results on several datasets.

NEW MARCH 2023! We have updated the public version of the paper to v3 - the final major update expected. See the v3 release notes below.

Data Format

We deal with multivariate sequence-to-sequence problems that have continuous inputs. The most common example is time series forecasting, where we predict future ("target") values given recent history ("context").

Every model and dataset uses this x_context, y_context, x_target, y_target format. X values are time covariates like the calendar datetime, while Ys are variable values. There can be additional context variables that are not predicted.
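As a rough illustration of this format, the sketch below slices one training example out of a toy multivariate series. It is not the repository's data loader: the helper name `split_context_target` and the choice of month, day, hour and minute as time covariates are assumptions made for this example; only the four output names and the idea of context and target lengths come from the text above.

```python
from datetime import datetime, timedelta


def split_context_target(times, values, context_points, target_points, start=0):
    """Slice one (x_context, y_context, x_target, y_target) example.

    times:  list of datetimes, one per timestep
    values: list of rows, one per timestep, each row holding N variable values
    """
    mid = start + context_points
    end = mid + target_points
    if end > len(times):
        raise ValueError("not enough timesteps for this window")

    def covariates(t):
        # Hypothetical time covariates; the real features may differ.
        return [t.month, t.day, t.hour, t.minute]

    x_context = [covariates(t) for t in times[start:mid]]
    y_context = [list(row) for row in values[start:mid]]
    x_target = [covariates(t) for t in times[mid:end]]
    y_target = [list(row) for row in values[mid:end]]
    return x_context, y_context, x_target, y_target


# Toy series: 10 hourly timesteps, 3 variables.
t0 = datetime(2023, 3, 1)
times = [t0 + timedelta(hours=h) for h in range(10)]
values = [[float(h), float(h) * 2, float(h) * 3] for h in range(10)]

x_c, y_c, x_t, y_t = split_context_target(
    times, values, context_points=6, target_points=4
)
print(len(x_c), len(y_c), len(x_t), len(y_t))
```

At training time the model sees `x_context`, `y_context` and `x_target`, and is asked to predict `y_target`.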

Spatiotemporal Attention

Typical deep learning time series models group Y values by timestep and learn patterns across time. When using Transformer-based models, this results in "temporal" attention networks that can ignore spatial relationships between variables.

In contrast, Graph Neural Networks and similar methods model spatial relationships with explicit graphs - sharing information across space and time in alternating layers.

Spacetimeformer learns full spatiotemporal patterns between all variables at every timestep.

We implement spatiotemporal attention with a custom Transformer architecture and embedding that flattens multivariate sequences so that each token contains the value of a single variable at a given timestep:
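A minimal sketch of the flattening idea, assuming a time-major token order and a plain dictionary per token. The function name `flatten_spatiotemporal` and the token fields are hypothetical; the actual embedding in this repository is a learned layer and is not reproduced here.

```python
def flatten_spatiotemporal(x, y):
    """Flatten L timesteps of N variables into L * N tokens.

    x: list of L time-covariate rows
    y: list of L rows, each holding N variable values
    Each token carries one variable's value at one timestep, plus the
    information needed to tell tokens apart (time and variable index).
    """
    tokens = []
    for step, (covs, row) in enumerate(zip(x, y)):
        for var_idx, value in enumerate(row):
            tokens.append(
                {"value": value, "time": covs, "variable": var_idx, "step": step}
            )
    return tokens


# 2 timesteps, 3 variables -> 6 tokens.
x = [[0], [1]]
y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
tokens = flatten_spatiotemporal(x, y)
print(len(tokens))
print(tokens[3])
```

Because every token holds a single variable at a single timestep, attention between tokens can relate any variable at any time to any other.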

Spacetimeformer processes these longer sequences with a mix of efficient attention mechanisms and Vision-style "windowed" attention.
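To see why the longer sequences need efficient attention, the arithmetic below counts query-key pairs for full attention and for a generic non-overlapping windowed scheme. This is an illustration only: the variable count of 100 and the window size are made-up numbers, and the windowing shown is a simplified stand-in rather than the mechanism implemented in this repository.

```python
def full_attention_pairs(seq_len):
    """Number of query-key pairs when every token attends to every token."""
    return seq_len * seq_len


def windowed_attention_pairs(seq_len, window):
    """Pairs when tokens attend only within non-overlapping windows.

    The last window may be shorter than the others.
    """
    n_full, remainder = divmod(seq_len, window)
    return n_full * window * window + remainder * remainder


timesteps = 168   # context length used in one of the example commands
variables = 100   # hypothetical number of variables

temporal_len = timesteps                    # one token per timestep
spatiotemporal_len = timesteps * variables  # one token per (timestep, variable)

print(full_attention_pairs(temporal_len))
print(full_attention_pairs(spatiotemporal_len))
print(windowed_attention_pairs(spatiotemporal_len, window=timesteps))
```

Flattening multiplies the sequence length by the number of variables, so the cost of full attention grows with the square of that factor, while windowing keeps the growth linear in the number of windows.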

This repo contains the code for our model as well as several high-quality baselines for common benchmarks and toy datasets.

Paper v3 Release Notes

The Spacetimeformer project began in 2021. The project underwent a major revision in summer 2022, with most of the updates being merged to the public codebase shortly thereafter. However, the updated version of the paper was not released until March 2023. Here we summarize the major changes:

Installation and Training

This repository was written and tested for python 3.8 and pytorch 1.11.0. Note that the training process depends on specific (now outdated) versions of pytorch lightning and torchmetrics.

git clone https://github.com/QData/spacetimeformer.git
cd spacetimeformer
conda create -n spacetimeformer python==3.8
conda activate spacetimeformer
pip install -r requirements.txt
pip install -e .

This installs a python package called spacetimeformer. The package does not install pytorch or torchvision automatically, and you should follow the official pytorch installation instructions for 1.11 depending on your CUDA software version.

Commandline instructions for each experiment can be found using the format: python train.py *model* *dataset* -h.

Models

Datasets

Spatial Forecasting
Time Series Forecasting
Image Completion
Copy Tasks
"Global" or Multiseries Datasets

Logging with Weights and Biases

We used wandb to track all of our results during development, and you can do the same by providing your username and project as environment variables:

export STF_WANDB_ACCT="your_username"
export STF_WANDB_PROJ="your_project_title"
# optionally: change wandb logging directory (defaults to ./data/STF_LOG_DIR)
export STF_LOG_DIR="/somewhere/with/more/disk/space"

wandb logging can then be enabled with the --wandb flag.

There are several figures that can be saved to wandb between epochs. These vary by dataset but can be enabled with --attn_plot (for Transformer attention diagrams) and --plot (for prediction plotting and image completion).
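Before launching a run with --wandb, it can help to confirm the variables are actually set. The check below is a small sketch that is not part of the repository; the helper names `missing_wandb_vars` and `resolve_log_dir` are hypothetical, while the variable names and the default log directory are taken from the export commands above.

```python
import os

REQUIRED_VARS = ("STF_WANDB_ACCT", "STF_WANDB_PROJ")
DEFAULT_LOG_DIR = "./data/STF_LOG_DIR"


def missing_wandb_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def resolve_log_dir(env=None):
    """Return STF_LOG_DIR if set, otherwise the documented default."""
    env = os.environ if env is None else env
    return env.get("STF_LOG_DIR", DEFAULT_LOG_DIR)


missing = missing_wandb_vars()
if missing:
    print("Set these before using --wandb:", ", ".join(missing))
else:
    print("wandb variables found; logging to", resolve_log_dir())
```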

Example Training Commands

General Notes:
  1. Commands are listed without GPU counts. For one GPU, add --gpus 0; for three GPUs, add --gpus 0 1 2, and so on. Some of these models require significant GPU memory (A100 80GBs). Other hyperparameter settings were used in older versions of the paper with more limited compute resources. If I have time I will try to update with competitive alternatives on smaller GPUs.

  2. Some datasets require a --data_path to the dataset location on disk. Others are included with the source code or downloaded automatically.
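For scripting many runs, the notes above can be folded into a small command builder. This is a convenience sketch, not something shipped with the repository: `build_train_command` is a hypothetical helper, and it only assembles the argument list following the `python train.py model dataset` pattern and the --gpus convention described in note 1.

```python
import shlex


def build_train_command(model, dataset, gpus=(), **flags):
    """Assemble a train.py command as a list of arguments.

    gpus:  GPU indices, appended as `--gpus 0 1 2`
    flags: keyword arguments turned into `--name value`; a value of True
           becomes a bare `--name`, and a value of False is skipped
    """
    cmd = ["python", "train.py", model, dataset]
    for name, value in flags.items():
        if value is False:
            continue
        cmd.append(f"--{name}")
        if value is not True:
            cmd.append(str(value))
    if gpus:
        cmd.append("--gpus")
        cmd.extend(str(g) for g in gpus)
    return cmd


cmd = build_train_command(
    "spacetimeformer", "mnist", gpus=[0, 1, 2], context_points=10, plot=True
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run` without going through a shell.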

Linear autoregressive model with independent weights and seasonal decomposition (DLinear-style) on ETTm1:

python train.py linear ettm1 --context_points 288 --target_points 96 --run_name linear_ettm1_regression --gpus 0 --use_seasonal_decomp --linear_window 288 --data_path /path/to/ETTm1.csv

Spacetimeformer on Pems-Bay (MAE: ~1.61):

python train.py spacetimeformer pems-bay --batch_size 32 --warmup_steps 1000 --d_model 200 --d_ff 700 --enc_layers 5 --dec_layers 6 --dropout_emb .1 --dropout_ff .3 --run_name pems-bay-spatiotemporal --base_lr 1e-3 --l2_coeff 1e-3 --loss mae --data_path /path/to/pems_bay/ --d_qk 30 --d_v 30 --n_heads 10 --patience 10 --decay_factor .8

Spacetimeformer on MNIST completion:

python train.py spacetimeformer mnist --embed_method spatio-temporal --local_self_attn full --local_cross_attn full --global_self_attn full --global_cross_attn full --run_name mnist_spatiotemporal --context_points 10

Spacetimeformer on AL Solar (MSE: ~7.75):

python train.py spacetimeformer solar_energy --context_points 168 --target_points 24 --d_model 100 --d_ff 400 --enc_layers 5 --dec_layers 5 --l2_coeff 1e-3 --dropout_ff .2 --dropout_emb .1 --d_qk 20 --d_v 20 --n_heads 6 --run_name spatiotemporal_al_solar --batch_size 32 --class_loss_imp 0 --initial_downsample_convs 1 --decay_factor .8 --warmup_steps 1000

More Coming Soon...

Citation

If you use this model in academic work, please feel free to cite our paper:

@misc{grigsby2021longrange,
      title={Long-Range Transformers for Dynamic Spatiotemporal Forecasting}, 
      author={Jake Grigsby and Zhe Wang and Yanjun Qi},
      year={2021},
      eprint={2109.12218},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}