NuwaTS: a Foundation Model Mending Every Incomplete Time Series
https://arxiv.org/abs/2405.15317v3

NuwaTS mends incomplete time series from different domains.

The picture was generated by [ChatGLM](https://chatglm.cn/).

Overview

Time series imputation is critical for many real-world applications and has been widely studied. However, existing models often require specialized designs tailored to specific missing patterns, variables, or domains, which limits their generalizability. In addition, current evaluation frameworks primarily focus on domain-specific tasks and often rely on time-wise train/validation/test splits, which fail to rigorously assess a model's ability to generalize to unseen variables or domains. In this paper, we present **NuwaTS**, a novel framework that repurposes Pre-trained Language Models (PLMs) for general time series imputation. Once trained, NuwaTS can be applied to impute missing data across any domain. We introduce specialized embeddings for each sub-series patch, capturing information about the patch, its missing-data pattern, and its statistical characteristics. By combining contrastive learning with the imputation task, we train PLMs into a versatile, one-for-all imputation model. Additionally, we employ a plug-and-play fine-tuning approach, enabling efficient adaptation to domain-specific tasks with minimal adjustments. To evaluate cross-variable and cross-domain generalization, we propose a new benchmarking protocol that partitions the datasets along the variable dimension. Experimental results on over seventeen million time series samples from diverse domains demonstrate that NuwaTS outperforms state-of-the-art domain-specific models across various datasets under the proposed benchmarking protocol. Furthermore, we show that NuwaTS generalizes to other time series tasks, such as forecasting.
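To make the embedding design above concrete, here is a minimal sketch, assuming a PyTorch-style module, of how each sub-series patch could be embedded jointly with its missing-data pattern and per-patch statistics. The class name, projections, and shapes are our own illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative NuwaTS-style patch embedding (hypothetical names/shapes):
    each patch is embedded together with its 0/1 missingness mask and simple
    statistics (mean, std) before being fed to a frozen PLM backbone."""

    def __init__(self, patch_size: int = 16, d_model: int = 768):
        super().__init__()
        self.patch_size = patch_size
        self.value_proj = nn.Linear(patch_size, d_model)  # raw patch values
        self.mask_proj = nn.Linear(patch_size, d_model)   # missing-data pattern
        self.stat_proj = nn.Linear(2, d_model)            # per-patch mean and std

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x, mask: (batch, seq_len); seq_len must be divisible by patch_size
        b, t = x.shape
        x = x.view(b, t // self.patch_size, self.patch_size)
        mask = mask.view(b, t // self.patch_size, self.patch_size).float()
        stats = torch.stack([x.mean(dim=-1), x.std(dim=-1)], dim=-1)
        # sum the three embeddings into one token per patch: (batch, n_patches, d_model)
        return self.value_proj(x) + self.mask_proj(mask) + self.stat_proj(stats)
```

With the defaults in the run command below (`--seq_len 96 --patch_size 16 --d_model 768`), each series would yield six patch tokens of width 768, which matches the hidden size of a GPT-2-class backbone.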

Key Contributions

Our contributions are as follows:

- We propose NuwaTS, a framework that repurposes PLMs for general time series imputation; once trained, it can impute missing data across any domain.
- We introduce specialized embeddings for each sub-series patch, capturing information about the patch, its missing-data pattern, and its statistical characteristics.
- We combine contrastive learning with the imputation task to train a versatile, one-for-all imputation model, and provide a plug-and-play fine-tuning approach for efficient domain-specific adaptation.
- We propose a new benchmarking protocol that partitions datasets along the variable dimension to rigorously evaluate cross-variable and cross-domain generalization.
- Experiments on over seventeen million time series samples show that NuwaTS outperforms state-of-the-art domain-specific models and generalizes to other tasks such as forecasting.

Visualization

We partition the dataset along the sensor (variable) dimension into training, validation, and test sets in a 1:1:1 ratio, so all methods are tested on unseen variables. Source code and data can be found under Visualization.
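The snippet below is a minimal sketch of this protocol, assuming the data is a NumPy array of shape `(time_steps, num_variables)`; the helper name `split_by_variable` is ours, not part of the repository.

```python
import numpy as np

def split_by_variable(data: np.ndarray, seed: int = 0):
    """Partition the variable (sensor) dimension into train/val/test
    in a 1:1:1 ratio so that evaluation uses entirely unseen variables."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[1])  # shuffle variable indices
    n = data.shape[1] // 3
    train, val, test = idx[:n], idx[n:2 * n], idx[2 * n:]
    return data[:, train], data[:, val], data[:, test]
```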

Data Download

You can download the datasets and checkpoints from here: Google Drive

Run

```bash
python run.py --task_name imputation --is_training 1 \
  --root_path ./dataset/ --data_path electricity.csv --model NuwaTS \
  --data custom --features M --seq_len 96 --label_len 0 --pred_len 0 \
  --enc_in 107 --dec_in 107 --c_out 107 --gpt_layer 6 \
  --batch_size 16 --d_model 768 --patch_size 16 \
  --des NuwaTS_ECL --mlp 1 --learning_rate 0.001 \
  --prefix_length 1 --prefix_tuning --cov_prompt
```
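For intuition on how an imputation model is typically scored under this setup, the sketch below masks a random fraction of observed points, imputes them, and computes MSE only on the hidden positions. `model_impute` is a placeholder callable, not an API exposed by `run.py`.

```python
import numpy as np

def masked_mse(model_impute, series: np.ndarray,
               missing_rate: float = 0.25, seed: int = 0) -> float:
    """Hide a fraction of the points, impute them, and score only those points."""
    rng = np.random.default_rng(seed)
    mask = rng.random(series.shape) < missing_rate  # True = artificially missing
    corrupted = np.where(mask, 0.0, series)         # zero-fill the masked entries
    imputed = model_impute(corrupted, ~mask)        # model also sees what is observed
    return float(np.mean((imputed[mask] - series[mask]) ** 2))
```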

Citation

🌟 If you find this resource helpful, please consider starring this repository and citing our research:

```bibtex
@article{cheng2024nuwats,
  title={NuwaTS: Mending Every Incomplete Time Series},
  author={Cheng, Jinguo and Yang, Chunwei and Cai, Wanlin and Liang, Yuxuan and Wen, Qingsong and Wu, Yuankai},
  journal={arXiv preprint arXiv:2405.15317},
  year={2024}
}
```

Further Reading

1, "Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach (SUMformer)", in "TITS" 2024.

Authors: Jinguo Cheng, Ke Li, Yuxuan Liang, Lijun Sun, Junchi Yan, Yuankai Wu*

```bibtex
@article{cheng2023rethinking,
  title={Rethinking Urban Mobility Prediction: A Super-Multivariate Time Series Forecasting Approach},
  author={Cheng, Jinguo and Li, Ke and Liang, Yuxuan and Sun, Lijun and Yan, Junchi and Wu, Yuankai},
  journal={arXiv preprint arXiv:2312.01699},
  year={2023}
}
```

2. Foundation Models for Time Series Analysis: A Tutorial and Survey, in KDD 2024.

Authors: Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, Qingsong Wen*

```bibtex
@inproceedings{liang2024foundation,
  title={Foundation models for time series analysis: A tutorial and survey},
  author={Liang, Yuxuan and Wen, Haomin and Nie, Yuqi and Jiang, Yushan and Jin, Ming and Song, Dongjin and Pan, Shirui and Wen, Qingsong},
  booktitle={ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024)},
  year={2024}
}
```

3. Position Paper: What Can Large Language Models Tell Us about Time Series Analysis, in ICML 2024.

Authors: Ming Jin, Yifan Zhang, Wei Chen, Kexin Zhang, Yuxuan Liang, Bin Yang, Jindong Wang, Shirui Pan, Qingsong Wen

```bibtex
@inproceedings{jin2024position,
  title={Position Paper: What Can Large Language Models Tell Us about Time Series Analysis},
  author={Ming Jin and Yifan Zhang and Wei Chen and Kexin Zhang and Yuxuan Liang and Bin Yang and Jindong Wang and Shirui Pan and Qingsong Wen},
  booktitle={International Conference on Machine Learning (ICML 2024)},
  year={2024}
}
```

4. Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook, in arXiv 2023. [GitHub Repo]

Authors: Ming Jin, Qingsong Wen, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li (IEEE Fellow), Shirui Pan, Vincent S. Tseng (IEEE Fellow), Yu Zheng (IEEE Fellow), Lei Chen (IEEE Fellow), Hui Xiong (IEEE Fellow)

```bibtex
@article{jin2023lm4ts,
  title={Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook},
  author={Ming Jin and Qingsong Wen and Yuxuan Liang and Chaoli Zhang and Siqiao Xue and Xue Wang and James Zhang and Yi Wang and Haifeng Chen and Xiaoli Li and Shirui Pan and Vincent S. Tseng and Yu Zheng and Lei Chen and Hui Xiong},
  journal={arXiv preprint arXiv:2310.10196},
  year={2023}
}
```

5. Transformers in Time Series: A Survey, in IJCAI 2023. [GitHub Repo]

Authors: Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, Liang Sun

```bibtex
@inproceedings{wen2023transformers,
  title={Transformers in time series: A survey},
  author={Wen, Qingsong and Zhou, Tian and Zhang, Chaoli and Chen, Weiqi and Ma, Ziqing and Yan, Junchi and Sun, Liang},
  booktitle={International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2023}
}
```

Acknowledgement