
CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

Introduction

CALF (original name: LLaTA) is a novel cross-modal fine-tuning framework that effectively bridges the distribution discrepancy between temporal data and the textual nature of LLMs, as shown in Figure 1.

Figure 1: t-SNE visualization of pre-trained LLM word token embeddings together with temporal tokens from GPT4TS (left) and from our method (right). Our method shows more cohesive integration, indicating effective modality alignment.

To bridge the modality gap between textual and temporal data, we introduce three meticulously designed cross-modal fine-tuning techniques (see Figure 2).


Figure 2: Conceptual illustration of the cross-modal fine-tuning techniques.

Prerequisites

Before proceeding, ensure Python 3.9 is installed. Install the required dependencies with the following command:

pip install -r requirements.txt
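
If you want to verify the environment first, a short check such as the following works (a minimal optional sketch, not part of the repository):

# Optional: confirm the interpreter matches the Python 3.9 requirement.
import sys

if sys.version_info[:2] != (3, 9):
    raise RuntimeError(f"Python 3.9 expected, found {sys.version.split()[0]}")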

Dataset Preparation

Long-term Forecasting

Acquire datasets from Autoformer. Organize them in the ./datasets directory as shown below:

datasets
├── electricity
│   └── electricity.csv
├── ETT-small
│   ├── ETTh1.csv
│   ├── ETTh2.csv
│   ├── ETTm1.csv
│   └── ETTm2.csv
├── traffic
│   └── traffic.csv
└── weather
    └── weather.csv
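
To confirm everything is in place before training, you can run a quick check such as the sketch below (the file paths are taken from the layout above; nothing else is assumed):

# Verify the long-term forecasting datasets match the expected layout.
from pathlib import Path

EXPECTED = [
    "electricity/electricity.csv",
    "ETT-small/ETTh1.csv",
    "ETT-small/ETTh2.csv",
    "ETT-small/ETTm1.csv",
    "ETT-small/ETTm2.csv",
    "traffic/traffic.csv",
    "weather/weather.csv",
]

missing = [p for p in EXPECTED if not (Path("./datasets") / p).is_file()]
if missing:
    print("Missing dataset files:", *missing, sep="\n  ")
else:
    print("All long-term forecasting datasets are in place.")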

Short-term Forecasting

For short-term forecasting, download the M4 datasets from Time-Series-Library. Place the m4 folder within ./datasets.

Preparing Word Token Embeddings

Execute the command below to extract principal components from the word token embeddings:

python pca.py

The extracted components are saved to ./wte_pca_500.pt.
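
For intuition, the computation is conceptually similar to the sketch below. It assumes a GPT-2 backbone and 500 components (suggested by the output filename); pca.py is the authoritative implementation.

# Conceptual sketch of the PCA step (see pca.py for the actual implementation).
# Assumptions: GPT-2 backbone, 500 principal components.
import torch
from sklearn.decomposition import PCA
from transformers import GPT2Model

# Word token embedding matrix: (vocab_size, hidden_dim) = (50257, 768) for GPT-2.
wte = GPT2Model.from_pretrained("gpt2").wte.weight.detach().numpy()

# Keep the top 500 principal directions of the embedding space,
# compressing the ~50k-word vocabulary into 500 representative vectors.
pca = PCA(n_components=500)
pca.fit(wte)

torch.save(torch.from_numpy(pca.components_), "./wte_pca_500.pt")  # (500, 768)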

Model Training

Training scripts are located in the ./scripts folder. For instance, to train the CALF model on the ETTh2 dataset for long-term forecasting, execute:

sh scripts/long_term_forecasting/ETTh2.sh

For short-term forecasting, use:

sh scripts/short_term_forecasting/m4.sh
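
Since these are standard shell scripts, they can also be chained from Python when running several experiments back to back (a minimal sketch; only the script paths above are repository-specific):

# Run the long- and short-term forecasting scripts in sequence.
import subprocess

for script in [
    "scripts/long_term_forecasting/ETTh2.sh",
    "scripts/short_term_forecasting/m4.sh",
]:
    print(f"Running {script} ...")
    subprocess.run(["sh", script], check=True)  # raise if a run fails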


Citation

If this repository contributes to your research, please consider citing our work:

@article{liu2024taming,
  title={CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning},
  author={Liu, Peiyuan and Guo, Hang and Dai, Tao and Li, Naiqi and Bao, Jigang and Ren, Xudong and Jiang, Yong and Xia, Shu-Tao},
  journal={arXiv preprint arXiv:2403.07300},
  year={2024}
}

Acknowledgements

Our gratitude extends to the authors of the repositories our implementation builds upon, including Autoformer, Time-Series-Library, and GPT4TS, for their foundational model implementations.

Contact Us

For inquiries or further assistance, contact us at lpy23@mails.tsinghua.edu.cn or open an issue on this repository.