amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

How to use the pre-trained or fine-tuned model for high-frequency and long-term data? #108

Closed: XiaoqZhang closed this issue 2 weeks ago

XiaoqZhang commented 2 weeks ago

Hello, I am interested in using this model to predict high-frequency (1 s resolution) data over long horizons (1e6 to 5e7 s). I fine-tuned the chronos-t5-mini model with the configuration below:

training_data_paths:
- "<path_to_my_arrow_file>"
probability:
- 1.0
context_length: 512
prediction_length: 64
min_past: 60
max_steps: 200_000
save_steps: 100_000
log_steps: 500
per_device_train_batch_size: 32
learning_rate: 0.001
optim: adamw_torch_fused
num_samples: 20
shuffle_buffer_length: 100_000
gradient_accumulation_steps: 1
model_id: google/t5-efficient-mini
model_type: seq2seq
random_init: true
tie_embeddings: true
output_dir: ./output/
tf32: true
torch_compile: true
tokenizer_class: "MeanScaleUniformBins"
tokenizer_kwargs:
  low_limit: -15.0
  high_limit: 15.0
n_tokens: 4096
lr_scheduler_type: linear
warmup_ratio: 0.0
dataloader_num_workers: 1
max_missing_prop: 0.9
use_eos_token: true

Since the model is recommended to predict at most 64 timesteps at a time, I made it predict 64 steps, then used the predictions as context and asked for the next 64 predictions. The predictions performed quite well for the first 6 rounds, but from the 7th round on, the amplitude of the predictions dropped sharply and they converged to 0, as shown in the plot below. Even though the model performs quite well up to around 1000 steps, this is far from the length I need. Have you tested any case like this, or do you have any suggestions? I have thought about fine-tuning the model with a larger context and prediction length, but that cannot solve the fundamental problem because of GPU memory limitations.

[plot: prediction amplitude collapses toward 0 after roughly the 6th rollout round]
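For reference, the rollout described above can be sketched with the ChronosPipeline API roughly as follows. This is only an illustration under stated assumptions: the checkpoint id and the synthetic context are placeholders, and taking the per-step sample median as the point forecast is just one possible choice, not necessarily the exact setup used here.

```python
import torch
from chronos import ChronosPipeline

# Placeholder checkpoint: point this at the fine-tuned checkpoint written
# under output_dir, or use the public model id as shown here.
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-mini",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

# Synthetic stand-in for the observed 1-second series.
context = torch.sin(torch.arange(2048, dtype=torch.float32) / 20.0)

step, rounds = 64, 16
point_forecasts = []

for _ in range(rounds):
    # forecast has shape [num_series, num_samples, prediction_length]
    forecast = pipeline.predict(context, prediction_length=step, num_samples=20)
    point = forecast[0].median(dim=0).values  # collapse samples to a point path
    point_forecasts.append(point)
    # Feed the point forecast back in and keep only the most recent 512
    # values, matching the context_length used for fine-tuning.
    context = torch.cat([context, point.to(context.dtype)])[-512:]

rollout = torch.cat(point_forecasts)  # 16 * 64 = 1024 forecast steps
```

One caveat of this scheme: collapsing the samples to a median discards forecast uncertainty at every round, so errors compound silently; sampling a single trajectory per round and using it as the new context is an alternative that propagates uncertainty differently.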

CoCoNuTeK commented 2 weeks ago

Well, you have a context length of 512 and a prediction length of 64, and as you shift the window by 64 each round, the original data that got you the first 64 predictions gets shifted away. Could it be that you are running inference on data the model has been fine-tuned on, and as you shift this data out of the context, the model starts performing worse?
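To make the arithmetic behind this explicit (a tiny sketch, assuming the context_length=512 / prediction_length=64 setup above): the observed data remaining in the context shrinks by 64 steps per round and reaches zero after 512 / 64 = 8 rounds, which lines up with the degradation starting around round 7.

```python
context_length, prediction_length = 512, 64

# Each 64-step round pushes 64 more observed values out of the 512-step context.
for r in range(1, 9):
    remaining = max(context_length - r * prediction_length, 0)
    print(f"round {r}: {remaining} observed steps left in the context")
```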

XiaoqZhang commented 2 weeks ago

I was fine-tuning using this data, but only up to 1e6 steps. I am new to this model. May I ask whether the fine-tuning loops over all the context windows or just the first window?

CoCoNuTeK commented 2 weeks ago

It depends on what you put into the .arrow file that you referenced under training_data_paths: in the YAML config file.

XiaoqZhang commented 2 weeks ago

My input .arrow file contains several time series of length 1e6 s, each from a different object.
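As a point of reference, files in this format are commonly assembled with GluonTS's ArrowWriter, one record per series with a "start" timestamp and a "target" array. A minimal sketch follows; the random series, timestamp, and file name are placeholders, not the actual data:

```python
from pathlib import Path

import numpy as np
from gluonts.dataset.arrow import ArrowWriter

# Placeholder data: three 1-second series from different objects,
# each stored as one record with a "start" timestamp and a "target" array.
series = [np.random.randn(1_000_000).astype(np.float32) for _ in range(3)]
start = np.datetime64("2000-01-01 00:00:00", "s")

dataset = [{"start": start, "target": ts} for ts in series]
ArrowWriter(compression="lz4").write_to_file(dataset, path=Path("my_data.arrow"))
```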

abdulfatir commented 2 weeks ago

@XiaoqZhang The training script will sample random cuts from your time series during training, so you don't really need to worry about the lengths of the time series. They can be of any length.
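In other words, each training example is cut at a random position from one of the series. The snippet below is not the repository's actual dataset code, only a rough illustration of the idea using the context_length, prediction_length, and min_past values from the config above:

```python
import numpy as np

def sample_window(ts, context_length=512, prediction_length=64, min_past=60):
    # Pick a random cut point, keeping at least `min_past` observed values
    # before the forecast window.
    end = np.random.randint(min_past + prediction_length, len(ts) + 1)
    past = ts[max(0, end - prediction_length - context_length) : end - prediction_length]
    future = ts[end - prediction_length : end]
    return past, future

past, future = sample_window(np.random.randn(1_000_000))
```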

Regarding the plot above, I am not exactly sure what is going on. I have a couple of questions: