amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

The amount of pretraining data #130

Closed liu-jc closed 3 days ago

liu-jc commented 4 days ago

Hi @abdulfatir, I would like to ask how much data Chronos used for pretraining and how the numbers stated in the paper were obtained. Sec 5.1 mentions that 890K univariate time series are used (roughly 84B tokens in total). Do these numbers describe the original raw data rather than the data used directly for training?

Sec 5.2 states that 10M TSMixup augmentations and 1M synthetic time series are used (11M time series in total). My understanding is that these numbers describe the data actually used for training. Is that correct?

Also, how many observations/tokens are used for training, given that we are using 11M time series? I noticed that each synthetic time series has 1024 tokens, as shown in https://github.com/amazon-science/chronos-forecasting/blob/main/scripts/kernel-synth.py. What about the TSMixup augmentations?
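
To make the question concrete, here is a rough sketch of the arithmetic I have in mind. It assumes each of the 1M synthetic series contributes exactly 1024 observations and that one observation corresponds to one token. The helper names are hypothetical, and the average TSMixup length is left as a parameter because I do not know its value.

```python
# Back-of-the-envelope count of training observations.
# Assumptions (mine, not confirmed): one observation == one token,
# and every synthetic series has exactly 1024 observations.

NUM_SYNTHETIC = 1_000_000   # synthetic series, per Sec 5.2
SYNTHETIC_LENGTH = 1024     # length per synthetic series, per kernel-synth.py
NUM_TSMIXUP = 10_000_000    # TSMixup augmentations, per Sec 5.2


def total_observations(num_series: int, length_per_series: float) -> float:
    """Total observations for a set of series with a given (average) length."""
    return num_series * length_per_series


def tsmixup_observations(avg_length: float) -> float:
    """TSMixup total for a hypothetical average series length (unknown to me)."""
    return total_observations(NUM_TSMIXUP, avg_length)


synthetic_obs = total_observations(NUM_SYNTHETIC, SYNTHETIC_LENGTH)
print(synthetic_obs)  # 1024000000
```

With the actual average length of a TSMixup augmentation, `tsmixup_observations(avg_length) + synthetic_obs` would give the total I am asking about.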