The amount of pretraining data

Hi @abdulfatir, I would like to ask how much data Chronos used for pretraining and how to arrive at the numbers stated in the paper. As mentioned in Sec. 5.1, 890K univariate time series are used (roughly 84B tokens in total). Do these numbers describe the original raw data rather than the data fed directly into training?

Sec. 5.2 states 10M TSMixup augmentations and 1M synthetic time series (11M time series in total). My understanding is that these are the data actually used for training. Is that correct?

In addition, how many observations/tokens are used for training, given that there are 11M time series? I noticed that each synthetic time series has 1024 tokens, as shown in https://github.com/amazon-science/chronos-forecasting/blob/main/scripts/kernel-synth.py. But what about the TSMixup augmentations?
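
To make the question concrete, here is the back-of-the-envelope arithmetic I have in mind. It is only a sketch: the series counts and the 1024 length come from the paper and `kernel-synth.py` as quoted above, while the TSMixup length is a placeholder argument because that is exactly the number I do not know. The function names are my own and not part of the Chronos code.

```python
# Rough token counting for the training corpus (a sketch, not the authors' method).

SYNTHETIC_SERIES = 1_000_000   # Sec. 5.2: 1M synthetic time series
SYNTHETIC_LENGTH = 1024        # length of each synthetic series in kernel-synth.py
TSMIXUP_SERIES = 10_000_000    # Sec. 5.2: 10M TSMixup augmentations


def total_tokens(num_series: int, tokens_per_series: int) -> int:
    """Total tokens if every series has the same length."""
    return num_series * tokens_per_series


def training_tokens(tsmixup_length: int) -> int:
    """Tokens across all 11M series for an ASSUMED TSMixup series length."""
    synthetic = total_tokens(SYNTHETIC_SERIES, SYNTHETIC_LENGTH)
    tsmixup = total_tokens(TSMIXUP_SERIES, tsmixup_length)
    return synthetic + tsmixup


# Synthetic part alone: 1M series x 1024 tokens.
synthetic_tokens = total_tokens(SYNTHETIC_SERIES, SYNTHETIC_LENGTH)
print(f"synthetic tokens: {synthetic_tokens:,}")  # 1,024,000,000

# Average raw series length implied by Sec. 5.1 (84B tokens over 890K series).
raw_avg_length = 84e9 / 890e3
print(f"average raw series length: {raw_avg_length:,.0f} tokens")

# Hypothetical case: TSMixup series also have 1024 tokens (unconfirmed).
print(f"total if TSMixup length were 1024: {training_tokens(1024):,}")
```

If the TSMixup series had a different length, only the argument to `training_tokens` would change, which is why I am asking for that value.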
