Closed tanz63 closed 1 day ago
@tanz63 These configurations are more or less the defaults from torch and transformers. The 200K-step budget was set based on visual inspection of the loss curve (although we did see improved performance with longer training, as shown in the hyperparameter analysis).
Regarding fine-tuning: it is a dataset-agnostic proof of concept, and we did not tune that setup extensively. One could likely obtain significantly better fine-tuning performance by carefully validating the hyperparameters.
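For concreteness, the schedule quoted below (learning rate annealed linearly from 0.001 to 0 over the step budget) can be sketched as a plain function; this is a minimal illustration using the numbers quoted in this thread, not code taken from the Chronos repository. In practice the same behavior is what `transformers.get_linear_schedule_with_warmup` with zero warmup steps, or `torch.optim.lr_scheduler.LambdaLR`, would give you:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-3) -> float:
    """Learning rate annealed linearly from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Pretraining: 200K steps, lr 0.001 -> 0
print(linear_lr(0, 200_000))        # 0.001 (initial lr)
print(linear_lr(100_000, 200_000))  # 0.0005 (halfway through training)

# Dataset-agnostic fine-tuning: same anneal over 1000 steps
print(linear_lr(1_000, 1_000))      # 0.0 (fully annealed)
```

The fixed step budget replaces epoch-based stopping: the lr reaching 0 at the final step effectively freezes the model there, regardless of how many epochs over the training corpus that budget corresponds to.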
In the Chronos paper, training runs for a fixed number of steps ("The models were optimized for 200K steps using the AdamW optimizer with a weight decay of 0.01. The learning rate was annealed linearly from its initial value of 0.001 to 0 over the training steps"). What is the logic behind this configuration? Since there is no downstream-task fine-tuning like its LLM counterparts, how is it supposed to avoid overfitting? Is there a heuristic at work, e.g. the step count corresponding to 1 or 2 epochs? Also, fine-tuning is implemented in a dataset-agnostic fashion with an initial learning rate of 0.001, annealed linearly to 0 over 1000 steps. What are the insights behind that?