Why was the number of tokens reduced for these Chronos models compared to the T5 models?

amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting

https://arxiv.org/abs/2403.07815

Apache License 2.0


Why was the number of tokens reduced for these Chronos models compared to the T5 models? #123

Closed CoCoNuTeK closed 1 week ago

CoCoNuTeK commented 1 week ago

Hello there, I would like to ask why the vocabulary was reduced to only 4096 tokens compared to the model Chronos was built from. If I have the compute, wouldn't I be better off using the original model Chronos was based on, given its larger number of tokens? I am guessing it would just be an empty (untrained) model, but perhaps the upside would be that I could use covariates? Thanks for answering.