amazon-science / chronos-forecasting

Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
https://arxiv.org/abs/2403.07815
Apache License 2.0

How to reproduce the results as shown in the paper? #102

Open liu-jc opened 2 weeks ago

liu-jc commented 2 weeks ago

Hi chronos team,

Thanks for the great work! I would like to know how we can reproduce the results as shown in the paper, e.g., Figure 4. Also could we have some evaluation scripts/code to facilitate the model evaluation?

I am aware that some code snippets are provided at https://github.com/amazon-science/chronos-forecasting/issues/75. However, as mentioned there, "While many datasets in GluonTS have the same name as the ones used in the paper, they may be different from the evaluation in the paper in crucial aspects such as prediction length and number of rolls." So I wonder if we could have scripts to help us reproduce the results.

abdulfatir commented 2 weeks ago

@liu-jc We are working towards releasing the evaluation datasets. Once we have that, I will inform you. Please keep an eye out for the update.

liu-jc commented 1 week ago

Hi @abdulfatir, thanks for the reply! I also noticed that in the README, you mentioned "Fixed an off-by-one error in bin indices in the output_transform". Does that mean if we are using the checkpoint on Huggingface, it's the version before fixing this bug?

abdulfatir commented 1 week ago

@liu-jc The issue was not in the model checkpoints themselves but in the inference code. The decoded values were shifted by one bin, which led to some avoidable discrepancy.
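To illustrate why an off-by-one in bin indices affects decoding but not the trained weights, here is a minimal sketch of a uniform-bin quantizer. It is not the Chronos `output_transform`: the bin layout, the `encode`/`decode` helpers, and the direction of the shift are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical setup: 9 uniformly spaced bin centers on [-4, 4] (bin width 1).
centers = np.linspace(-4.0, 4.0, 9)

def encode(values):
    """Map each value to the index of its nearest bin center."""
    return np.abs(values[:, None] - centers[None, :]).argmin(axis=1)

def decode(indices, offset=0):
    """Map bin indices back to bin centers.

    A nonzero offset mimics an indexing bug at decode time; the direction
    used below (-1) is an arbitrary choice for this illustration.
    """
    shifted = np.clip(indices + offset, 0, len(centers) - 1)
    return centers[shifted]

values = np.array([-2.0, 0.0, 1.0])
ids = encode(values)              # token ids, as a model would predict them
correct = decode(ids)             # recovers the original values
buggy = decode(ids, offset=-1)    # every value lands one bin too low

print("ids:    ", ids)
print("correct:", correct)
print("buggy:  ", buggy)
```

In this toy setting the predicted ids are identical in both cases; only the mapping back to real values differs, so every decoded value is off by one bin width. That is consistent with a fix that lives in the inference code rather than in the checkpoints.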

liu-jc commented 1 week ago

@abdulfatir thanks for the answer. So, may I confirm that if we are using the latest code for inference, it should not have any problems?

abdulfatir commented 1 week ago

@liu-jc Yes.

abdulfatir commented 1 day ago

Update: We have just open-sourced the datasets used in the paper (thanks @shchur!). Please check the updated README. We have also released an evaluation script and backtest configs to compute the WQL and MASE numbers as reported in the paper. Please follow the instructions in this README to evaluate on the in-domain and zero-shot benchmarks.
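For readers unfamiliar with the two metrics, the following is a minimal sketch of how MASE and WQL are commonly defined. It is not the released evaluation script: the function names, the seasonal-naive scaling with `season=1`, and the choice of quantile levels are assumptions here, so numbers from this sketch should not be expected to match the paper.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean absolute scaled error.

    The forecast MAE is scaled by the in-sample MAE of a seasonal naive
    forecast on the training history (season=1 is an assumption).
    """
    naive_err = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_err

def quantile_loss(y_true, q_pred, q):
    """Pinball loss at level q, summed over time steps and multiplied by 2."""
    diff = y_true - q_pred
    return 2.0 * np.sum(np.maximum(q * diff, (q - 1.0) * diff))

def wql(y_true, q_preds, levels):
    """Weighted quantile loss.

    Each quantile loss is normalized by sum(|y_true|), then the results
    are averaged over the given quantile levels.
    """
    denom = np.sum(np.abs(y_true))
    return np.mean([quantile_loss(y_true, q_preds[q], q) / denom for q in levels])

# Toy data, made up for this example.
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_true = np.array([6.0, 7.0])
y_pred = np.array([5.0, 9.0])  # used as both the point and the median forecast

print("MASE:", mase(y_true, y_pred, y_train))
print("WQL: ", wql(y_true, {0.5: y_pred}, [0.5]))
```

A real evaluation would pass one forecast array per quantile level (for example 0.1 through 0.9) and aggregate over all series in a dataset. The backtest configs mentioned above determine details such as prediction length and number of rolling windows, which this sketch does not cover.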