Thank you for opening this, although this is not exactly a bug. When it comes to fine-tuning, there is no one-size-fits-all set of hyperparameters that works for all datasets. It is therefore entirely possible that, due to specific settings such as a large `learning_rate` or `max_steps`, the model's performance worsens upon fine-tuning (e.g., due to over-fitting). I would encourage you to try different fine-tuning hyperparameters.
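For example, one low-risk experiment is to make the fine-tuning run less aggressive by lowering these two values in the training config before re-running the training script. A minimal sketch (file names and values are purely illustrative, not a recommendation tuned for ETTh1):

```python
# Sketch: create a copy of the training config with a smaller learning rate
# and fewer optimization steps. The values below are illustrative only.
import yaml

with open("chronos-t5-small.yaml") as f:
    config = yaml.safe_load(f)

config["learning_rate"] = 1e-5   # gentler updates than the default
config["max_steps"] = 2000       # fewer steps to reduce over-fitting risk

with open("chronos-t5-small-etth1.yaml", "w") as f:
    yaml.safe_dump(config, f)
```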
Describe the bug
When fine-tuning chronos-t5-small on the ETTh1 and ETTh2 datasets respectively, the performance drops compared to the zero-shot performance. Could that be because the `prediction_length` is recommended to be <= 64?

Expected behavior
If chronos-t5-small is fine-tuned on, say, the ETTh1 dataset only, the fine-tuned model should yield better MAE and MSE than the zero-shot model.
How to reproduce
This example focuses on the ETTh1 dataset; the procedure for ETTh2 is identical. Note that fine-tuning and evaluation are done separately for each dataset, i.e., the model is not fine-tuned and evaluated on both datasets at once.
if name == "main":
Load and preprocess the dataset
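A minimal sketch of this preprocessing step (file paths, the target column, the split length, and the standardization details are illustrative assumptions; the helper follows the `convert_to_arrow` pattern from the Chronos training documentation):

```python
# Sketch: convert the (standardized) ETTh1 training split into a GluonTS
# Arrow file that scripts/training/train.py can consume. Paths, the target
# column, and the split length are illustrative assumptions.
from pathlib import Path

import numpy as np
import pandas as pd
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(path, time_series, start_times=None):
    """Write a list of 1-D numpy arrays to an Arrow file for Chronos training."""
    if start_times is None:
        start_times = [np.datetime64("2016-07-01 00:00", "s")] * len(time_series)
    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression="lz4").write_to_file(dataset, path=Path(path))


if __name__ == "__main__":
    df = pd.read_csv("ETTh1.csv")                   # local copy of the dataset
    target = df["OT"].to_numpy(dtype=np.float32)    # assumed univariate target column
    train_end = 8640                                # assumed 12-month training split
    mean, std = target[:train_end].mean(), target[:train_end].std()
    target = (target - mean) / std                  # standardize with training statistics
    convert_to_arrow("etth1-train.arrow", [target[:train_end]])
```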
Use the training pipeline implemented in `chronos-forecasting/scripts/training/train.py`, as shown in the tutorial, with the `chronos-t5-small.yaml` config.
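Assuming `train.py` accepts the YAML config via its `--config` option (as in the training scripts README), the fine-tuning run can be launched roughly like this (paths are placeholders):

```python
# Launch fine-tuning with the config above. This assumes train.py exposes a
# --config option for the YAML file; adjust paths to your checkout.
import subprocess

subprocess.run(
    [
        "python",
        "chronos-forecasting/scripts/training/train.py",
        "--config", "chronos-t5-small.yaml",
    ],
    check=True,
)
```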
Evaluate with `chronos-forecasting/scripts/evaluation/evaluate.py`, with the `load_and_split_dataset()` function modified for this dataset and with MAE[0.5] and MSE[mean] as metrics.
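A simplified sketch of the kind of modification meant here (the file path, target column, split length, and standardization are illustrative assumptions, not the exact code used):

```python
# Sketch of a modified load_and_split_dataset(): load a locally standardized
# ETTh1 series and split off rolling test windows, mirroring the structure of
# the original function in evaluate.py. Paths, columns, and sizes are assumed.
import pandas as pd
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split


def load_and_split_dataset(backtest_config: dict):
    offset = backtest_config["offset"]                        # negative: counted from the end
    prediction_length = backtest_config["prediction_length"]
    num_rolls = backtest_config["num_rolls"]

    df = pd.read_csv("ETTh1.csv", index_col="date", parse_dates=True)
    target = df["OT"]                                         # assumed univariate target column
    train_end = 8640                                          # assumed 12-month training split
    mean, std = target.iloc[:train_end].mean(), target.iloc[:train_end].std()
    gts_dataset = PandasDataset({"ETTh": (target - mean) / std}, freq="H")

    # everything after `offset` forms the test section
    _, test_template = split(gts_dataset, offset=offset)
    return test_template.generate_instances(prediction_length, windows=num_rolls)
```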
The evaluation is performed on the test section of the standardized ETTh1 dataset, hence the offset in the backtest config used by the evaluation pipeline.
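For illustration, a hypothetical backtest entry consistent with the description above, using the sketch of `load_and_split_dataset()` from the previous step (the offset, prediction length, and window count are assumptions, not the exact values from these runs):

```python
# Illustrative backtest settings: evaluate on the last 2880 points of the
# standardized ETTh1 series (its test section) in rolling windows.
backtest_config = {
    "name": "ETTh",
    "offset": -2880,           # assumed length of the test section
    "prediction_length": 64,   # assumed forecast horizon
    "num_rolls": 45,           # 45 windows x 64 steps = 2880 points
}
test_data = load_and_split_dataset(backtest_config)
```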
I get the following results:
| Run | dataset | model | MAE[0.5] | MSE[mean] |
|---|---|---|---|---|
| Zero-shot, ETTh1 | ETTh | amazon/chronos-t5-small | 0.5081954018184918 | 0.560689815315581 |
| Zero-shot, ETTh2 | ETTh | amazon/chronos-t5-small | 0.2625630043626757 | 0.1391419442914831 |
| Fine-tuned, ETTh1 | ETTh | /path/to/checkpoint-final | 0.7746078180721628 | 1.1865953634689008 |
| Fine-tuned, ETTh2 | ETTh | /path/to/checkpoint-final | 0.35415831866543424 | 0.2516080298962922 |
As you can see, MAE and MSE are worse for the fine-tuned checkpoints than for the zero-shot model, which shouldn't be the case.
Environment description
- Operating system: Ubuntu 22.04.4 LTS
- Python version: 3.10.14
- CUDA version: 12.4
- PyTorch version: 2.4.0
- HuggingFace transformers version: 4.44.2
- HuggingFace accelerate version: 0.33.0