decisionintelligence / TFB

TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods (PVLDB 2024) https://www.vldb.org/pvldb/vol17/p2363-hu.pdf

Empty results on univariate dataset scripts #14

Open · stefanosh opened this issue 2 weeks ago

stefanosh commented 2 weeks ago

Hello, thanks a lot for your work on this comprehensive forecasting benchmark.

I tried to run the benchmark on the univariate dataset "weather_dataset_3.csv" (downloaded from your Google Drive link), but I get empty results. For example, running

```
python ./scripts/run_benchmark.py --config-path "fixed_forecast_config_daily.json" --data-name-list "weather_dataset_3.csv" --model-name "time_series_library.Triformer" --model-hyper-params "{\"d_model\":16,\"d_ff\":16,\"factor\":3}" --adapter "transformer_adapter" --save-path "daily" --gpus 0 1 --num-workers 1 --timeout 60000
```

trains the model successfully with no errors and creates the zip file. However, after unzipping, the test .csv file is empty, with no metric results (see the screenshots below).

With rolling_forecast, as below, I do get all the results:

```
python ./scripts/run_benchmark.py --config-path "rolling_forecast_config.json" --data-name-list "Weather.csv" --strategy-args '{"horizon":96}' --model-name "time_series_library.Linear" --model-hyper-params '{"d_ff": 64, "d_model": 32, "horizon": 96, "seq_len": 512}' --adapter "transformer_adapter" --gpus 0 --num-workers 1 --timeout 60000 --save-path "Weather/Linear"
```

1) What could be the cause of this?
2) Besides this issue, what are your recommendations/guidelines for running the benchmark on a custom dataset? E.g., a) data format, b) config file updates, c) hyperparameter tuning.

Thanks.

[screenshots: the unzipped test .csv file is empty, with no metric results]

qiu69 commented 2 weeks ago

```
python ./scripts/run_benchmark.py --config-path "fixed_forecast_config_daily.json" --data-name-list "weather_dataset_3.csv" --model-name "time_series_library.Triformer" --model-hyper-params "{\"d_model\":16,\"d_ff\":16,\"factor\":3}" --adapter "transformer_adapter" --save-path "daily" --gpus 0 1 --num-workers 1 --timeout 60000
```

Thank you for your attention and question. I ran the same command and got the results below. Please ensure that you have used the correct dataset and the latest code; that should resolve your first question.

[screenshots: result files with metrics produced by the same command]

qiu69 commented 2 weeks ago

For the second question, we will provide a dedicated tutorial later on. For now, you can use your own dataset by following these steps:

Step 1: Convert your dataset to our format, which is a 3-column long table. You can open a multivariate or a univariate dataset to see what it looks like (see the sketch after these steps).

Step 2: Add your dataset's information to the FORECAST-META.csv file. Note that you only need to fill in attributes such as filename, freq, if_univariant, size, and length. Fill in the size as large; the length is the original length of your dataset, not the length after conversion to our format.

Step 3: Specify your own dataset name in the command, for example:

```
python ./scripts/run_benchmark.py --config-path "fixed_forecast_config_daily.json" --data-name-list "your dataset name" --model-name "time_series_library.Triformer" --model-hyper-params "{\"d_model\":16,\"d_ff\":16,\"factor\":3}" --adapter "transformer_adapter" --save-path "daily" --gpus 0 1 --num-workers 1 --timeout 60000
```
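(Illustrative sketch of Step 1, not from the maintainers' reply.) Assuming your data starts as a wide CSV with one date column plus one column per series, and assuming the long-table columns are named date, cols, and data, a conversion with pandas could look like the snippet below; check an existing dataset from the Google Drive link for the exact column names your version expects.

```python
# melt_to_long.py -- illustrative sketch; the column names ("date", "cols", "data")
# and file names are assumptions, so verify them against an existing TFB dataset.
import pandas as pd

# Wide format: a "date" column plus one column per series.
wide = pd.read_csv("my_dataset_wide.csv", parse_dates=["date"])

# 3-column long table: timestamp, series identifier, value.
long_df = wide.melt(id_vars="date", var_name="cols", value_name="data")
long_df = long_df.sort_values(["cols", "date"]).reset_index(drop=True)

long_df.to_csv("my_dataset.csv", index=False)
```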

stefanosh commented 2 weeks ago

Thanks for the answer and the tutorial.

I can confirm I get the results now.

For univariate datasets:
a) do you suggest keeping the same values as in the existing config files? b) should rolling_forecast be used as well?

qiu69 commented 2 weeks ago

a) If you want to compare performance with the univariate algorithms in the paper, you should follow the configuration in the existing configuration files and not modify the forecasting horizon or the look-back window size. The hyperparameters of the algorithms themselves, such as the training epochs and learning rate, can be modified (see the example command below). b) Whether to perform rolling forecast is another usage scenario and reflects another ability of the algorithm; you can decide according to your needs.
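(Illustrative example, not part of the original reply.) Such overrides can be passed through --model-hyper-params while keeping the horizon and look-back window at their configured values; the key names for the training epochs and learning rate (num_epochs and lr below) and the values shown are assumptions, so check the model's default configuration for the names it actually expects.

```
python ./scripts/run_benchmark.py --config-path "fixed_forecast_config_daily.json" --data-name-list "weather_dataset_3.csv" --model-name "time_series_library.Triformer" --model-hyper-params "{\"d_model\":16,\"d_ff\":16,\"factor\":3,\"num_epochs\":20,\"lr\":0.0005}" --adapter "transformer_adapter" --save-path "daily" --gpus 0 --num-workers 1 --timeout 60000
```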

stefanosh commented 2 weeks ago

Thanks.

Are the training epochs and LR values currently in the scripts the ones used in the paper?

Also, was training performed with multiple seeds (by changing seed in strategy_args) or with seed 2021?

qiu69 commented 2 weeks ago

For each algorithm on a dataset, we ran multiple sets of hyperparameters and reported in the paper the results under the best-performing parameters. The univariate scripts provide examples of batch-evaluating the univariate datasets. We suggest that you use the settings in the config file, including the seed. You can also write your own config file according to your needs.