Bug Report Checklist
[x] I provided code that demonstrates a minimal reproducible example.
[x] I confirmed bug exists on the latest mainline of AutoGluon via source install.
[x] I confirmed bug exists on the latest stable version of AutoGluon.
Describe the bug
I have a time series dataset spanning one year at one-minute resolution (so 365*24*60 = 525,600 datapoints). If I fit a TimeSeriesPredictor to forecast a window with prediction_length=15, training is very fast. But I want to evaluate it on a full day (keeping prediction_length=15), so I set num_val_windows=24*4 and also refit_every_n_windows=None to keep training runtime comparable (since each model should only be trained once and then evaluated on all validation windows). In this second case, however, training takes vastly longer, which I wouldn't expect. Is this a bug?
Essentially, the refit_every_n_windows option doesn't seem to have any effect for this dataset. Interestingly, I've also tested this on a dataset covering only a single day (again at minute resolution, so 24*60 = 1,440 datapoints); there, refit_every_n_windows=None did reduce training runtime significantly, essentially matching the runtime with num_val_windows=1.
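For reference, the sizes involved (plain arithmetic, just to make the setup concrete):

n_points = 365 * 24 * 60                          # 525,600 rows, matching the logs below
prediction_length = 15
windows_per_day = 24 * 60 // prediction_length    # 96 = 24*4 validation windows to cover one day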
Expected behavior
I expect a fit with refit_every_n_windows=None to take roughly the same time on the same dataset, regardless of the value of num_val_windows.
To Reproduce
import pandas as pd
import numpy as np
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
df = pd.DataFrame(
    {
        "timestamp": pd.date_range("2023-01-01", "2024-01-01", freq="min", inclusive="left"),
        "target": np.sin(np.arange(365*24*60)),
        "item_id": ["item_one"]*365*24*60,
    }
)
train_data = TimeSeriesDataFrame.from_data_frame(
    df,
    id_column="item_id",
    timestamp_column="timestamp",
)
# this will train fast since num_val_windows=1
predictor1 = TimeSeriesPredictor(
    prediction_length=15,
    path="refit_every_n_windows_test",
    target="target",
    eval_metric="MAE",
)
predictor1.fit(
    train_data,
    presets="fast_training",
    num_val_windows=1,
)
# this is obviously much slower since num_val_windows=24*4
predictor2 = TimeSeriesPredictor(
    prediction_length=15,
    path="refit_every_n_windows_test",
    target="target",
    eval_metric="MAE",
)
predictor2.fit(
    train_data,
    presets="fast_training",
    num_val_windows=24*4,
)
# I'd expect this to be about as fast as predictor1 (at least in training time), but it behaves more like predictor2 (much slower)
predictor3 = TimeSeriesPredictor(
    prediction_length=15,
    path="refit_every_n_windows_test",
    target="target",
    eval_metric="MAE",
)
predictor3.fit(
    train_data,
    presets="fast_training",
    num_val_windows=24*4,
    refit_every_n_windows=None,
)
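For anyone re-running the snippet, the runtime comparison can be made explicit with a small standard-library timing wrapper around the fit calls (illustrative sketch only, not part of the original runs):

import time

def timed_fit(predictor, **fit_kwargs):
    # wall-clock timer around a single fit call
    start = time.perf_counter()
    predictor.fit(train_data, presets="fast_training", **fit_kwargs)
    print(f"fit with {fit_kwargs} took {time.perf_counter() - start:.1f} s")

# e.g. timed_fit(predictor1, num_val_windows=1)
#      timed_fit(predictor3, num_val_windows=24*4, refit_every_n_windows=None)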
Screenshots / Logs
OUTPUT FOR predictor1:
Beginning AutoGluon training...
AutoGluon will save models to 'refit_every_n_windows_test'
=================== System Info ===================
AutoGluon Version: 1.1.1
Python Version: 3.10.13
Operating System: Windows
Platform Machine: AMD64
Platform Version: 10.0.22631
CPU Count: 12
GPU Count: 0
Memory Avail: 1.08 GB / 15.69 GB (6.9%)
Disk Space Avail: 101.43 GB / 235.67 GB (43.0%)
===================================================
Setting presets to: fast_training
Fitting with arguments:
{'enable_ensemble': True,
'eval_metric': MAE,
'hyperparameters': 'very_light',
'known_covariates_names': [],
'num_val_windows': 1,
'prediction_length': 15,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': False,
'skip_model_selection': False,
'target': 'target',
'verbosity': 2}
Inferred time series frequency: 'min'
Provided train_data has 525600 rows, 1 time series. Median time series length is 525600 (min=525600, max=525600).
Provided data contains following columns:
target: 'target'
AutoGluon will gauge predictive performance using evaluation metric: 'MAE'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
===================================================
Starting training. Start time is 2024-09-06 11:39:32
Models that will be trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta']
Training timeseries model Naive.
-1.0222 = Validation score (-MAE)
0.36 s = Training runtime
0.09 s = Validation (prediction) runtime
Training timeseries model SeasonalNaive.
-0.7292 = Validation score (-MAE)
0.34 s = Training runtime
0.09 s = Validation (prediction) runtime
Training timeseries model RecursiveTabular.
-0.0006 = Validation score (-MAE)
36.15 s = Training runtime
2.41 s = Validation (prediction) runtime
Training timeseries model DirectTabular.
-0.0008 = Validation score (-MAE)
28.96 s = Training runtime
0.53 s = Validation (prediction) runtime
Training timeseries model ETS.
-0.6342 = Validation score (-MAE)
0.31 s = Training runtime
0.11 s = Validation (prediction) runtime
Training timeseries model Theta.
-0.9614 = Validation score (-MAE)
0.31 s = Training runtime
0.20 s = Validation (prediction) runtime
Fitting simple weighted ensemble.
Ensemble weights: {'DirectTabular': 0.62, 'RecursiveTabular': 0.38}
-0.0005 = Validation score (-MAE)
0.39 s = Training runtime
2.94 s = Validation (prediction) runtime
Training complete. Models trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta', 'WeightedEnsemble']
Total runtime: 70.57 s
Best model: WeightedEnsemble
Best model score: -0.0005
OUTPUT FOR predictor2 (I never managed to let it finish; it seemed to get stuck on RecursiveTabular):
Beginning AutoGluon training...
AutoGluon will save models to 'refit_every_n_windows_test'
=================== System Info ===================
AutoGluon Version: 1.1.1
Python Version: 3.10.13
Operating System: Windows
Platform Machine: AMD64
Platform Version: 10.0.22631
CPU Count: 12
GPU Count: 0
Memory Avail: 2.95 GB / 15.69 GB (18.8%)
Disk Space Avail: 100.57 GB / 235.67 GB (42.7%)
===================================================
Setting presets to: fast_training
Fitting with arguments:
{'enable_ensemble': True,
'eval_metric': MAE,
'hyperparameters': 'very_light',
'known_covariates_names': [],
'num_val_windows': 96,
'prediction_length': 15,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_every_n_windows': 1,
'refit_full': False,
'skip_model_selection': False,
'target': 'target',
'verbosity': 2}
Inferred time series frequency: 'min'
Provided train_data has 525600 rows, 1 time series. Median time series length is 525600 (min=525600, max=525600).
Provided data contains following columns:
target: 'target'
AutoGluon will gauge predictive performance using evaluation metric: 'MAE'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
===================================================
Starting training. Start time is 2024-09-06 14:35:41
Models that will be trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta']
Training timeseries model Naive.
-0.8330 = Validation score (-MAE)
52.87 s = Training runtime
0.08 s = Validation (prediction) runtime
Training timeseries model SeasonalNaive.
-0.6931 = Validation score (-MAE)
38.15 s = Training runtime
0.08 s = Validation (prediction) runtime
Training timeseries model RecursiveTabular.
OUTPUT FOR predictor3:
Beginning AutoGluon training...
AutoGluon will save models to 'refit_every_n_windows_test'
=================== System Info ===================
AutoGluon Version: 1.1.1
Python Version: 3.10.13
Operating System: Windows
Platform Machine: AMD64
Platform Version: 10.0.22631
CPU Count: 12
GPU Count: 0
Memory Avail: 2.34 GB / 15.69 GB (14.9%)
Disk Space Avail: 100.90 GB / 235.67 GB (42.8%)
===================================================
Setting presets to: fast_training
Fitting with arguments:
{'enable_ensemble': True,
'eval_metric': MAE,
'hyperparameters': 'very_light',
'known_covariates_names': [],
'num_val_windows': 96,
'prediction_length': 15,
'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
'random_seed': 123,
'refit_full': False,
'skip_model_selection': False,
'target': 'target',
'verbosity': 2}
Inferred time series frequency: 'min'
Provided train_data has 525600 rows, 1 time series. Median time series length is 525600 (min=525600, max=525600).
Provided data contains following columns:
target: 'target'
AutoGluon will gauge predictive performance using evaluation metric: 'MAE'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
===================================================
Starting training. Start time is 2024-09-06 12:14:39
Models that will be trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta']
Training timeseries model Naive.
-0.8330 = Validation score (-MAE)
38.34 s = Training runtime
0.19 s = Validation (prediction) runtime
Training timeseries model SeasonalNaive.
-0.6931 = Validation score (-MAE)
32.74 s = Training runtime
0.10 s = Validation (prediction) runtime
Training timeseries model RecursiveTabular.
-0.0003 = Validation score (-MAE)
292.69 s = Training runtime
2.82 s = Validation (prediction) runtime
Training timeseries model DirectTabular.
-0.0007 = Validation score (-MAE)
75.64 s = Training runtime
0.43 s = Validation (prediction) runtime
Training timeseries model ETS.
-0.6369 = Validation score (-MAE)
139.11 s = Training runtime
0.12 s = Validation (prediction) runtime
Training timeseries model Theta.
-0.8383 = Validation score (-MAE)
87.63 s = Training runtime
0.21 s = Validation (prediction) runtime
Fitting simple weighted ensemble.
Ensemble weights: {'DirectTabular': 0.08, 'RecursiveTabular': 0.92}
-0.0003 = Validation score (-MAE)
47.86 s = Training runtime
3.25 s = Validation (prediction) runtime
Training complete. Models trained: ['Naive', 'SeasonalNaive', 'RecursiveTabular', 'DirectTabular', 'ETS', 'Theta', 'WeightedEnsemble']
Total runtime: 738.78 s
Best model: WeightedEnsemble
Best model score: -0.0003
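The per-model fit times can also be read off programmatically instead of from the raw log; something along these lines should work (column names quoted from memory, so treat as a sketch):

lb = predictor3.leaderboard()   # returns a pandas DataFrame with per-model validation scores and times
# adjust the column names below if they differ in this AutoGluon version
print(lb[["model", "score_val", "fit_time_marginal", "pred_time_val"]])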