JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

Different steps #710

Closed. yuye188 closed this issue 1 week ago.

yuye188 commented 2 weeks ago

I have trained 2 models using bayesian_search_forecaster_multiseries() and backtesting_forecaster_multiseries() with the same algorithm (LGBMRegressor) and search_space; the only difference is that one uses steps=1 and the other steps=24 (hourly frequency). It turns out that the steps=24 model trains much faster and gives almost the same results as the steps=1 one when I call predict(steps=1) on the last data received in a production environment. Is this normal, or could I have made a mistake somewhere?

Many thanks!

JoaquinAmatRodrigo commented 2 weeks ago

Hi @yuye188, what do you mean by "I trained 2 models using bayesian_search_forecaster_multiseries() and backtesting_forecaster_multiseries()"? The first function runs a Bayesian hyperparameter search and returns the best combination found; the second runs a backtesting process for a given forecaster. They are two different things, and the Bayesian search is expected to take much more time than backtesting.

yuye188 commented 2 weeks ago

> Hi @yuye188, what do you mean by "I trained 2 models using bayesian_search_forecaster_multiseries() and backtesting_forecaster_multiseries()"? The first function runs a Bayesian hyperparameter search and returns the best combination found; the second runs a backtesting process for a given forecaster. They are two different things, and the Bayesian search is expected to take much more time than backtesting.

Hi! Yes, exactly. What I mean is that I'm using Bayesian search together with backtesting in a function that I've created to obtain a model. Here's my code:

```python
...
# Model (Forecaster)
forecaster_ms = ForecasterAutoregMultiSeries(
                    regressor          = LGBMRegressor(random_state=123, verbose=-1),
                    lags               = 14,
                    encoding           = 'ordinal',
                    transformer_series = RobustScaler(),
                    weight_func        = custom_weights
                )

# Bayesian search
results, best_trial = bayesian_search_forecaster_multiseries(
                          forecaster            = forecaster_ms,
                          series                = df_interpolated,
                          search_space          = search_space,
                          steps                 = horizon,
                          metric                = 'mean_absolute_error',
                          refit                 = False,
                          initial_train_size    = len(df_interpolated.loc[:end_train]),
                          fixed_train_size      = True,
                          n_trials              = 20,
                          random_state          = 123,
                          return_best           = True,
                          n_jobs                = 'auto',
                          verbose               = False,
                          show_progress         = True,
                          engine                = 'optuna',
                          kwargs_create_study   = {},
                          kwargs_study_optimize = {}
                      )

# Backtesting
multi_series_mae, predictions_ms = backtesting_forecaster_multiseries(
                                       forecaster         = forecaster_ms,
                                       series             = df_interpolated,
                                       levels             = None,
                                       steps              = horizon,
                                       metric             = 'mean_absolute_error',
                                       initial_train_size = len(data_train) + len(data_val),
                                       refit              = False,
                                       fixed_train_size   = False,
                                       verbose            = False
                                   )
...
```
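
The search_space referenced above is defined elsewhere in my code; with skforecast's optuna engine it is a function that receives a trial and returns the hyperparameters to evaluate. A minimal sketch (the parameter ranges here are only illustrative, not the ones I actually use):

```python
def search_space(trial):
    # Illustrative ranges only; the real search space is defined elsewhere.
    return {
        'n_estimators' : trial.suggest_int('n_estimators', 100, 500),
        'max_depth'    : trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3)
    }
```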

Then I trained a model with horizon=1 and another model with horizon=24 and tried both models on the latest real data (not included in df_interpolated), and the model trained with horizon=24 has similar performance when I use forecaster.predict(steps=1). So what I don't understand is what the difference is between using steps=1 and steps=24 in both processes, and how it can affect the model's performance.

JoaquinAmatRodrigo commented 2 weeks ago

Hi @yuye188, this is a great experiment.

To better understand the results, let me provide some context. Under the hood, the ForecasterAutoregMultiSeries trains a regressor that only knows how to predict one step ahead. When multiple steps are needed, the forecaster uses a recursive process that predicts one step at a time until it reaches the desired horizon. The steps argument therefore does not change how the underlying regressor is trained: two ForecasterAutoregMultiSeries with the same hyperparameters and lags, trained on the same data, will produce exactly the same predictions.
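
As a rough sketch of what the recursion does (not skforecast's actual internals), predicting 24 steps is just 24 one-step predictions, each one fed back in as a lag for the next:

```python
import numpy as np

# Toy sketch of recursive multi-step forecasting (not skforecast's real code).
# 'one_step_model' is any regressor that predicts a single step from the last n_lags values.
def recursive_predict(one_step_model, last_window, steps, n_lags=14):
    window = list(last_window)
    predictions = []
    for _ in range(steps):
        features = np.array(window[-n_lags:]).reshape(1, -1)
        next_value = one_step_model.predict(features)[0]  # always a one-step-ahead prediction
        predictions.append(next_value)
        window.append(next_value)  # the prediction becomes a lag for the next step
    return predictions

# predict(steps=1) runs the loop once; predict(steps=24) runs the same loop 24 times.
```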

So why backtest with different steps? The idea of backtesting is to simulate the real scenario and generate a metric that accurately represents that scenario. It is not the same to backtest a whole year by predicting only 1 day ahead every day as it is to predict the whole week every Monday.
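
For an hourly series this changes how many folds the backtest contains, not how the regressor is trained. A rough illustration (the numbers are made up):

```python
# Purely illustrative: one month of hourly test data.
test_size = 24 * 30

folds_steps_1  = test_size // 1    # 720 folds, each predicting only the next hour
folds_steps_24 = test_size // 24   # 30 folds, each predicting the next 24 hours at once

print(folds_steps_1, folds_steps_24)  # 720 30
```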

In the Bayesian search process, we try to find the model hyperparameters that produce the lowest metric for a given backtesting scenario.

In summary, in your case, if both Bayesian searches select the same model, then both forecasters are equivalent. So if you backtest them with the same number of steps, they will produce the same predictions.
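
A quick way to check this in your case is to compare the selected hyperparameters and the one-step predictions of the two forecasters. This is only a sketch: `forecaster_1h` and `forecaster_24h` are hypothetical names for the forecasters returned by the two searches (with return_best=True they are refitted on the whole series, so predict() can be called directly):

```python
import pandas as pd

# Hypothetical names for the two forecasters obtained from the searches.
same_params = forecaster_1h.regressor.get_params() == forecaster_24h.regressor.get_params()
print("Same hyperparameters selected:", same_params)

preds_1h  = forecaster_1h.predict(steps=1)
preds_24h = forecaster_24h.predict(steps=1)

# If both searches selected the same hyperparameters and lags, the predictions match.
pd.testing.assert_frame_equal(preds_1h, preds_24h)
print("Identical one-step predictions")
```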

Does this help?

yuye188 commented 1 week ago

Hi!

Yes, absolutely. Thanks for the explanation!