JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
BSD 3-Clause "New" or "Revised" License

IndexError When lags is greater than number of steps #164

Closed hdattada closed 1 year ago

hdattada commented 2 years ago

I am using skforecast for the first time and I am having trouble forecasting when the number of lags is larger than the number of steps. Below is my sample dataframe with 13 historic values.

Python Version: 3.8 skforecast version: 0.4.3

            historic_data
2022-01-01           77.0
2022-01-02           77.0
2022-01-03           77.0
2022-01-04           77.0
2022-01-05           77.0
2022-01-06           77.0
2022-01-07           77.0
2022-01-08           77.0
2022-01-09           77.0
2022-01-10           77.0
2022-01-11           77.0
2022-01-12           77.0
2022-01-13           77.0
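For anyone who wants to reproduce the report, the sample data above can be rebuilt along these lines. This is a minimal sketch: the variable name `data` is an assumption, since the original post does not show how the dataframe was created.

```python
import pandas as pd

# Rebuild the 13 daily observations shown above.
# `data` is a hypothetical name; the original post does not give one.
index = pd.date_range(start="2022-01-01", periods=13, freq="D")
data = pd.DataFrame({"historic_data": [77.0] * 13}, index=index)

print(data)
```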

Forecaster Object after fitting

Regressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=16,
             num_parallel_tree=1, predictor='auto', random_state=123,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=0) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12] 
Window size: 12 
Included exogenous: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('2022-01-01 00:00:00'), Timestamp('2022-01-13 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.300000012, 'max_delta_step': 0, 'max_depth': 6, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 16, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 123, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': 0} 
Creation date: 2022-06-10 11:16:13 
Last fit date: 2022-06-10 11:16:15 
Skforecast version: 0.4.3 

Code used for fitting and prediction

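from skforecast.ForecasterAutoreg import ForecasterAutoreg
from xgboost import XGBRegressor
# `lags=12` is inferred from the printout above; the fit call was cut off in the
# original post, so `data` is a placeholder name for the DataFrame.
forecaster = ForecasterAutoreg(
    regressor=XGBRegressor(random_state=123, verbosity=0), lags=12)
forecaster.fit(y=data.loc[:, 'historic_data'])
predicted = forecaster.predict(steps=6)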


self = ================= 
Regressor: XGBRegressor(base_score=0.5, booster='gbtree', col...1, 'verbosity': 0} 
Creation date: 2022-06-10 11:17:54 
Last fit date: 2022-06-10 11:17:54 
Skforecast version: 0.4.3 

steps = 6, last_window = array([77.]), exog = None

    def _recursive_predict(
        self,
        steps: int,
        last_window: np.array,
        exog: np.array
    ) -> pd.Series:
        '''
        Predict n steps ahead. It is an iterative process in which, each prediction,
        is used as a predictor for the next step.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.

        last_window : numpy ndarray
            Values of the series used to create the predictors (lags) need in the
            first iteration of prediction (t + 1).

        exog : numpy ndarray, pandas DataFrame
            Exogenous variable/s included as predictor/s.

        Returns
        -------
        predictions : numpy ndarray
            Predicted values.

        '''

        predictions = np.full(shape=steps, fill_value=np.nan)

        for i in range(steps):
>           X = last_window[-self.lags].reshape(1, -1)
E           IndexError: index -2 is out of bounds for axis 0 with size 1

hdattada commented 2 years ago

I think I know what's happening, but it would be great to get confirmation. The size of the training window is length_of_dataset - num_of_lags, so in my case, with a dataset of 13 values and 12 lags, only 1 value was stored in the last window. Is that understanding right?
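The indexing failure can be reproduced with NumPy alone, without skforecast. This is only a sketch of the situation described in this thread: `lags` and `last_window` are rebuilt by hand from the values shown in the traceback, not taken from a fitted forecaster.

```python
import numpy as np

lags = np.arange(1, 13)          # lags 1..12, as in the forecaster printout
n_observations = 13

# Per the explanation above: 13 observations minus 12 lags leaves 1 value.
window_length = n_observations - lags.max()
last_window = np.full(window_length, 77.0)   # same as array([77.])

try:
    X = last_window[-lags].reshape(1, -1)    # the line that fails in the traceback
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(window_length, error_message)

# With a window holding at least max(lags) values, the same indexing works.
full_window = np.full(lags.max(), 77.0)
X_ok = full_window[-lags].reshape(1, -1)
print(X_ok.shape)
```

Indexing with `-lags` asks for positions -1 through -12, so the window must hold at least 12 values; a window of length 1 only satisfies the first of them.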

JavierEscobarOrtiz commented 2 years ago

Hello @hdattada,

Yes, that is a bug we found in version 0.4.3. You can read a full description in this issue.

We fixed it in version 0.5.0. We are still developing this version, but you can install it from GitHub by running the following in the shell:

pip install git+ 

Please, let us know if this fixes your problem.
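To check whether an environment already has a version that includes the fix, something like the following could be used. This is a rough sketch: the helper `has_fix` is hypothetical, it assumes version strings start with `major.minor` digits, and it relies only on the standard-library `importlib.metadata`.

```python
import re
from importlib import metadata

FIXED_IN = (0, 5)  # the fix is reported for version 0.5.0


def has_fix(version_string: str) -> bool:
    """Return True if `version_string` is 0.5 or later (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return (int(match.group(1)), int(match.group(2))) >= FIXED_IN


try:
    installed = metadata.version("skforecast")
    print(installed, "includes the fix:", has_fix(installed))
except metadata.PackageNotFoundError:
    print("skforecast is not installed in this environment")
```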

Thank you very much!

JoaquinAmatRodrigo commented 1 year ago

Fixed it in version 0.5.0.