Closed JorgeGomes72 closed 1 year ago
Hi @JorgeGomes72 , thanks for using skforecast. Your use case is what we call an intermittent demand with a regular pattern. This is a field we are actively exploring, and we will soon publish a user's guide that will discuss it.
Why does your model predict non-zero values? I see that the predicted values for Sundays are much lower than the predictions for other days, yet the model is not able to learn that they should be exactly zero. Since you mention that no Sunday in the historical data has sales, the problem is not in the training data. I think the model has learned that sales on one day are highly correlated with sales on the previous days. This is true for every day of the week except Sunday, so the model is torn between learning a general pattern and a local one.
How to solve it.
There are some options. If what is really important for your business case is the predictions for all days except Sunday, try to find the model that best predicts the days Monday through Saturday (you can use a custom metric to ignore specific values). Once you have the predictions, replace all Sunday values with zero.
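A minimal sketch of both steps with toy data (the name `metric_ignore_sundays` is hypothetical; it assumes the true values keep their DatetimeIndex and that predictions come back as a pandas Series with a DatetimeIndex, as skforecast returns):

```python
import numpy as np
import pandas as pd

def metric_ignore_sundays(y_true, y_pred):
    # Hypothetical custom metric: MSE computed on Monday-Saturday only.
    # Assumes y_true is a pandas Series that keeps its DatetimeIndex.
    mask = y_true.index.dayofweek != 6          # 6 = Sunday in pandas
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean((y_true[mask] - y_pred[mask]) ** 2)

# Post-processing: once the predictions are available, overwrite Sundays with zero.
idx = pd.date_range('2023-05-01', periods=14, freq='D')   # toy forecast horizon
predictions = pd.Series(100.0, index=idx)                 # toy predicted values
predictions[predictions.index.dayofweek == 6] = 0.0
```

The custom metric can then be passed to `grid_search_forecaster` / `backtesting_forecaster` through their `metric` argument, which accepts a callable.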
If you really need the output of the model to be always positive, you can apply a log transformation to the data before fitting the model.
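The effect can be sketched with numpy alone (skforecast can also apply this automatically through its `transformer_y` argument, e.g. with sklearn's `FunctionTransformer`; the numbers below are toy values):

```python
import numpy as np

sales = np.array([0.0, 5.0, 120.0])   # toy sales series containing zeros

# Fit the model on log1p(sales); log1p(0) = 0, so hours with zero sales are valid inputs,
# which plain log would not allow.
y_log = np.log1p(sales)

# Model predictions come back on the log scale and are inverted with expm1.
# expm1 never returns values below -1, so the back-transformed forecast
# cannot blow up into large negative sales.
y_back = np.expm1(y_log)
```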
I hope this helps!
Hello, I am already doing that: replacing the final Sunday predictions with 0.
Do you think this is a model problem rather than a skforecast problem? I mean, would my results perhaps be different if I didn't use skforecast? What do you think?
Thank you very much, JG
Hi @JorgeGomes72, I would say that it is related to the learning process of the regressor. However, I encourage you to compare the results with other models or libraries. You may find better performance.
Let us know if you find interesting results!
Hello, I need help to "reorganize" my model after searching for hyperparameters.
My initial model:
```python
forecaster = ForecasterAutoreg(
    regressor   = XGBRegressor(random_state=123),
    lags        = 24,
    weight_func = custom_weights
)
```
My grid search:
```python
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

lags_grid = [24, 30, 48, 72, 168, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168]]

results_grid = grid_search_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2.loc[:end_validation, 'SALES'],
    exog               = vendas_df2.loc[:end_validation, exog_variables],
    param_grid         = param_grid,
    lags_grid          = lags_grid,
    steps              = 2200,
    refit              = False,
    metric             = 'mean_squared_error',  # or custom_metric
    initial_train_size = int(len(data_train)),
    fixed_train_size   = False,
    return_best        = True,
    verbose            = False
)
```
After searching for the best parameters, this is the output:
```
================= ForecasterAutoreg =================
Regressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=8, num_parallel_tree=1, predictor='auto', random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None)
Lags: [  1   2   3   7  23  24  25  71  72  73 168]
Transformer for y: None
Transformer for exog: None
Window size: 168
Weight function included: True
Exogenous included: True
Type of exogenous variable: <class 'pandas.core.frame.DataFrame'>
Exogenous variables names: ['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14', 'WEEK_15', 'WEEK_16', 'WEEK_17', 'WEEK_18', 'WEEK_19', 'WEEK_20', 'WEEK_21', 'WEEK_22', 'WEEK_23', 'WEEK_24', 'WEEK_25', 'WEEK_26', 'WEEK_27', 'WEEK_28', 'WEEK_29', 'WEEK_30', 'WEEK_31', 'WEEK_32', 'WEEK_33', 'WEEK_34', 'WEEK_35', 'WEEK_36', 'WEEK_37', 'WEEK_38', 'WEEK_39', 'WEEK_40', 'WEEK_41', 'WEEK_42', 'WEEK_43', 'WEEK_44', 'WEEK_45', 'WEEK_46', 'WEEK_47', 'WEEK_48', 'WEEK_49', 'WEEK_50', 'WEEK_51', 'WEEK_52', 'WEEK_53', 'WEEKDAY_1', 'WEEKDAY_2', 'WEEKDAY_3', 'WEEKDAY_4', 'WEEKDAY_5', 'WEEKDAY_6', 'WEEKDAY_7', 'HOUR_0', 'HOUR_1', 'HOUR_2', 'HOUR_3', 'HOUR_4', 'HOUR_5', 'HOUR_6', 'HOUR_7', 'HOUR_8', 'HOUR_9', 'HOUR_10', 'HOUR_11', 'HOUR_12', 'HOUR_13', 'HOUR_14', 'HOUR_15', 'HOUR_16', 'HOUR_17', 'HOUR_18', 'HOUR_19', 'HOUR_20', 'HOUR_21', 'HOUR_22', 'HOUR_23']
Training range: [Timestamp('2019-01-01 00:00:00'), Timestamp('2022-09-30 23:00:00')]
Training index type: DatetimeIndex
Training index frequency: H
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 3, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 123, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None}
Creation date: 2023-04-13 21:51:22
Last fit date: 2023-04-14 01:28:46
Skforecast version: 0.7.0
Python version: 3.8.8
Forecaster id: None
```
So I need to put all these "best parameters" into the model; I must test a lot of identical "stores" and don't want to run the search again!
I tried this:
```python
forecaster2 = ForecasterAutoreg(
    regressor = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                             gamma=0, gpu_id=-1, importance_type=None,
                             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
                             max_depth=3, min_child_weight=1,
                             monotone_constraints='()', n_estimators=100, n_jobs=8,
                             num_parallel_tree=1, predictor='auto', random_state=123,
                             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                             tree_method='exact', validate_parameters=1, verbosity=None,
                             window_size=168, included_exog=True,
                             exog_col_names=exog_variables),
    lags        = [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168],
    weight_func = custom_weights
)
```
but the result seems different:
```
================= ForecasterAutoreg =================
Regressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, exog_col_names=['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14... gamma=0, gpu_id=-1, importance_type=None, included_exog=True, interaction_constraints='', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=8, num_parallel_tree=1, predictor='auto', random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None, ...)
Lags: [  1   2   3   7  23  24  25  71  72  73 168]
Transformer for y: None
Transformer for exog: None
Window size: 168
Weight function included: True
Exogenous included: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: None
Training index type: None
Training index frequency: None
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 3, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 123, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None, 'window_size': 168, 'included_exog': True, 'exog_col_names': ['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14', 'WEEK_15', 'WEEK_16', 'WEEK_17', 'WEEK_18', 'WEEK_19', 'WEEK_20', 'WEEK_21', 'WEEK_22', 'WEEK_23', 'WEEK_24', 'WEEK_25', 'WEEK_26', 'WEEK_27', 'WEEK_28', 'WEEK_29', 'WEEK_30', 'WEEK_31', 'WEEK_32', 'WEEK_33', 'WEEK_34', 'WEEK_35', 'WEEK_36', 'WEEK_37', 'WEEK_38', 'WEEK_39', 'WEEK_40', 'WEEK_41', 'WEEK_42', 'WEEK_43', 'WEEK_44', 'WEEK_45', 'WEEK_46', 'WEEK_47', 'WEEK_48', 'WEEK_49', 'WEEK_50', 'WEEK_51', 'WEEK_52', 'WEEK_53', 'WEEKDAY_1', 'WEEKDAY_2', 'WEEKDAY_3', 'WEEKDAY_4', 'WEEKDAY_5', 'WEEKDAY_6', 'WEEKDAY_7', 'HOUR_0', 'HOUR_1', 'HOUR_2', 'HOUR_3', 'HOUR_4', 'HOUR_5', 'HOUR_6', 'HOUR_7', 'HOUR_8', 'HOUR_9', 'HOUR_10', 'HOUR_11', 'HOUR_12', 'HOUR_13', 'HOUR_14', 'HOUR_15', 'HOUR_16', 'HOUR_17', 'HOUR_18', 'HOUR_19', 'HOUR_20', 'HOUR_21', 'HOUR_22', 'HOUR_23']}
Creation date: 2023-04-14 10:30:48
Last fit date: None
Skforecast version: 0.7.0
Python version: 3.8.8
Forecaster id: None
```
Could you help please? Thank you!
Hi @JorgeGomes72,
What do you mean by "the result seems different"?
Please try to add the python code inside code blocks (triple backticks). It renders in a much more readable way.
Hello Joaquin, I mean the result of `forecaster = ForecasterAutoreg(...)`.
After searching for the best parameters we can see in `forecaster`:
" Regressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=8, num_parallel_tree=1, predictor='auto', random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None) Lags: [ 1 2 3 7 23 24 25 71 72 73 168] Transformer for y: None Transformer for exog: None Window size: 168 Weight function included: True Exogenous included: True Type of exogenous variable: <class 'pandas.core.frame.DataFrame'> Exogenous variables names: ['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14', 'WEEK_15', 'WEEK_16', 'WEEK_17', 'WEEK_18', 'WEEK_19', 'WEEK_20', 'WEEK_21', 'WEEK_22', 'WEEK_23', 'WEEK_24', 'WEEK_25', 'WEEK_26', 'WEEK_27', 'WEEK_28', 'WEEK_29', 'WEEK_30', 'WEEK_31', 'WEEK_32', 'WEEK_33', 'WEEK_34', 'WEEK_35', 'WEEK_36', 'WEEK_37', 'WEEK_38', 'WEEK_39', 'WEEK_40', 'WEEK_41', 'WEEK_42', 'WEEK_43', 'WEEK_44', 'WEEK_45', 'WEEK_46', 'WEEK_47', 'WEEK_48', 'WEEK_49', 'WEEK_50', 'WEEK_51', 'WEEK_52', 'WEEK_53', 'WEEKDAY_1', 'WEEKDAY_2', 'WEEKDAY_3', 'WEEKDAY_4', 'WEEKDAY_5', 'WEEKDAY_6', 'WEEKDAY_7', 'HOUR_0', 'HOUR_1', 'HOUR_2', 'HOUR_3', 'HOUR_4', 'HOUR_5', 'HOUR_6', 'HOUR_7', 'HOUR_8', 'HOUR_9', 'HOUR_10', 'HOUR_11', 'HOUR_12', 'HOUR_13', 'HOUR_14', 'HOUR_15', 'HOUR_16', 'HOUR_17', 'HOUR_18', 'HOUR_19', 'HOUR_20', 'HOUR_21', 'HOUR_22', 'HOUR_23'] Training range: [Timestamp('2019-01-01 00:00:00'), Timestamp('2022-09-30 23:00:00')] Training index type: DatetimeIndex Training index frequency: H Regressor parameters: {'objective': 'reg:squarederror', 
'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 3, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 123, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None} "
but when I create a new model forecaster2 with the best parameters, the result is:
" Regressor: XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, exog_col_names=['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14... gamma=0, gpu_id=-1, importance_type=None, included_exog=True, interaction_constraints='', learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=8, num_parallel_tree=1, predictor='auto', random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', validate_parameters=1, verbosity=None, ...) Lags: [ 1 2 3 7 23 24 25 71 72 73 168] Transformer for y: None Transformer for exog: None Window size: 168 Weight function included: True Exogenous included: False Type of exogenous variable: None Exogenous variables names: None Training range: None Training index type: None Training index frequency: None Regressor parameters: {'objective': 'reg:squarederror', 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 3, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 123, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None, 'window_size': 168, 'included_exog': True, 'exog_col_names': ['OPEN', 'FERIADO', 'YEAR_2019', 'YEAR_2020', 'YEAR_2021', 'YEAR_2022', 'YEAR_2023', 'WEEK_1', 'WEEK_2', 'WEEK_3', 'WEEK_4', 'WEEK_5', 'WEEK_6', 'WEEK_7', 'WEEK_8', 'WEEK_9', 
'WEEK_10', 'WEEK_11', 'WEEK_12', 'WEEK_13', 'WEEK_14', 'WEEK_15', 'WEEK_16', 'WEEK_17', 'WEEK_18', 'WEEK_19', 'WEEK_20', 'WEEK_21', 'WEEK_22', 'WEEK_23', 'WEEK_24', 'WEEK_25', 'WEEK_26', 'WEEK_27', 'WEEK_28', 'WEEK_29', 'WEEK_30', 'WEEK_31', 'WEEK_32', 'WEEK_33', 'WEEK_34', 'WEEK_35', 'WEEK_36', 'WEEK_37', 'WEEK_38', 'WEEK_39', 'WEEK_40', 'WEEK_41', 'WEEK_42', 'WEEK_43', 'WEEK_44', 'WEEK_45', 'WEEK_46', 'WEEK_47', 'WEEK_48', 'WEEK_49', 'WEEK_50', 'WEEK_51', 'WEEK_52', 'WEEK_53', 'WEEKDAY_1', 'WEEKDAY_2', 'WEEKDAY_3', 'WEEKDAY_4', 'WEEKDAY_5', 'WEEKDAY_6', 'WEEKDAY_7', 'HOUR_0', 'HOUR_1', 'HOUR_2', 'HOUR_3', 'HOUR_4', 'HOUR_5', 'HOUR_6', 'HOUR_7', 'HOUR_8', 'HOUR_9', 'HOUR_10', 'HOUR_11', 'HOUR_12', 'HOUR_13', 'HOUR_14', 'HOUR_15', 'HOUR_16', 'HOUR_17', 'HOUR_18', 'HOUR_19', 'HOUR_20', 'HOUR_21', 'HOUR_22', 'HOUR_23']} "
It seems different; for example, I see exog_col_names inside the Regressor parameters, not outside them.
What I want to know is: must I create the model with the best parameters like this:
```python
forecaster2 = ForecasterAutoreg(
    regressor = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                             gamma=0, gpu_id=-1, importance_type=None,
                             interaction_constraints='', learning_rate=0.01, max_delta_step=0,
                             max_depth=3, min_child_weight=1,
                             monotone_constraints='()', n_estimators=100, n_jobs=8,
                             num_parallel_tree=1, predictor='auto', random_state=123,
                             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                             tree_method='exact', validate_parameters=1, verbosity=None,
                             window_size=168, included_exog=True,
                             exog_col_names=exog_variables),
    lags        = [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168],
    weight_func = custom_weights
)
```
or simply like this, without exog_col_names inside the regressor:

```python
forecaster2 = ForecasterAutoreg(
    regressor   = XGBRegressor(random_state=123, learning_rate=0.01, max_depth=3, n_estimators=100),
    lags        = [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168],
    weight_func = custom_weights
)

metric, predictions = backtesting_forecaster(
    forecaster         = forecaster2,
    y                  = vendas_df2['SALES'],
    exog               = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size   = False,
    steps              = 2200,
    refit              = False,
    metric             = 'mean_squared_error',  # or custom_metric
    verbose            = False
)
```
Imagine that you have `return_best = False` in grid_search_forecaster and you must explicitly write out the best model.
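For that scenario, one way is to pull the winning row out of the results DataFrame that grid_search_forecaster returns. A sketch with a toy stand-in DataFrame (the metric values are made up, and the metric column name may differ between skforecast versions):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by grid_search_forecaster
results_grid = pd.DataFrame({
    'lags':   [[1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168], [24]],
    'params': [{'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100},
               {'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 500}],
    'mean_squared_error': [1250.3, 1890.7],   # made-up values
})

# Best configuration = row with the lowest metric
best = results_grid.sort_values('mean_squared_error').iloc[0]
best_lags, best_params = best['lags'], best['params']

# Only the tuned hyperparameters go to the regressor. window_size,
# included_exog and exog_col_names are forecaster attributes, not
# XGBRegressor arguments, so they should not be passed to it:
# forecaster2 = ForecasterAutoreg(
#     regressor   = XGBRegressor(random_state=123, **best_params),
#     lags        = best_lags,
#     weight_func = custom_weights,
# )
```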
Thank you! JG
Once you have created the new forecaster instance, you need to train it using the .fit method.
The results you are showing are from a forecaster that has not been fitted:
```
Training range: None
Training index type: None
Training index frequency: None
```
Hello Joaquin,
Something like this?
```python
forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                             gamma=0, gpu_id=-1, importance_type=None,
                             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
                             max_depth=3, min_child_weight=1,
                             monotone_constraints='()', n_estimators=100, n_jobs=8,
                             num_parallel_tree=1, predictor='auto', random_state=123,
                             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                             tree_method='exact', validate_parameters=1, verbosity=None),
    lags        = 168,
    weight_func = custom_weights
)

forecaster.fit(y=vendas_df2.loc[:end_validation, 'SALES'])
predictions = forecaster.predict(steps=2200)
```
or like this one?
```python
forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                             gamma=0, gpu_id=-1, importance_type=None,
                             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
                             max_depth=3, min_child_weight=1,
                             monotone_constraints='()', n_estimators=100, n_jobs=8,
                             num_parallel_tree=1, predictor='auto', random_state=123,
                             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                             tree_method='exact', validate_parameters=1, verbosity=None),
    lags        = 168,
    weight_func = custom_weights
)

metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2['SALES'],
    exog               = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size   = False,
    steps              = 2200,
    refit              = False,
    metric             = 'mean_squared_error',
    verbose            = False
)
```
Thank you! JG
To have the same results as in the backtesting, you have to train the forecaster with the same data:
```python
forecaster.fit(y=vendas_df2['SALES'], exog=vendas_df2[exog_variables])
```
Furthermore, if you are including exogenous variables in fit, you should also provide them in predict.
This is part of the script:
```python
end_train      = '2021-08-20 23:59:00'
end_validation = '2022-09-30 23:59:00'

data_train = vendas_df2.loc[:end_train, :]
data_val   = vendas_df2.loc[end_train:end_validation, :]
data_test  = vendas_df2.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")

def custom_weights(index):
    # Zero weight inside the two closure periods, weight 1 everywhere else
    weights = np.where(
        ((index >= '2020-03-10 00:01:00') & (index <= '2020-05-31 23:59:00'))
        | ((index >= '2021-01-15 00:01:00') & (index <= '2021-04-18 23:59:00')),
        0, 1
    )
    return weights
```
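As a quick sanity check of what this weight function does, here is a self-contained repetition of it applied to a tiny toy index (the sample timestamps are arbitrary):

```python
import numpy as np
import pandas as pd

def custom_weights(index):
    # 0 inside the two closure windows, 1 everywhere else
    weights = np.where(
        ((index >= '2020-03-10 00:01:00') & (index <= '2020-05-31 23:59:00'))
        | ((index >= '2021-01-15 00:01:00') & (index <= '2021-04-18 23:59:00')),
        0, 1
    )
    return weights

idx = pd.to_datetime(['2020-03-09 12:00',   # before the first window  -> 1
                      '2020-04-15 12:00',   # inside the first window  -> 0
                      '2021-02-01 12:00',   # inside the second window -> 0
                      '2022-06-01 12:00'])  # after both windows       -> 1
print(custom_weights(idx))
```

Observations falling in the zero-weight windows are ignored by the regressor during training.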
```python
forecaster = ForecasterAutoreg(
    regressor   = XGBRegressor(random_state=123),
    lags        = 168,
    weight_func = custom_weights
)

param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

lags_grid = [24, 30, 48, 72, 168, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168]]

results_grid = grid_search_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2.loc[:end_validation, 'SALES'],
    exog               = vendas_df2.loc[:end_validation, exog_variables],
    param_grid         = param_grid,
    lags_grid          = lags_grid,
    steps              = 2200,
    refit              = False,
    metric             = 'mean_squared_error',  # or custom_metric
    initial_train_size = int(len(data_train)),
    fixed_train_size   = False,
    return_best        = True,
    verbose            = False
)

metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2['SALES'],
    exog               = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size   = False,
    steps              = 2200,
    refit              = False,
    metric             = 'mean_squared_error',  # or custom_metric
    verbose            = False
)

predictions.loc['2023-05-02':'2023-05-02']
```
"Furthermore, if you are including exogenous variables in fit, you should also provide them in predict." - my final dataset has the same features as the train dataset.
I must sleep! Thank you! JG
Hello, I want to thank you for all the help! I just want to leave here the final script with the best parameters, fit and predict:
```python
# model function
def modelo_XGBRegressor(vendas_df2, tipo):

    # exogenous variables
    exog_variables = [column for column in vendas_df2.columns
                      if column.startswith(('YEAR', 'WEEK', 'WEEKDAY', 'HOUR', 'OPEN', 'FERIADO'))]

    # model
    forecaster = ForecasterAutoreg(
        regressor = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                                 colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
                                 gamma=0, gpu_id=-1, importance_type=None,
                                 interaction_constraints='', learning_rate=0.1, max_delta_step=0,
                                 max_depth=3, min_child_weight=1,
                                 monotone_constraints='()', n_estimators=100, n_jobs=8,
                                 num_parallel_tree=1, predictor='auto', random_state=123,
                                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
                                 tree_method='exact', validate_parameters=1, verbosity=None),
        lags        = [1, 2, 3, 7, 23, 24, 25, 71, 72, 73, 168],
        weight_func = custom_weights
    )

    today_minus_15_a = pd.to_datetime((dt.today() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")).replace(minute=0, second=0)
    today_minus_15_p = pd.to_datetime(today_minus_15_a) + timedelta(hours=1)

    # fit
    forecaster.fit(y    = vendas_df2.loc[:today_minus_15_a, tipo],
                   exog = vendas_df2.loc[:today_minus_15_a, exog_variables])

    # predictions
    predictions = forecaster.predict(steps = 2200,
                                     exog  = vendas_df2.loc[today_minus_15_p:, exog_variables])

    return predictions

# apply model
predict_sales = modelo_XGBRegressor(vendas_df2, tipo='SALES')
```
Thank you very much; next I will try SARIMA. JG
Hello, I'm building a time series model for forecasting sales based on the history since 2019. My model uses skforecast with XGBRegressor.
I want to forecast 75 days. My target is SALES.
I use external features, transformed into 0 and 1, to help the model.
I would like to understand why my final forecast has non-zero values for the target variable SALES on Sundays, even when I use an external feature OPEN=0 and even when I have SALES=0 on every Sunday in the history.
My dataset has this structure: DATA | SALES | YEAR | WEEK | WEEKDAY | OPEN
e.g.: ![image](https://user-images.githubusercontent.com/57633568/230049756-a332a0b6-8999-43d9-a67d-b1c8e7c25581.png)
The variable WEEKDAY = 7 means Sunday. In the dataset, every Sunday has SALES=0 and OPEN=0.
The external feature OPEN=0 means the store is closed; OPEN=1 means the store is open.
This is my final dataset (vendas_df2) before executing the model.
The exog_variables list uses all the external features except the target variable (SALES).
This is the train, validation and test split: ![image](https://user-images.githubusercontent.com/57633568/230051747-3d0513f6-413e-423a-90d1-79197bff3944.png)
These are the parameters for the model:
```python
# Create forecaster
# ======================================
forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=123),
    lags      = 7  # 24
)

# Grid search of hyperparameters and lags
# ========================================
# Regressor hyperparameters
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

# Lags used as predictors
lags_grid = [7, 30, 48, 72, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73]]

results_grid = grid_search_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2.loc[:end_validation, 'SALES'],
    exog               = vendas_df2.loc[:end_validation, exog_variables],
    param_grid         = param_grid,
    lags_grid          = lags_grid,
    steps              = 75,
    refit              = False,
    metric             = 'mean_squared_error',
    initial_train_size = int(len(data_train)),
    fixed_train_size   = False,
    return_best        = True,
    verbose            = False
)

# Backtesting test data
# =========================================
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2['SALES'],
    exog               = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size   = False,
    steps              = 75,
    refit              = False,
    metric             = 'mean_squared_error',
    verbose            = False
)

print(f"Backtest error: {metric}")
```
This is the final result with the forecast for May:
We can see that Sundays have SALES != 0.
My dataset has OPEN=0 for Sundays, so why can't I forecast zero values for Sundays (prev=0)?
Can you help please? Thank you!
Jorge Gomes
Originally posted by @JorgeGomes72 in https://github.com/JoaquinAmatRodrigo/skforecast/discussions/388