JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

SHAP values when using global models with series with different lengths #737

Closed unaguilly closed 2 days ago

unaguilly commented 3 days ago

Hello, I was following this code example but couldn't make SHAP work.

I fit the model like in the example:

regressor = LGBMRegressor(random_state=42, verbose=-1, **best_trial.params)

forecaster = ForecasterAutoregMultiSeries(
                 regressor          = regressor,
                 lags               = 7,
                 dropna_from_series = False,
                 transformer_exog   = transformer_exog,
                 fit_kwargs         = {'categorical_feature': categorical_features}
             )
forecaster.fit(series=series_dict_train, exog=exog_dict_train, suppress_warnings=True)

Then I started with SHAP:

explainer = shap.TreeExplainer(forecaster.regressor)
shap_values = explainer.shap_values(series_dict_train)

As expected, this doesn't work because series_dict_train is a dictionary (and is missing the exog variables), so I converted the dictionary to a DataFrame and concatenated the exog variables. That fails with a different error: ValueError: train and valid dataset categorical_feature do not match. I tried removing all categorical features (and the transformer_exog from the forecaster), but the error persists. Am I missing something? Thanks!

JavierEscobarOrtiz commented 3 days ago

Hello @unaguilly

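You need to extract the training matrices, because those are what the explainer must receive. The internal regressor was fitted on them (lags plus transformed exog), not on the raw series, so a hand-built DataFrame will not have the same columns: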

# Training matrices used by the forecaster to fit the internal regressor
# ==============================================================================
X_train, y_train = forecaster.create_train_X_y(series=series_dict_train, exog=exog_dict_train)

# Create SHAP explainer
# ==============================================================================
import shap

explainer = shap.TreeExplainer(forecaster.regressor)
shap_values = explainer.shap_values(X_train)

Please let me know if this works for you.
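
For readers wondering why the raw dictionary (or a hand-built DataFrame) does not work, here is a rough, library-free sketch of what a training matrix of lagged values looks like when the series have different lengths. It is only an illustration: `build_lag_matrix` and the `series_id` column are hypothetical names, exog variables and transformers are left out, and this is not how skforecast builds its matrices internally. In practice the real matrices come from `forecaster.create_train_X_y`.

```python
def build_lag_matrix(series_dict, lags):
    """Stack the lagged windows of every series into one training matrix.

    Hypothetical helper for illustration only. Each row of X holds the
    `lags` previous values (lag_1 is the most recent one) followed by an
    integer code that identifies the series the row came from.
    """
    columns = [f"lag_{i}" for i in range(1, lags + 1)] + ["series_id"]
    X, y = [], []
    # Sort the names so that the integer codes are deterministic
    for code, name in enumerate(sorted(series_dict)):
        values = series_dict[name]
        # A series of length n contributes n - lags rows
        for t in range(lags, len(values)):
            window = values[t - lags:t][::-1]  # reverse so lag_1 comes first
            X.append(window + [code])
            y.append(values[t])
    return X, y, columns


# Two toy series with different lengths
series = {
    "item_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "item_b": [10.0, 20.0, 30.0, 40.0],
}

X, y, columns = build_lag_matrix(series, lags=3)

print(columns)
for row, target in zip(X, y):
    print(row, "->", target)
```

The point of the sketch is that the regressor never sees the series themselves, only rows of lag columns plus whatever identifies the series and the exog features. An explainer built on that regressor therefore needs a matrix with exactly those columns, which is what `create_train_X_y` returns.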

unaguilly commented 2 days ago

Thanks a lot, this worked. And thanks for the swift response.