CDonnerer / xgboost-distribution

Probabilistic prediction with XGBoost.
MIT License

Is there a way to get feature importances like in NGBoost? #83

Closed. manningkyle304 closed this issue 1 year ago.

manningkyle304 commented 1 year ago

A nice thing in NGBoost is the ability to get feature importances separately for each distribution parameter; see the example below from the NGBoost docs, https://stanfordmlgroup.github.io/ngboost/3-interpretation.html

ngb = NGBRegressor(verbose=False).fit(X_reg_train, Y_reg_train)

## Feature importance for loc trees
feature_importance_loc = ngb.feature_importances_[0]

## Feature importance for scale trees
feature_importance_scale = ngb.feature_importances_[1]

However, this doesn't seem to be available in xgboost-distribution, or at least not via that attribute.

Is there a way to get feature importances, or any plan to add it?

ChristianMichelsen commented 1 year ago

You can compute them yourself with something similar to this:

import shap
from xgboost_distribution import XGBDistribution

def get_shap_values(model, X):
    # shap's TreeExplainer works on the underlying booster,
    # not on the sklearn-style XGBDistribution wrapper
    if isinstance(model, XGBDistribution):
        model = model.get_booster()
    explainer = shap.TreeExplainer(model)
    shap_values = explainer(X)
    return shap_values
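
For example, usage could look something like this (X_train, y_train and X_test are placeholders for your own data):

model = XGBDistribution().fit(X_train, y_train)
shap_values = get_shap_values(model, X_test)

# shape is (n_samples, n_features), or (n_samples, n_features, n_outputs)
# if the booster exposes one output group per distribution parameter
print(shap_values.values.shape)
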
CDonnerer commented 1 year ago

The native xgboost way is to access the feature_importances_ attribute after fitting; see the xgboost docs. Note that this will average over the parameters of a given distribution.
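
As a minimal sketch (assuming a normal distribution; X_train and y_train are placeholders for your own data):

from xgboost_distribution import XGBDistribution

model = XGBDistribution(distribution="normal")
model.fit(X_train, y_train)

# one importance value per feature, averaged over the distribution
# parameters (here loc and scale)
importances = model.feature_importances_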

manningkyle304 commented 1 year ago

@CDonnerer Is there a way to get the feature importances for the separate parameters specifically, though?

CDonnerer commented 1 year ago

It does not appear to be natively supported, but you could potentially try using SHAP values (see also this issue).
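
A rough sketch of that idea (not verified against the xgboost-distribution internals; X_train, y_train and X_test are placeholders, and the per-parameter split only applies if the underlying booster exposes one output group per distribution parameter):

import numpy as np
import shap
from xgboost_distribution import XGBDistribution

model = XGBDistribution(distribution="normal").fit(X_train, y_train)
explainer = shap.TreeExplainer(model.get_booster())
shap_values = explainer.shap_values(X_test)

# Depending on the shap version and the booster's outputs, shap_values is
# either a 2-d array, a 3-d array, or a list with one array per output.
# The mean absolute SHAP value over samples serves as a feature importance.
if isinstance(shap_values, list):
    per_param_importance = [np.abs(sv).mean(axis=0) for sv in shap_values]
else:
    per_param_importance = np.abs(np.atleast_3d(shap_values)).mean(axis=0)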