dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[Python] feature contributions from tweedie regression model. #9821

Open Lejboelle opened 10 months ago

Lejboelle commented 10 months ago

I've trained a model using Tweedie regression and would like to make predictions that output the individual feature contributions. However, it seems like the contributions are not transformed:

```python
import xgboost as xgb
import numpy as np

mdl = xgb.XGBRegressor(objective='reg:tweedie', tweedie_variance_power=1.0, random_state=0)
train_x = np.random.randint(0, 20, (5, 3))
train_y = np.random.random((5, 1))
mdl.fit(train_x, train_y)

test_data = np.random.randint(0, 20, (1, 3))
mdl.predict(test_data)  # outputs: 0.2877181

dmatrix_test = xgb.DMatrix(test_data)
mdl.get_booster().predict(dmatrix_test, pred_contribs=True)  # outputs: [ 0., 0., -0.3573199, -0.8884542]
```

If I run `np.exp(np.sum(mdl.get_booster().predict(dmatrix_test, pred_contribs=True)))` I get the same result as `mdl.predict()`, but I would like the individually transformed contributions.

I'm not sure whether this is a bug or simply not supported?

Thanks.

Environment: Python 3.9, XGBoost 1.7.6

trivialfis commented 10 months ago

> but I would like the individually transformed contributions.

Could you please elaborate on this? What do you mean by individually transformed?

Lejboelle commented 10 months ago

> > but I would like the individually transformed contributions.
>
> Could you please elaborate on this? What do you mean by individually transformed?

Similarly to other objective functions such as pseudohubererror, I would like the sum of the contributions to equal 0.2877181, so I can see how each feature contributed to the predicted value.

Edit: Some years ago there was work on implementing this in SHAP (https://github.com/shap/shap/pull/1041), but it appears it was never merged.

mayer79 commented 10 months ago

TreeSHAP in XGBoost is calculated on the "raw" (margin) scale, which is the link scale. Like Poisson and Gamma, the Tweedie objective uses the log link, so the SHAP values sum up to the log of the prediction minus the baseline.