bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Calculation of bias with weighting #21

Closed jakubkovac closed 2 years ago

jakubkovac commented 3 years ago

First of all thanks for the really nice library 👍

(I can add code demonstrating the problems I found here, but I wanted to discuss the possible solutions first)

For specific purposes I wanted to use fastshap to calculate approximate SHAP values for an xgboost model, instead of using either fastshap with exact = TRUE or xgboost's own predict with predcontrib = TRUE.

The reason one might want to do this: if you use a specific loss function in your model, the exact SHAP values might not sum to the target directly; you have to apply a link function to transform them to the target scale. That's not helpful if you need the additive property.

There's one caveat here. If you pass a weight variable to xgboost::xgb.DMatrix, the bias term calculated by that method won't be a plain average of the training predictions; it will be a weighted average. I noticed that the adjust argument calculates fnull <- mean(pred_wrapper(object, newdata = X)), which is an unweighted mean, so the two baselines differ.
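To make the mismatch concrete, here is a minimal sketch in plain Python (chosen only for illustration; it does not call fastshap or xgboost). The function names and the toy predictions and weights are hypothetical. It shows how an unweighted mean of training predictions can differ from the weighted mean once observation weights are involved.

```python
def unweighted_baseline(preds):
    """Plain mean of the training predictions (what mean(...) computes)."""
    return sum(preds) / len(preds)


def weighted_baseline(preds, weights):
    """Weighted mean of the training predictions, using observation weights."""
    if len(preds) != len(weights):
        raise ValueError("preds and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(preds, weights)) / total


# Toy data (made up): the last observation carries most of the weight.
preds = [1.0, 2.0, 3.0, 6.0]
weights = [1.0, 1.0, 1.0, 5.0]

plain = unweighted_baseline(preds)            # (1 + 2 + 3 + 6) / 4
weighted = weighted_baseline(preds, weights)  # (1 + 2 + 3 + 30) / 8
print(plain, weighted)
```

With these toy numbers the two baselines are 3.0 and 4.5, so any adjustment that forces contributions to sum to the prediction minus the baseline would target a different total depending on which one is used.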

I solved this by forking your explain and explain.default and adding a new argument, fnull, that defaults to NULL. If it is NULL, fnull is calculated as the mean of the training predictions as before; otherwise you can supply a number yourself (for example, the weighted mean of the training predictions calculated outside the function).
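The idea of an optional baseline argument can be sketched as follows, again in plain Python for illustration only. The names resolve_baseline and adjust_contributions are hypothetical, and spreading the leftover gap evenly across features is a deliberate simplification, not necessarily the adjustment fastshap applies. The point is only that the baseline falls back to the unweighted mean when none is supplied, and that the adjusted contributions then sum to the prediction minus whichever baseline was used.

```python
def resolve_baseline(train_preds, fnull=None):
    """Use the supplied baseline if given, else the unweighted mean."""
    if fnull is None:
        return sum(train_preds) / len(train_preds)
    return float(fnull)


def adjust_contributions(phi, fx, baseline):
    """Shift approximate contributions so they sum to fx - baseline.

    Simplification: the leftover gap is split evenly across features.
    """
    gap = (fx - baseline) - sum(phi)
    return [p + gap / len(phi) for p in phi]


# Toy inputs (made up): approximate contributions for one observation.
train_preds = [1.0, 2.0, 3.0, 6.0]
phi = [0.5, -0.2, 0.9]
fx = 6.0

# Default behaviour: baseline is the unweighted mean of train_preds.
default_base = resolve_baseline(train_preds)
phi_default = adjust_contributions(phi, fx, default_base)

# User-supplied baseline, e.g. a weighted mean computed elsewhere.
custom_base = resolve_baseline(train_preds, fnull=4.5)
phi_custom = adjust_contributions(phi, fx, custom_base)

print(sum(phi_default) + default_base, sum(phi_custom) + custom_base)
```

In both cases the adjusted contributions plus the baseline recover the prediction fx; only the baseline, and therefore the total attributed to the features, changes.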

What would you say would be the best way to recreate shap values from xgboost? Same as the approach that I've mentioned here?

I'm very open to creating a pull request with such a fix.

TylerGrantSmith commented 3 years ago

This is timely. I need this functionality too, and I will probably fork the package myself to add it.

bgreenwell commented 2 years ago

Sorry for the (very super late) reply, but this is now handled in the devel branch and will make its way to CRAN in the next month or so!