bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Fastshap does not work with randomForest package #27

Closed jttoivon closed 2 years ago

jttoivon commented 2 years ago

I tried to modify the simple example from the fastshap webpage so that it uses randomForest instead of ranger. However, the returned SHAP values are all zero. What could be wrong? The slightly modified example program is below.

Jarkko

# Load required packages
library(fastshap)  # for fast (approximate) Shapley values
library(randomForest)

# Simulate training data
trn <- gen_friedman(3000, seed = 101)
X <- subset(trn, select = -y)  # feature columns only

# Fit a random forest
set.seed(102)

rfo <- randomForest::randomForest(y ~ ., data =  trn)

# Prediction wrapper
pfun <- function(object, newdata) {
  unname(predict(object, data = newdata))
}

# Compute fast (approximate) Shapley values using 100 Monte Carlo repetitions
system.time({  # estimate run time
  set.seed(5038)
  shap <- fastshap::explain(rfo, X = X, pred_wrapper = pfun, nsim = 100)
})

# Results are returned as a tibble (with the additional "shap" class)
shap

bgreenwell commented 2 years ago

Hi @jttoivon, your prediction wrapper needs to be updated. For randomForest(), you need to supply newdata, as opposed to just data (the latter is used by ranger's predict() method). Make that change and it works just fine, but explaining 3000 instances using 100 Monte Carlo reps will still be expensive to compute. Also, since you loaded the packages at the top, you don't need to use :: to access any of those packages' functions.
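A minimal sketch of the corrected wrapper, following the advice above. The smaller sample size, `ntree = 100`, `nsim = 10`, and the choice to explain only five rows via `newdata` are assumptions made here to keep the run cheap; they are not from the original example, and the `newdata` argument of `explain()` assumes a fastshap version that supports it. The length check illustrates the likely cause of the zeros: when `predict()` for a randomForest object receives `data =` instead of `newdata =`, the argument is silently ignored and the stored out-of-bag predictions for the training data come back, so the output never changes with the input.

```r
# Load required packages
library(fastshap)      # for fast (approximate) Shapley values
library(randomForest)

# Simulate a small training set (size chosen here only to keep the run short)
trn <- gen_friedman(500, seed = 101)
X <- subset(trn, select = -y)  # feature columns only

# Fit a random forest
set.seed(102)
rfo <- randomForest(y ~ ., data = trn, ntree = 100)

# Diagnosis: with `data =`, the new rows are ignored and one prediction per
# training row is returned, regardless of what was passed in
length(predict(rfo, data = X[1:5, ]))     # equals nrow(trn)
length(predict(rfo, newdata = X[1:5, ]))  # equals 5

# Corrected prediction wrapper: randomForest's predict() method uses `newdata`
pfun <- function(object, newdata) {
  unname(predict(object, newdata = newdata))
}

# Explain only a few rows to limit the cost of the Monte Carlo repetitions
set.seed(5038)
shap <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 10,
                newdata = X[1:5, ])
shap
```

With the corrected wrapper the returned values should no longer be all zero. Raising `nsim` or the number of explained rows increases the run time accordingly.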

jttoivon commented 2 years ago

It works! Thanks for the quick reply and great package!

PS. I use a verbose style as documentation: I always load every package with 'library', even though I sometimes specify the namespace explicitly, for example when the same function exists in several packages, or as a reminder to myself of which package defines a rarely used function.

Jarkko