bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

explain() seems to fail if X is a tibble (two different errors) #20

Closed (ltd-pa closed 1 year ago)

ltd-pa commented 3 years ago

Apologies in advance if this is related to the package configuration on my machine, but explain() seems to fail for me when 'X' is a tibble. Here is an example that reproduces the problem on my machine:

# Load required packages
library(fastshap)  # for fast (approximate) Shapley values
library(ranger)    # for fast random forest algorithm
library(dplyr)
library(tibble)

# Simulate training data
trn <- gen_friedman(200, seed = 101)
X <- subset(trn, select = -y)  # feature columns only

# Fit a random forest
set.seed(102)
rfo <- ranger(y ~ ., data = trn)

# Prediction wrapper
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}

# Succeeds when X is a dataframe
shap <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 1)

# Fails when X is a tibble (with error "Subscript `O` is a matrix, the data `X[O]` must have size 1.")
shap <- explain(rfo, X = X %>% tibble(), pred_wrapper = pfun, nsim = 1)

# Still succeeds if we add a factor predictor
trn$fac_pred <- sample(c("A", "B", "C"), 200, TRUE) %>% factor()
X <- subset(trn, select = -y)  # feature columns only
set.seed(102)
rfo <- ranger(y ~ ., data = trn)
shap <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 1)

# But fails in a new way with tibbles (with error "Error: Can't combine `x1` <double> and `fac_pred` <factor<5eb8e>>.")
shap <- explain(rfo, X = X %>% tibble(), pred_wrapper = pfun, nsim = 1)

bgreenwell commented 1 year ago

Thanks @ltd-pa, this should be fixed in 0.1.0.
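
For anyone who cannot upgrade yet, one possible workaround is to coerce X to a plain data frame before calling explain(), since the report above shows that data frame input succeeds. The sketch below is only an illustration: as_plain_df() is a hypothetical helper, not part of fastshap, and the stand-in object merely carries the tibble class vector so that the snippet runs without the tibble package.

```r
# Hypothetical helper (not part of fastshap): drop the tibble classes so that
# X is a plain data.frame, the input type reported to work in this issue.
as_plain_df <- function(X) {
  if (is.matrix(X)) {
    return(X)  # leave matrices untouched
  }
  as.data.frame(X)
}

# Stand-in for a tibble that does not require the tibble package:
# a data frame carrying the tibble class vector.
X_tbl <- structure(
  data.frame(x1 = c(0.1, 0.2), fac_pred = factor(c("A", "B"))),
  class = c("tbl_df", "tbl", "data.frame")
)

X_df <- as_plain_df(X_tbl)
class(X_df)  # "data.frame"

# Usage with the objects from the report above (needs fastshap and ranger):
# shap <- explain(rfo, X = as_plain_df(X), pred_wrapper = pfun, nsim = 1)
```

The coercion keeps column types intact, so a factor predictor such as fac_pred stays a factor. Whether this avoids both errors on a given pre-0.1.0 install is an assumption based on the data frame calls succeeding in the report, not something verified here.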