bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

fastshap handling nominal variables #36

Closed Npaffen closed 2 years ago

Npaffen commented 2 years ago

How does fastshap handle nominal variables?

```r
library(tidymodels)
library(tidyverse)
library(mlbench)
library(xgboost)
library(lightgbm)
library(treesnip)

data(Glass)
head(Glass)
Glass$Type

rec <- recipe(RI ~ ., data = Glass) %>%
  step_scale(all_numeric())
prep_rec <- prep(rec, retain = TRUE)

split <- initial_split(Glass)
train_data <- training(split)
test_data <- testing(split)

model <- parsnip::boost_tree(mode = "regression") %>%
  set_engine("lightgbm", verbose = 0)

wf_glass <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model)
fit <- wf_glass %>%
  parsnip::fit(data = train_data)

library(fastshap)
explain(
  object = fit %>% extract_fit_parsnip(),
  newdata = test_data %>% select(-RI) %>% as.matrix(),
  X = train_data %>% select(-RI) %>% as.matrix(),
  pred_wrapper = predict
)
```

This leads to the following error: `Error in genFrankensteinMatrices(X, W, O, feature = column) : Not compatible with requested type: [type=character; target=double].`

My guess: this might be because calling `as.matrix()` on a data.frame with mixed column types produces a character matrix. That matrix can't be converted to a matrix of type double without losing the values of the factor columns (here, `Type`). On the other hand, as the error suggests, we can't use a numeric target for the regression task if all other variables are of class character.
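The coercion described in this guess can be reproduced with base R alone. The snippet below is a minimal sketch on a hypothetical three-row data frame (the values are made up for illustration and are not the actual `Glass` data); it shows that `as.matrix()` falls back to a character matrix as soon as one column is a factor, while `data.matrix()` keeps the matrix numeric by replacing factor levels with their integer codes.

```r
# Hypothetical toy data: two numeric columns and one factor column,
# loosely mimicking the structure of Glass (values are illustrative only).
df <- data.frame(
  Na   = c(13.6, 13.9, 13.5),
  Mg   = c(4.49, 3.60, 3.55),
  Type = factor(c("1", "2", "7"))
)

# as.matrix() needs a single storage type for all columns; with a factor
# present, every column is converted to character.
m <- as.matrix(df)
storage_type <- typeof(m)  # "character"
print(storage_type)

# data.matrix() converts factors to their internal integer codes instead,
# so the result stays numeric. Note that the codes (1, 2, 3) are level
# positions, not the original labels ("1", "2", "7").
dm <- data.matrix(df)
print(typeof(dm))
print(dm[, "Type"])
```

Integer codes impose an arbitrary ordering on a nominal variable, so whether this encoding is appropriate depends on how the model was trained.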

Am I missing something, or is this a type-conversion problem? Is there an option to specify nominal/factor variables?

bgreenwell commented 2 years ago

Hi @Npaffen, yes, I believe you are correct. If you want to use a matrix, everything needs to be encoded numerically (e.g., like in an XGBoost model). If you have factors, you'll need to use a data frame.
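As a rough illustration of the data frame route, here is a minimal sketch that passes a data frame containing a factor column to `explain()`. It does not reproduce the tidymodels/lightgbm workflow from the question: the data are simulated, the model is a plain `lm()` chosen only to keep the example self-contained, and the names `toy` and `pfun` are hypothetical. It assumes the `explain()` arguments `X`, `pred_wrapper`, and `nsim`.

```r
library(fastshap)

set.seed(101)  # Shapley values are estimated by Monte Carlo, so fix the seed

# Hypothetical simulated data with one nominal (factor) feature.
n <- 60
toy <- data.frame(
  x1   = rnorm(n),
  x2   = runif(n),
  type = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
toy$y <- 2 * toy$x1 - toy$x2 + as.integer(toy$type) + rnorm(n, sd = 0.1)

# Stand-in model (an assumption for this sketch, not the model in the issue).
fit <- lm(y ~ x1 + x2 + type, data = toy)

# Keep the features as a data frame so that `type` remains a factor;
# as.matrix() here would turn everything into character strings.
X <- toy[, c("x1", "x2", "type")]

# Prediction wrapper: takes the model and new data, returns a numeric vector.
pfun <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

shap <- explain(fit, X = X, pred_wrapper = pfun, nsim = 10)
dim(shap)  # one row per observation, one column per feature
```

For the workflow in the question, the analogous change would be to drop the `as.matrix()` calls, provided the prediction wrapper accepts a data frame and returns a plain numeric vector.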