ModelOriented / treeshap

Compute SHAP values for your tree-based models using the TreeSHAP algorithm
https://modeloriented.github.io/treeshap/
GNU General Public License v3.0

missing decision types #28

Open pecto2020 opened 1 year ago

pecto2020 commented 1 year ago

I was trying to create a unified lightgbm model. I fit the model using the tidymodels framework. Unfortunately, I got this error: Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type. My understanding is that there is a problem with decision_type. Checking the model, I noticed that there are thousands of missing values in the decision_type column. Any idea why the decisions are missing and how to solve the issue?

krzyzinskim commented 1 year ago

Missing values are expected in this column as they occur for every leaf node, so it is unlikely that this is the cause.

However, I wasn't able to reproduce this error using the tidymodels framework. Please note that an object of class lgb.Booster must be passed to the lightgbm.unify function (it can be extracted from the fitted workflow with the extract_fit_engine() function). If this does not solve the problem, please provide a reproducible example of the error.
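As a rough illustration of the class requirement above, the following sketch guards against passing the wrong object. The helper assert_lgb_booster() is hypothetical (it is not part of treeshap or lightgbm), and the two stand-in objects are fakes so that the snippet runs without lightgbm installed.

```r
# Hypothetical helper (not part of treeshap): fail early unless `model`
# is a raw lightgbm booster rather than a workflow or parsnip wrapper.
assert_lgb_booster <- function(model) {
  if (!inherits(model, "lgb.Booster")) {
    stop(
      "Expected an object of class 'lgb.Booster', got: ",
      paste(class(model), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(model)
}

# Stand-in objects (assumptions for illustration only).
fake_booster  <- structure(list(), class = "lgb.Booster")
fake_workflow <- structure(list(), class = "workflow")

# Passes silently and returns its input.
assert_lgb_booster(fake_booster)

# Fails with an informative message; capture it instead of stopping.
wrapped <- tryCatch(
  assert_lgb_booster(fake_workflow),
  error = function(e) conditionMessage(e)
)
print(wrapped)
```

With a real fitted workflow, the call would presumably look like extract_fit_engine(fit) |> assert_lgb_booster() |> lightgbm.unify(data).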

cgoo4 commented 1 month ago

I get this error too and have been able to reproduce it with a toy example.

If the step_dummy() line is uncommented, then it works.

That said, lightgbm supports categorical data natively, without the need to dummy-encode these variables. This introduces the decision type ==, where a categorical variable equals a specific value. This can be seen in the object lgb_trees, which has a column showing the decision_type used after fitting the model, e.g. for the variable Neighborhood.
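To make the distinction concrete, here is a minimal sketch that separates leaf rows (where a missing decision_type is expected) from split rows, and then lists the features split with ==. The toy table only mimics a few columns of the lgb.model.dt.tree() output; its values are made up for illustration and are not taken from the fitted model.

```r
# Toy stand-in for a few columns of lightgbm::lgb.model.dt.tree() output.
# One tree with two splits and three leaves; all values are illustrative.
toy_trees <- data.frame(
  tree_index    = c(0, 0, 0, 0, 0),
  split_feature = c("Gr_Liv_Area", "Neighborhood", NA, NA, NA),
  decision_type = c("<=", "==", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Leaf rows have no split, so decision_type is NA there by design.
is_leaf <- is.na(toy_trees$split_feature)

# Count decision types among internal (split) nodes only.
split_types <- table(toy_trees$decision_type[!is_leaf])
print(split_types)

# Features split with "==" are the ones treated as categorical.
is_categorical_split <- !is_leaf & toy_trees$decision_type == "=="
categorical_splits <- unique(toy_trees$split_feature[is_categorical_split])
print(categorical_splits)
```

Running the same filter on the real lgb_trees object should show whether any == splits are present, which is the case the error message does not cover.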

library(bonsai)
library(treeshap)
library(tidymodels)
library(shapviz)
library(jsonlite)

set.seed(123)
split <- initial_split(ames, prop = 0.8)
train <- training(split)
test <- testing(split)

recipe <- recipe(train) |> 
  update_role(Sale_Price, new_role = "outcome") |> 
  update_role(-has_role("outcome"), new_role = "predictor") |> 
  # step_dummy(all_nominal_predictors()) |> 
  step_zv(all_predictors()) 

spec <- 
  boost_tree(trees = 100, tree_depth = 6) |> 
  set_engine("lightgbm") |> 
  set_mode("regression")

fit <- workflow() |> 
  add_recipe(recipe) |> 
  add_model(spec) |> 
  fit(data = train)

lgb_trees <- lightgbm::lgb.model.dt.tree(extract_fit_engine(fit))

data <- recipe |>
  prep() |> 
  bake(train |> slice_sample(n = 100), has_role("predictor"))

x <- recipe |>
  prep() |>
  bake(test, has_role("predictor"))

shap <- extract_fit_engine(fit) |> 
  unify(data, type = "numeric") 
#> Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type

Created on 2024-10-01 with reprex v2.1.1