Open pecto2020 opened 1 year ago
Missing values are expected in this column because they occur for every leaf node, so they are unlikely to be the cause.
However, I wasn't able to reproduce this error using the tidymodels framework. Please note that an object of class lgb.Booster must be passed to the lightgbm.unify function (it can be extracted with the extract_fit_engine() function, see here). If this is not the solution, please provide a reproducible example of the error.
I get this error too and have been able to reproduce it with a toy example.
If the step_dummy() line is uncommented, then it works.
lightgbm does, however, support categorical data without the need to dummy-encode these variables. This introduces the decision type ==, where a categorical variable equals a specific value. It can be seen in the object lgb_trees, which has a decision_type column showing the decision type used after fitting the model, e.g. for the variable neighbourhood.
library(bonsai)
library(treeshap)
library(tidymodels)
library(shapviz)
library(jsonlite)
set.seed(123)
split <- initial_split(ames, prop = 0.8)
train <- training(split)
test <- testing(split)
recipe <- recipe(train) |>
  update_role(Sale_Price, new_role = "outcome") |>
  update_role(-has_role("outcome"), new_role = "predictor") |>
  # step_dummy(all_nominal_predictors()) |>
  step_zv(all_predictors())
spec <-
  boost_tree(trees = 100, tree_depth = 6) |>
  set_engine("lightgbm") |>
  set_mode("regression")
fit <- workflow() |>
  add_recipe(recipe) |>
  add_model(spec) |>
  fit(data = train)
lgb_trees <- lightgbm::lgb.model.dt.tree(extract_fit_engine(fit))
data <- recipe |>
  prep() |>
  bake(train |> slice_sample(n = 100), has_role("predictor"))
x <- recipe |>
  prep() |>
  bake(test, has_role("predictor"))
shap <- extract_fit_engine(fit) |>
  unify(data, type = "numeric")
#> Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type
Created on 2024-10-01 with reprex v2.1.1
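For anyone wanting to check their own model, here is a rough sketch of how the decision_type column could be inspected. It runs on a made-up stand-in table (toy_trees, with invented values) rather than a real fit, and only assumes the column names tree_index, split_feature and decision_type that lightgbm::lgb.model.dt.tree() returns. It illustrates the two points from this thread: leaf rows carry NA in decision_type, and categorical splits show up as ==.

```r
# Toy stand-in for the table returned by lightgbm::lgb.model.dt.tree().
# The values are invented for illustration; one tree with 2 splits and 3 leaves.
toy_trees <- data.frame(
  tree_index    = c(0, 0, 0, 0, 0),
  split_feature = c("Gr_Liv_Area", "Neighborhood", NA, NA, NA),
  decision_type = c("<=", "==", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Leaf rows have no split feature, and therefore no decision type either.
is_split <- !is.na(toy_trees$split_feature)
n_missing_on_leaves <- sum(is.na(toy_trees$decision_type[!is_split]))

# Decision types actually used on internal (split) nodes.
split_types <- table(toy_trees$decision_type[is_split], useNA = "ifany")
print(split_types)

# Features that are split with "==", i.e. handled as categorical.
categorical_features <- unique(
  toy_trees$split_feature[is_split & toy_trees$decision_type == "=="]
)
print(categorical_features)
```

With a real model, the same lines could be applied to lgb_trees from the reprex above in place of toy_trees. If == appears among the split types, the model contains categorical splits, which seems to be what triggers the Unknown decision_type error here.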
I was trying to create a unified lightgbm model. I've fit the model using the tidymodels framework. Unfortunately I got this error:
Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type
My understanding is that there is a problem with decision_type. Checking the model, I noticed that there are thousands of missing values in the decision type column. Any idea why the decisions are missing and how to solve the issue?