Open · SamuelFrederick opened 1 year ago
One update: this seems to be an issue only when computing the importance in parallel with the BART model. I do not get identical importances for all variables when using other learners (e.g., ranger, xgboost, GBM) in parallel, or when using BART with the importance computed sequentially. I have modified the code below so that it reproduces the issue:
```r
library(mlr3verse)
mlr3extralearners::install_learners("regr.bart")
# library(iml)
# library(future)

# Simulate a toy dataset with one noise variable
n <- 100
set.seed(123)
x1 <- rnorm(n, 4, 5)
x2 <- sample(c("a", "b", "c"), size = n, replace = TRUE)
x3 <- sample(letters[1:4], size = n, replace = TRUE)
x4_noise <- rnorm(n, 1, 6)
y <- 3 + 2 * x1 + 5 * (x2 == "a") - 10 * (x2 == "b") + 25 * (x2 == "c") +
  4 * (x3 == "a") - 4 * (x3 == "b") + 5 * (x3 == "c") + 10 * (x3 == "d") -
  50 * (x3 == "d") * (x2 == "b") +
  rnorm(n, 0, 3)
df <- data.frame(x1 = x1, x2 = factor(x2),
                 x3 = factor(x3), x4_noise = x4_noise,
                 y = y)

# Build and train the BART pipeline
task <- as_task_regr(df, target = "y")
gr <- po("scale") %>>% po("encode") %>>% lrn("regr.bart")
grl <- GraphLearner$new(gr)
grl$train(task)

# Parallel backend: this is where the problem appears
future::plan("multisession", workers = 2)

model <- iml::Predictor$new(grl, data = df, y = "y")
imp_mod <- iml::FeatureImp$new(model, loss = "rmse",
                               n.repetitions = 50,
                               compare = "ratio")
imp_mod$results
```
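For comparison, this is the sequential variant that gives me distinct importances (a minimal sketch, assuming the same `grl` and `df` objects from the reproduction above are still in scope):

```r
# Workaround: compute the permutation importance sequentially.
# Assumes `grl` and `df` from the reproduction above exist.
future::plan("sequential")
model_seq <- iml::Predictor$new(grl, data = df, y = "y")
imp_seq <- iml::FeatureImp$new(model_seq, loss = "rmse",
                               n.repetitions = 50,
                               compare = "ratio")
imp_seq$results  # features no longer share one identical importance value
```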
I am training several ML models with the mlr3 package and have been using iml to compute permutation importance for the variables in my data. However, I have noticed that, for BART models, the variable importance is exactly the same for every variable: even the noise variable, which is completely unrelated to the outcome, gets the same importance as the others. The code above reproduces this issue on a toy dataset.
Output: