giuseppec / iml

iml: interpretable machine learning R package
https://giuseppec.github.io/iml/

FeatureImp: Problem with binary output #194

Open snvv opened 2 years ago

The output of FeatureImp is very sensitive to the number of input features. For example, with 40 or 50 explanatory variables the output of FeatureImp$new(predictor, loss = "ce") is zero. However, with 10 or 20 variables it produces sensible output. Here is a reproducible example:

library(randomForest)
library(mlbench)
library(caret)
library(e1071)
library(iml)  # provides Predictor and FeatureImp
# Load Dataset
data(Sonar)
dataset <- Sonar

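# 10-fold cross-validation, repeated 3 times
control <- trainControl(method='repeatedcv', 
                        number=10, 
                        repeats=3)
# Compare models by accuracy
metric <- "Accuracy"
set.seed(123)

# Keep the first 40 predictors plus the Class column (column 61)
dataset1=dataset[, c(1:40, 61)]

# mtry: number of variables randomly sampled at each split,
# set to sqrt(p) for the p predictors in dataset1 (Class excluded)
mtry <- floor(sqrt(ncol(dataset1) - 1))

tunegrid <- expand.grid(.mtry=mtry)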
rf_default <- caret::train(Class~., 
                    data=dataset1, 
                    method='rf', 
                    metric='Accuracy', 
                    tuneGrid=tunegrid, 
                    trControl=control)
print(rf_default)

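# Use every retained predictor (all columns except Class), so only the dataset1 line needs changing
predictor <- Predictor$new(rf_default, data = dataset1[, names(dataset1) != "Class"], y = dataset1$Class)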
imp <- FeatureImp$new(predictor, loss = "ce")
imp
plot(imp)

Then change dataset1=dataset[, c(1:40, 61)] to dataset1=dataset[, c(1:10, 61)] and rerun.

Regards
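One hypothesis worth checking (an assumption, not confirmed anywhere in this thread): the random forest is scored on its own training data, where its classification error may already be zero, and shuffling a single feature out of 40 may leave the predicted labels unchanged. The sketch below computes permutation importance with classification error by hand in base R and reports both the ratio and the difference to the baseline error, so the baseline can be inspected directly. `perm_importance_ce` is a hypothetical helper, not part of iml, and the iris threshold rule is only a stand-in for the random forest.

```r
# Hypothetical helper, not part of iml: permutation feature importance with
# classification error (ce), reported as ratio and as difference to baseline.
perm_importance_ce <- function(predict_fun, X, y, n_rep = 5, seed = 1) {
  set.seed(seed)
  # Classification error: share of mismatched labels
  ce <- function(actual, predicted) {
    mean(as.character(actual) != as.character(predicted))
  }
  base_error <- ce(y, predict_fun(X))
  rows <- lapply(names(X), function(feat) {
    # Shuffle one column at a time and re-score the unchanged model
    errs <- replicate(n_rep, {
      X_perm <- X
      X_perm[[feat]] <- sample(X_perm[[feat]])
      ce(y, predict_fun(X_perm))
    })
    data.frame(feature = feat,
               error = mean(errs),
               ratio = mean(errs) / base_error,  # NaN or Inf if base_error is 0
               difference = mean(errs) - base_error)
  })
  importance <- do.call(rbind, rows)
  importance <- importance[order(-importance$difference), ]
  rownames(importance) <- NULL
  list(base_error = base_error, importance = importance)
}

# Toy usage (base R only): a fixed threshold rule that separates setosa on iris
X <- iris[, 1:4]
y <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))
rule <- function(newdata) ifelse(newdata$Petal.Length < 2.5, "setosa", "other")
imp <- perm_importance_ce(rule, X, y)
print(imp$base_error)
print(imp$importance)
```

In the toy run the baseline error is 0, so the ratio column is NaN or Inf while the difference column still ranks Petal.Length first. To try the same check on the model above, one could pass something like function(d) predict(rf_default, newdata = d) (caret's predict() on a train object returns class labels by default) together with the predictor columns of dataset1 and dataset1$Class.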