dandls / counterfactuals

counterfactuals: An R package for Counterfactual Explanation Methods
https://dandls.github.io/counterfactuals/
GNU Lesser General Public License v3.0

Classification tasks are sometimes not recognized #29

Open andreash0 opened 1 year ago

andreash0 commented 1 year ago

I haven't had time to investigate this in detail, but it seems that there are cases where our classification methods do not correctly recognize classification tasks:

library(counterfactuals)
library(iml)
library(dplyr)
library(tidymodels)

data(german, package = "rchallenge")

credit = german[, c("duration", "amount", "purpose", "age",
                    "employment_duration", "housing", "number_credits", "credit_risk")]

x_interest = credit[998L,]
rf = rand_forest(mode = "classification", engine = "randomForest") %>%
  fit(credit_risk ~ ., data = credit[-998L,])

pred = Predictor$new(model = rf, data = credit[-998L,], y = "credit_risk")
moc_classif = MOCClassif$new(pred)
#> Error in super$initialize(predictor, lower, upper, distance_function): MOCClassif only works for classification tasks.

However, if we explicitly set type = "prob" in Predictor$new(), it works:

library(counterfactuals)
library(iml)
library(dplyr)
library(tidymodels)
data(german, package = "rchallenge")

credit = german[, c("duration", "amount", "purpose", "age",
                    "employment_duration", "housing", "number_credits", "credit_risk")]

x_interest = credit[998L,]
rf = rand_forest(mode = "classification", engine = "randomForest") %>%
  fit(credit_risk ~ ., data = credit[-998L,])

pred = Predictor$new(model = rf, data = credit[-998L,], y = "credit_risk", type = "prob")
moc_classif = MOCClassif$new(pred)

dandls commented 1 year ago

This seems to be an issue in the iml package. If pred$task == "unknown", we call these two lines. Within iml, the function iml:::inferTaskFromPrediction() is then called, which wrongly identifies the task as a regression task, so pred$task = "regression" is set. This triggers the "MOCClassif only works for classification tasks." error.

Here is an example without using the counterfactuals package:

library(iml)
library(tidymodels)

data(german, package = "rchallenge")
credit = german[, c("duration", "amount", "purpose", "age",
  "employment_duration", "housing", "number_credits", "credit_risk")]

# tidymodels
rf = rand_forest(mode = "classification", engine = "randomForest") %>%
  fit(credit_risk ~ ., data = credit)

pred = Predictor$new(model = rf, data = credit, y = "credit_risk")
pred$task
#> [1] "unknown"
pred$task = NULL
pred$predict(credit[c(1, 2),])
pred$task
#> [1] "regression"

iml:::inferTaskFromPrediction(prediction = pred$predict(credit[c(1, 2),]))
#> [1] "regression"
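
One plausible reading of the output above is that the task is inferred from the shape of the prediction alone: a single column of class labels does not look like classification, while one probability column per class does. The base R sketch below illustrates that idea. It is an assumption about the mechanism, not iml's actual source: infer_task_sketch() is a hypothetical helper, and the column names (.pred_class, .pred_bad, .pred_good) are only illustrative of what a class prediction and a type = "prob" prediction might look like.

```r
# Hypothetical re-implementation of a shape-based task heuristic.
# This is NOT iml's code; it only mimics the behaviour observed above.
infer_task_sketch <- function(prediction) {
  is_table <- is.data.frame(prediction) || is.matrix(prediction)
  if (is_table && ncol(prediction) > 1L) {
    # Several columns look like one probability column per class.
    "classification"
  } else if (is.factor(prediction) || is.character(prediction)) {
    # A bare vector of labels also looks like classification.
    "classification"
  } else {
    # Everything else, including a one-column table of labels, falls through.
    "regression"
  }
}

# One column of hard class labels (illustrative shape of a class prediction).
class_pred <- data.frame(.pred_class = factor(c("good", "bad")))

# One probability column per class (illustrative shape of a "prob" prediction).
prob_pred <- data.frame(.pred_bad = c(0.2, 0.7), .pred_good = c(0.8, 0.3))

task_class <- infer_task_sketch(class_pred)
task_prob <- infer_task_sketch(prob_pred)
print(c(class = task_class, prob = task_prob))
```

If the real heuristic behaves like this sketch, it would explain why setting type = "prob" avoids the error: the prediction then has more than one column, so the task is recognized as classification.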