giuseppec / iml

iml: interpretable machine learning R package
https://giuseppec.github.io/iml/

FeatureImp + mlr3 Learner that predicts probabilities does not work #134

Open giuseppec opened 4 years ago

giuseppec commented 4 years ago

If I want to compute the importance with a measure based on probabilities (e.g., the Brier score), the importance is never calculated on the probabilities, even if I manually pass a predict.function:

library("mlr3")
library("iml")
credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

# write a measure that just prints the `predicted` values that will be used to calculate the measure
measure_print_predicted = function(actual, predicted) {
  cat(head(predicted)) # inspect what `predicted` looks like
}

pred = Predictor$new(model, data = data, y = "credit_risk")
imp = FeatureImp$new(pred, loss = measure_print_predicted, n.repetitions = 1)
# 1 2 1 2 2 1

It seems that internally the classes are converted to numeric values (1 and 2), which makes it impossible to compute measures based on probabilities. I then tried to directly use a manually written predict.function, which also did not work:

# use a manually written predict function that returns probabilities
predict_good_prob = function(model, newdata) predict(model, newdata, predict_type = "prob")[, "good"]
head(predict_good_prob(model, data))

# use this predict function for IML method
pred = Predictor$new(model, data = data, y = "credit_risk", predict.function = predict_good_prob)
imp = FeatureImp$new(pred, loss = measure_print_predicted, n.repetitions = 1)
# 1 2 1 2 2 1
giuseppec commented 4 years ago

Ah, I need to define the positive class. I could solve the issue using

pred = Predictor$new(model, data = data, y = "credit_risk", class = "good")

Maybe printing an error message could help here? EDIT: there seem to be further issues, see below https://github.com/christophM/iml/issues/134#issuecomment-666151055
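
Such a message could look roughly like this (purely a sketch; check_class_arg and its placement inside Predictor$new() are made up for illustration):

# hypothetical check (not existing iml code): warn when the prediction has
# several probability columns but no target class was chosen, so the user
# knows the probabilities will be collapsed to class labels
check_class_arg = function(prediction, class) {
  if (is.data.frame(prediction) && ncol(prediction) > 1 && is.null(class)) {
    warning("Prediction has multiple columns but `class` is not set; ",
            "consider Predictor$new(..., class = <positive class>).")
  }
}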

pat-s commented 4 years ago

Thanks for reporting!

The underlying issue is that we do not have information about the task type in Predictor$new() and all subsequent steps since only the model is supplied by the user.

I've tried to partially address this in #137 by checking whether the supplied learner has the attributes of a classification learner. For now, I've only done so for mlr3 learners.

This should probably be tackled in a more robust fashion but it does the job for now.
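
For mlr3 learners, the check might look roughly like this (a sketch of the idea, not the actual code in #137):

# sketch: infer the task type from a supplied mlr3 learner
is_classif_learner = function(model) {
  inherits(model, "Learner") && model$task_type == "classif"
}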

pat-s commented 4 years ago

I was wrong: class should not be used at all with binary (0/1) classification tasks, whether predict_type is "prob" or "response". It is meant for multiclass tasks.

What is correct is that {iml} does not know about the predict_type by default. The user can pass it via the type argument when creating the Predictor.

Also, your loss function does not seem suitable here. With the default "ce", your example runs fine for me:


    library("mlr3")
    library("iml")
    credit.task = tsk("german_credit")
    lrn = lrn("classif.rpart", predict_type = "prob")
    model = lrn$train(credit.task)
    data = credit.task$data()

    pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob")
    FeatureImp$new(pred, loss = "ce", n.repetitions = 1)
    #> Interpretation method:  FeatureImp 
    #> error function: ce
    #> 
    #> Analysed predictor: 
    #> Prediction task: classification 
    #> Classes:  
    #> 
    #> Analysed data:
    #> Sampling from data.frame with 1000 rows and 20 columns.
    #> 
    #> Head of results:
    #>          feature importance.05 importance importance.95 permutation.error
    #> 1         status      1.517241   1.517241      1.517241             0.308
    #> 2       duration      1.295567   1.295567      1.295567             0.263
    #> 3         amount      1.157635   1.157635      1.157635             0.235
    #> 4 credit_history      1.137931   1.137931      1.137931             0.231
    #> 5        purpose      1.098522   1.098522      1.098522             0.223
    #> 6        savings      1.098522   1.098522      1.098522             0.223

Created on 2020-07-29 by the reprex package (https://reprex.tidyverse.org) (v0.3.0.9001)
giuseppec commented 4 years ago

I see, type = "prob" makes sure that probabilities are used, but actual will still be a factor? Of course, probabilities are not necessary to compute the ce. Your code won't work if you use loss = "mse" (which, applied to probabilities, should be the same as the Brier score), and it also fails with loss = "logLoss" because actual is a factor.

Example:

library("mlr3")
library("iml")
credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

# Brier score: mean squared difference between outcomes and probabilities
brier = function(actual, predicted) {
  mean((actual - predicted)^2)
}

pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob", class = "good")
FeatureImp$new(pred, loss = brier, n.repetitions = 1)

## Error in if (self$original.error == 0 & self$compare == "ratio") { : 
##  missing value where TRUE/FALSE needed
## In addition: Warning message:
## In Ops.factor(actual, predicted) : '-' not meaningful for factors

To understand why this happens, let's look at the values of actual and predicted that a measure has access to:

library("mlr3")
library("iml")
credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

measure_print = function(actual, predicted) {
  cat(head(actual), fill = T)
  cat(head(predicted), fill = T)
}

pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob", class = "good")
FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)

## 1 2 1 1 2 1
## 0.8767123 0.1388889 0.868709 0.379562 0.379562 0.868709
## Error in if (self$original.error == 0 & self$compare == "ratio") { : 
##  argument is of length zero

# PS: if I don't use class = "good", the value of "actual" from the measure is still a factor:
pred = Predictor$new(model, data = data, y = "credit_risk", type = "prob")
FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)

## 1 2 1 1 2 1
## 1 2 1 2 2 1
## Error in if (self$original.error == 0 & self$compare == "ratio") { : 
##  argument is of length zero
pat-s commented 4 years ago

Thanks. Yes, something bad is happening internally.

But in any case, class should not be needed here (for non-multiclass tasks) and may only have helped partly, by chance.

giuseppec commented 4 years ago

Maybe if iml allowed the loss argument to also be an mlr3 measure (only when model is an mlr3 model), we could get rid of these bugs, since these things are already implemented in mlr3 anyway. Now that you mention multiclass, I can already foresee further issues: for example, how do I compute the importance with multiclass AUC? At first glance, it does not seem possible to define a loss function that has access to the probabilities of all classes at the same time (but that would be a separate issue).
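
As a sketch of that idea (hypothetical, not existing iml functionality), the loss could delegate to {mlr3measures}, which already handles factors and probabilities. This assumes the loss receives actual as a factor and predicted as positive-class probabilities, which, as the outputs above show, is currently not what happens:

library(mlr3measures)

# wrap an mlr3measures function into iml's (actual, predicted) signature
brier_loss = function(actual, predicted) {
  bbrier(truth = actual, prob = predicted, positive = "good")
}
# FeatureImp$new(pred, loss = brier_loss, n.repetitions = 1)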

christophM commented 4 years ago

Thanks for addressing this.

It is quite difficult to capture the many possible losses and also the different types of outcomes (regression, binary classification, multiclass, probabilities). For the actual values, the raw y is taken as provided. Currently, if users want something other than the factor (e.g., a 0/1 encoding to compare against probabilities), they have to pass y in that form to Predictor$new(). For the Brier score example this would be:

library("mlr3")
library("iml")
credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

# Brier score: mean squared difference between outcomes and probabilities
brier = function(actual, predicted) {
  mean((actual - predicted)^2)
}

y = 1 * (credit.task$data()$credit_risk == "good")

pred = Predictor$new(model, data = data, y = y, type = "prob", class = "good")
FeatureImp$new(pred, loss = brier, n.repetitions = 1)
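
As a quick sanity check of this workaround (a sketch, reusing predict_good_prob from the first comment), both sides of the subtraction inside brier are now numeric:

# `y` is numeric 0/1 instead of a factor, so arithmetic with the
# predicted probabilities is well-defined
head(y)
head(y - predict_good_prob(model, data))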
christophM commented 4 years ago

I am not sure how to improve the situation while still allowing very general loss functions. Especially for user-provided loss functions, we have no way of knowing what the actual target has to look like (factor, 0/1 coding, multiclass 0/1, ...).

Maybe some more examples in the help file would have helped?
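
Such a help-file example could show the possible target encodings side by side (a sketch; the variable names are made up):

# three ways the target can be encoded, depending on the loss function
actual = factor(c("good", "bad", "good", "bad"))
as.integer(actual == "good")  # 0/1 coding for binary losses
model.matrix(~ actual - 1)    # multiclass 0/1 (one-hot) matrix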

giuseppec commented 4 years ago

My problem was that I passed my own predict.function, which was ignored completely:

library("mlr3")
library("iml")
credit.task = tsk("german_credit")
lrn = lrn("classif.rpart", predict_type = "prob")
model = lrn$train(credit.task)
data = credit.task$data()

# print actual and predicted
measure_print = function(actual, predicted) {
  cat(head(actual), fill = T)
  cat(head(predicted), fill = T)
}

# use a manually written predict function that returns probabilities
predict_good_prob = function(model, newdata) predict(model, newdata, predict_type = "prob")[, "good"]
head(predict_good_prob(model, data))
# [1] 0.8767123 0.1388889 0.8687090 0.3795620 0.3795620 0.8687090

pred = Predictor$new(model, data = data, y = "credit_risk", predict.function = predict_good_prob)
imp = FeatureImp$new(pred, loss = measure_print, n.repetitions = 1)
# 1 2 1 1 2 1
# 1 2 1 2 2 1
# Error in if (self$original.error == 0 & self$compare == "ratio") { : 
#  argument is of length zero

Usually, the user knows what actual looks like (it's just the target column of the data), right? The user should also know what the output of the predict function looks like (yes, different models output different things here, sometimes matrices, sometimes vectors, etc.). In the code above, I was surprised that my self-written predict function was ignored: the measure function seems to have access only to the factor (see output above). That is, I had no way to access the probabilities of the class "good" or of "bad" in the measure function.

In the multiclass case, I could have written a predict function that returns a matrix with the probabilities of each class, e.g.:

predict_good_prob = function(model, newdata) predict(model, newdata, predict_type = "prob")
head(predict_good_prob(model, data))
#           good       bad
# [1,] 0.8767123 0.1232877
# [2,] 0.1388889 0.8611111
# [3,] 0.8687090 0.1312910
pred = Predictor$new(model, data = data, y = "credit_risk", predict.function = predict_good_prob)

Then I'd expect to be able to reuse this matrix of probabilities in the measure:

# a measure that uses the probabilities of all classes
my_cool_measure = function(actual, predicted) {
  class1 = predicted[, "good"]
  class2 = predicted[, "bad"]
  # do some cool computations with the probabilities of each class
}
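
To make this concrete, here is a minimal sketch of such a measure: a multiclass log loss computed from the full probability matrix. It assumes actual is a factor whose levels match the column names of predicted (which, per the discussion above, is not what FeatureImp currently passes):

# multiclass log loss from a full probability matrix (sketch)
multiclass_logloss = function(actual, predicted) {
  eps = 1e-15
  # for each observation, pick the predicted probability of the true class
  idx = cbind(seq_along(actual), match(as.character(actual), colnames(predicted)))
  -mean(log(pmax(predicted[idx], eps)))
}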