giuseppec / iml

iml: interpretable machine learning R package
https://giuseppec.github.io/iml/

Feature - Display distance from actual predicted observation on ICE plots #186

Open brshallo opened 2 years ago

brshallo commented 2 years ago

It would be nice to have option(s) that highlight the specific location of a point for an individual curve when plotting ICE. The downside of geom_rug() in this case is that you can't trace an observation to an individual curve.

Add points to ICE plots

Could use geom_point() instead to show where the actual observation sits on each curve. For example, I think it would be nice if setting show.data = TRUE (when method = "ice") did this. E.g.

library("mlr")
library("iml")  # provides Predictor and FeatureEffect
library("ggplot2")
# data(cervical)
cervical <- readr::read_csv("https://raw.githubusercontent.com/christophM/interpretable-ml-book/master/data/cervical.csv")
set.seed(43)
cervical_subset_index = sample(1:nrow(cervical), size = 300)
cervical_subset = cervical[cervical_subset_index, ]
cervical.task = makeClassifTask(data = cervical, target = "Biopsy")
mod = mlr::train(mlr::makeLearner(cl = 'classif.randomForest', id = 'cervical-rf', predict.type = 'prob'), cervical.task)
pred.cervical = Predictor$new(mod, cervical_subset, class = "Cancer")
FeatureEffect$new(pred.cervical, "Age", method = "ice")$plot(show.data = TRUE) 

(Partial inspiration comes from 14:37 of Model Agnostic Interpretability by Ricky Tharrington.)
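Until something like this exists in iml, the idea can be sketched by hand with ggplot2. The block below is only an illustration under stated assumptions: it uses a toy lm() model on mtcars instead of the cervical random forest, computes the ICE curves manually rather than through FeatureEffect, and the names fit, dat, ice and actual are made up for this example.

```r
library(ggplot2)

# Toy model and data (assumptions for illustration; not the cervical example)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
dat <- mtcars[, c("wt", "hp", "qsec")]
feature <- "wt"
grid <- seq(min(dat[[feature]]), max(dat[[feature]]), length.out = 20)

# ICE curves: for each observation, vary the feature over the grid
# while holding that observation's other features fixed
ice <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  newdata <- dat[rep(i, length(grid)), , drop = FALSE]
  newdata[[feature]] <- grid
  data.frame(id = i, x = grid, yhat = unname(predict(fit, newdata)))
}))

# One point per curve: the observed feature value and its actual prediction
actual <- data.frame(
  id   = seq_len(nrow(dat)),
  x    = dat[[feature]],
  yhat = unname(predict(fit, dat))
)

p <- ggplot(ice, aes(x, yhat, group = id)) +
  geom_line(alpha = 0.4) +
  geom_point(data = actual, colour = "red", size = 1.5) +
  labs(x = feature, y = "Predicted mpg")
```

Printing p draws the curves with one red point on each, so every observation can be traced to its own line, which geom_rug() does not allow.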

Adjust alpha of plots

An alternative approach would be an additional option (e.g. adj_alpha) that changes the alpha of each line depending on how far it is from the observation's actual value, so that each curve appears fainter as you slide away from the location of its point.

An advantage of this approach (over adding points) is that it wouldn't clog up the chart with a bunch of points when there are many lines, but would still show where each curve is more or less trustworthy. It might also produce a somewhat nice aggregate effect (though figuring out the most appropriate way to modulate alpha may be non-trivial; even a decent heuristic may be helpful).
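One possible heuristic, sketched here as an assumption rather than a proposal for the actual implementation: let alpha decay exponentially with the distance between a grid value and the observation's actual feature value, with a floor so that lines never vanish entirely. The toy lm() model, the bandwidth choice (one standard deviation of the feature) and the 0.05 floor are all hypothetical.

```r
library(ggplot2)

# Toy model and data (assumptions for illustration)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
dat <- mtcars[, c("wt", "hp", "qsec")]
feature <- "wt"
grid <- seq(min(dat[[feature]]), max(dat[[feature]]), length.out = 50)
bandwidth <- sd(dat[[feature]])  # hypothetical tuning choice

# ICE curves with a per-segment alpha that fades away from the observed value
ice <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  newdata <- dat[rep(i, length(grid)), , drop = FALSE]
  newdata[[feature]] <- grid
  dist <- abs(grid - dat[[feature]][i])
  data.frame(
    id    = i,
    x     = grid,
    yhat  = unname(predict(fit, newdata)),
    alpha = pmax(0.05, exp(-dist / bandwidth))  # floor keeps lines faintly visible
  )
}))

# scale_alpha_identity() uses the computed values directly as alpha
p <- ggplot(ice, aes(x, yhat, group = id, alpha = alpha)) +
  geom_line() +
  scale_alpha_identity() +
  labs(x = feature, y = "Predicted mpg")
```

Alpha can vary along a line here because the lines are solid; ggplot2 requires constant aesthetics along a path only for dotted or dashed lines.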

pat-s commented 2 years ago

Hi @brshallo,

thanks for the suggestion!

Currently neither Christoph nor I are really active here, so your input/help would be needed for new features. Would you mind creating a PR that we can look at?

PS: From the view of an mlr-dev: is there anything still holding you back from using mlr3 instead of mlr? (would be interesting for us to know :))

brshallo commented 2 years ago

I don't have availability to open a PR for this right now, sorry. I'd also need to familiarize myself with R6 a bit. (Feel free to close this; it can be reopened if I or y'all have time to pick it up in the future. Or just leave it open.)

In response to your other question on mlr / mlr3: I primarily use tidymodels. (As for why, I'm a big tidyverse user, so it was a natural extension for me.)