ModelOriented / iBreakDown

Break Down with interactions for local explanations (SHAP, BreakDown, iBreakDown)
https://ModelOriented.github.io/iBreakDown/
GNU General Public License v3.0

Aggregate local_interactions to estimate shap with interactions #91

Open aruaud opened 3 years ago

aruaud commented 3 years ago

Hi, thanks for the package! How is the variable order set when calculating the local interactions, and is there a way to randomize that order so the contributions can be measured repeatedly under different orders (giving an estimate closer to what SHAP would output)? I tried passing different variable orders to local_interactions(..., order =), but the result does not change, so I don't know whether I am missing a step.

Script example:

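# assumes X (a data frame of predictors) and explain_rf (a DALEX explainer) already exist
# get the variable names and all pairwise interactions, e.g. "a:b"
tmp <- colnames(X)
tmp <- combn(tmp, m = 2)
tmp <- unlist(lapply(asplit(tmp, MARGIN = 2), paste, collapse = ':'))
varN <- c(colnames(X), tmp)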

# create different orders
var_orders <- list()
for (i in 1:5){
    set.seed(i)
    var_orders[[i]] <- sample(varN)
}

# get the contributions for different orders
res <- list()
for (i in seq_along(var_orders)) {
    # the variable order is set with the `order` argument
    res[[i]] <- local_interactions(x = explain_rf, new_observation = X[1, ],
                                   interaction_preference = 10,
                                   order = var_orders[[i]])
}

hbaniecki commented 3 years ago

Hi, I have a minimal example of the change in variable order:

library("DALEX")
library("iBreakDown")
set.seed(1313)
model_titanic_glm <- glm(survived ~ .,
                         data = titanic_imputed, family = "binomial")
explain_titanic_glm <- explain(model_titanic_glm,
                               data = titanic_imputed[,-8],
                               y = titanic_imputed$survived,
                               label = "glm")

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=6:1)
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=1:6)
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=c('age:gender', 'class', 'embarked', 'fare', 'sibsp'))
bd_glm

bd_glm <- local_interactions(explain_titanic_glm, titanic_imputed[1, ], order=c('age:gender', 'embarked:class', 'sibsp:fare'))
bd_glm

SHAP values can be estimated by averaging contributions over different variable orders with the shap() function: https://modeloriented.github.io/iBreakDown/reference/break_down_uncertainty.html. More on these methods can be found in the EMA e-book: http://ema.drwhy.ai/shapley.html
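
To make the idea of averaging over orders concrete, here is a minimal base R sketch. It does not use iBreakDown and is not the package's implementation: the toy model `predict_fun`, the simulated `background` data, and the helper `contributions_for_order()` are all assumptions made up for illustration.

```r
# Hypothetical toy model with an interaction between x1 and x2; x3 is unused.
predict_fun <- function(df) 2 * df$x1 + df$x2 + 3 * df$x1 * df$x2

set.seed(1)
background <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
new_obs <- data.frame(x1 = 1, x2 = -1, x3 = 0.5)

# Break-down-style contributions for one ordering: fix variables one at a
# time to the values of new_obs and record the change in the mean prediction.
contributions_for_order <- function(order) {
  data <- background
  baseline <- mean(predict_fun(data))
  contrib <- setNames(numeric(length(order)), order)
  for (v in order) {
    data[[v]] <- new_obs[[v]]
    current <- mean(predict_fun(data))
    contrib[v] <- current - baseline
    baseline <- current
  }
  # return in a fixed (alphabetical) order so results can be stacked
  contrib[sort(names(contrib))]
}

# Average the contributions over B random orderings.
B <- 50
vars <- colnames(background)
all_contrib <- t(replicate(B, contributions_for_order(sample(vars))))
shap_estimate <- colMeans(all_contrib)
shap_estimate
```

For every ordering the contributions sum to the prediction for `new_obs` minus the mean prediction on `background`, so the averaged estimate keeps that property. Note that this sketch attributes the x1:x2 effect to the single variables rather than reporting it as a separate interaction term.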

aruaud commented 3 years ago

Thanks Hubert! I tried your example and it indeed works fine :) However, when I pass an order containing all variables and all possible interactions, I no longer get any interactions, only the contributions of single variables. Is it that not all interactions can be passed to the function?

And thanks for pointing to the shap() function! I had been using it but could not find how to calculate SHAP values for interactions with it. This is why I switched to the local_interactions() function.

hbaniecki commented 3 years ago

I believe that each variable can be mentioned only once, e.g. if 'age' is present, then 'age:gender' cannot be used. Additionally, I see that when passing interactions as strings, only one naming convention is possible, e.g. 'age:gender', not 'gender:age'.
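
A quick way to check an order vector against the "each variable only once" rule is sketched below in base R. `repeated_variables()` is a hypothetical helper, not part of iBreakDown, and it assumes interactions are written as "a:b" strings; it does not check which of 'age:gender' or 'gender:age' the package expects.

```r
# Hypothetical helper: list the variables that appear more than once in an
# order vector, counting the components of "a:b" interaction terms.
repeated_variables <- function(order) {
  parts <- unlist(strsplit(order, ":", fixed = TRUE))
  unique(parts[duplicated(parts)])
}

# An order from the example above: no variable is repeated.
repeated_variables(c("age:gender", "class", "embarked", "fare", "sibsp"))

# 'age' appears both alone and inside 'age:gender'.
repeated_variables(c("age", "age:gender", "class"))
```

An empty result means every variable is mentioned at most once; any names returned are the ones to remove or merge before passing the order on.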

As for SHAP with interactions, I think that it would be a great feature/method to consider.

aruaud commented 3 years ago

I see, thanks Hubert for the clarification! So not all pairwise interactions can be assessed at once, nor can single variables be assessed together with their interactions. That could be a nice feature too :) Looking forward to the SHAP interactions!

hbaniecki commented 3 years ago

I think this issue could remain open.