ModelOriented / ingredients

Effects and Importances of Model Ingredients
https://modeloriented.github.io/ingredients/
GNU General Public License v3.0

CP observation original value isn't always apparent in variable splits #124

Closed hbaniecki closed 3 years ago

hbaniecki commented 3 years ago

Added the include_new_observation=True parameter in the python implementation, which adds observation variable values to variable splits.

pbiecek commented 3 years ago

ceteris_paribus now has an argument, variable_splits_with_obs, which adds the values from new_observations to variable_splits.

The default behaviour is unchanged!
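
To illustrate what adding observation values to the splits means, here is a minimal base-R sketch. It is not the code used inside ingredients: make_splits is a hypothetical helper, the ages are made up, and a quantile-based grid is assumed.

```r
# Hypothetical helper (not part of ingredients): build a quantile grid for one
# numeric variable and optionally merge the observed values into it.
make_splits <- function(x, grid_points = 5, obs_values = NULL) {
  probs <- seq(0, 1, length.out = grid_points)
  splits <- unique(unname(quantile(x, probs = probs, na.rm = TRUE)))
  if (!is.null(obs_values)) {
    # Mimics the idea behind variable_splits_with_obs = TRUE
    splits <- sort(unique(c(splits, obs_values)))
  }
  splits
}

age     <- c(2, 18, 25, 31, 40, 47, 58, 74)  # made-up training ages
obs_age <- c(42, 13)                          # made-up ages of the explained observations

grid_before <- make_splits(age, grid_points = 5)
grid_after  <- make_splits(age, grid_points = 5, obs_values = obs_age)

# Does each observation's own value appear among the grid points?
print(obs_age %in% grid_before)
print(obs_age %in% grid_after)
```

With a coarse grid, the observed value usually falls between grid points, so the profile is never evaluated at the point itself; merging the observed values into the grid avoids that.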

Example before/after

library(DALEX)
library(randomForest)
library(ingredients)

model_titanic_rf <- randomForest(survived ~ gender + age + fare,
                                 data = na.omit(titanic_imputed))
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,-8],
                              y = titanic_imputed[,8],
                              verbose = FALSE)

# Before: default variable splits
cp1 <- ceteris_paribus(explain_titanic_rf, titanic_imputed[c(1, 2, 3, 198),], grid_points = 5)
plot(cp1) +
  show_observations(cp1)

# After: values from the observations are added to the variable splits
cp1 <- ceteris_paribus(explain_titanic_rf, titanic_imputed[c(1, 2, 3, 198),], grid_points = 5, variable_splits_with_obs = TRUE)
plot(cp1) +
  show_observations(cp1)

hbaniecki commented 3 years ago

This yields an error:

cp1 <- ceteris_paribus(explain_titanic_rf, titanic_imputed[c(1, 2,3, 198),], grid_points=5, variable_splits_with_obs = TRUE) 
plot(cp1) +
  show_observations(cp1, variable_type='categorical')

Also, can we add drwhy colors to the new errorbar CP?

cp1 <- ceteris_paribus(explain_titanic_rf, titanic_imputed[c(1, 2, 3, 198),], grid_points = 5)
plot(cp1, variable_type = 'categorical')

pbiecek commented 3 years ago

IMHO the

plot(cp1) +
  show_observations(cp1, variable_type='categorical')

should not work, as plot(cp1) plots profiles for continuous variables while show_observations is asked for categorical ones
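
The mismatch can be sketched in base R. This is only an illustration with made-up data, assuming that the numerical and categorical variable types correspond to numeric and non-numeric columns; it does not call ingredients.

```r
# Made-up observations with one categorical and two numeric variables
obs <- data.frame(
  gender = factor(c("male", "female")),
  age    = c(42, 13),
  fare   = c(7.1, 20.0)
)

# Assumption: variable types are told apart by column class
is_num <- vapply(obs, is.numeric, logical(1))

numeric_vars     <- names(obs)[is_num]   # what a numerical plot would cover
categorical_vars <- names(obs)[!is_num]  # what a categorical layer would cover

# The two sets do not overlap, so a layer built for one has nothing to
# attach to in a plot built for the other.
print(intersect(numeric_vars, categorical_vars))
```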

pbiecek commented 3 years ago

I just noticed that none of the versions of CP profiles (bars, profiles, steps) uses drwhy colors; I will change this in the next version

thanks

hbaniecki commented 3 years ago

My bad, this works (besides the drwhy palette):

library(DALEX)
library(randomForest)
library(ingredients)

model_titanic_rf <- randomForest(survived ~ gender + age + fare,
                                 data = na.omit(titanic_imputed))
explain_titanic_rf <- explain(model_titanic_rf,
                              data = titanic_imputed[,c(1,2,5)],
                              y = titanic_imputed[,8],
                              verbose = FALSE)

cp1 <- ceteris_paribus(explain_titanic_rf, titanic_imputed[c(1,2,3, 198),], grid_points=5) 

plot(cp1, variable_type='categorical') 

Thanks