giuseppec / iml

iml: interpretable machine learning R package
https://giuseppec.github.io/iml/

ALE plots: How does the `grid.size` argument affect the results? #107

Open: pat-s opened this issue 4 years ago

pat-s commented 4 years ago

Why does the number of rows in the resulting data frame differ so much between features when setting `grid.size = 99`?

Reading `?FeatureEffect` did not help me relate this setting to the differences in the output.

```r
library(iml)
library(rpart)

data("Boston", package = "MASS")
# rpart fits a single regression tree
tree = rpart(medv ~ ., data = Boston)
mod = Predictor$new(tree, data = Boston)

# Compute the accumulated local effects for all features
# (method = "ale" is the FeatureEffects default)
eff = FeatureEffects$new(mod, grid.size = 99)

# Number of rows in the result for each feature
purrr::map_int(eff$results, nrow)
#>    crim      zn   indus    chas     nox      rm     age     dis     rad     tax 
#>     100      20      50       2      62     100      92     100       9      45 
#> ptratio   black   lstat 
#>      37      77     100
```

Created on 2020-01-15 by the reprex package (v0.3.0.9001)

Edit: The grid is set by the following line:

https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/FeatureEffect-ale.R#L17-L17

and, in more detail, by these lines:

https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/utils.R#L191-L192

So, essentially, `quantile(type = 1)` is called with `probs` being a `seq()` whose `length.out` is set by `grid.size`.
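
The grid construction described above can be sketched in base R as follows. `quantile_grid()` is a hypothetical helper, not iml's internal code, and it assumes `grid.size + 1` probability points (i.e. `grid.size` intervals), which would be consistent with the 100 rows reported for continuous features such as `crim` at `grid.size = 99`. It also illustrates one plausible motivation for `type = 1`: unlike R's default `type = 7`, it never interpolates, so every grid point is an observed feature value.

```r
# Hypothetical helper, NOT iml's internal code: builds a quantile grid for
# one feature. Assumes grid.size + 1 probabilities (grid.size intervals).
quantile_grid <- function(x, grid.size) {
  probs <- seq(0, 1, length.out = grid.size + 1)
  # type = 1 inverts the empirical CDF, so every grid point is an
  # observed value of x (no interpolation between data points)
  quantile(x, probs = probs, type = 1, names = FALSE)
}

x <- c(1.5, 2.0, 3.7, 4.1, 5.0)
grid <- quantile_grid(x, grid.size = 3)
all(grid %in% x)  # TRUE: grid points are data values

# R's default type = 7 interpolates: the 1/3 quantile lies strictly
# between the observed values 2.0 and 3.7
quantile(x, probs = 1/3, type = 7, names = FALSE)
```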

Could this be added to the argument description on the help page? It might also be worth documenting the motivation for `type = 1`.

The differing outcomes shown above are then caused by

https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/FeatureEffect-ale.R#L16-L17

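which removes duplicated values from the `quantile()` output. Features with many tied values (such as the binary `chas`) therefore end up with far fewer grid points than `grid.size` suggests.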
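
To illustrate how removing duplicated quantiles produces different row counts per feature, here is a small base-R sketch on synthetic data. `n_grid_points()` is a hypothetical helper, not iml's code; it assumes `grid.size + 1` probabilities and `type = 1` quantiles, and the three synthetic features only loosely resemble the Boston columns.

```r
# Hypothetical helper, NOT iml's internal code: number of grid points
# left after duplicated quantiles are removed (assumes grid.size + 1
# probabilities and type = 1 quantiles)
n_grid_points <- function(x, grid.size) {
  probs <- seq(0, 1, length.out = grid.size + 1)
  length(unique(quantile(x, probs = probs, type = 1, names = FALSE)))
}

set.seed(1)
features <- list(
  continuous = runif(100),                        # 100 distinct values
  zero_heavy = c(rep(0, 70), runif(30, 1, 100)),  # 70% ties at 0
  binary     = rep(c(0, 1), each = 50)            # only 2 distinct values
)
counts <- sapply(features, n_grid_points, grid.size = 99)
counts
```

With `grid.size = 99`, the continuous feature keeps all 100 grid points, the zero-heavy one drops to 31 (one shared point for all the zeros plus 30 distinct values), and the binary one drops to 2, mirroring the pattern seen for `crim`, `zn`, and `chas` in the reprex.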

Regarding interpretation: does the differing number of grid points per feature introduce a bias when interpreting the ALE plots of the individual predictors? Or is it more like "20 is fine and more is better, but there is no bias when comparing the ALE plots of these features"?

christophM commented 4 years ago

In effect, the number of grid points is determined by `grid.size` and then reduced to the number of unique quantiles, as you described.

I think this behavior should be fine: when many values are clustered at a certain point, you simply need fewer intervals. Still, it would make sense to add this to the docs.
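
To see why clustered values simply need fewer intervals, here is a minimal, uncentered first-order ALE sketch in base R. `ale_sketch()` and its arguments are hypothetical and this is not iml's implementation; the original ALE definition also centers the curve, which this sketch omits. It assumes the quantile grid described in the issue (`grid.size + 1` probabilities, `type = 1`, duplicates removed). Because each grid boundary is an observed value, every interval between consecutive unique grid points contains at least one observation, so duplicated quantiles merge intervals instead of leaving empty ones.

```r
# Hypothetical, minimal first-order ALE for one numeric feature.
# NOT iml's implementation: uncentered, no categorical handling.
# predict_fn: function taking a data.frame and returning numeric predictions.
ale_sketch <- function(predict_fn, data, feature, grid.size = 20) {
  x <- data[[feature]]
  probs <- seq(0, 1, length.out = grid.size + 1)
  # quantile grid; duplicated quantiles (clustered values) are dropped
  z <- unique(quantile(x, probs = probs, type = 1, names = FALSE))
  stopifnot(length(z) >= 2)
  # interval index j: x in (z[j], z[j + 1]] gets j; the minimum goes to 1
  j <- pmax(findInterval(x, z, left.open = TRUE), 1)
  lo <- data
  hi <- data
  lo[[feature]] <- z[j]
  hi[[feature]] <- z[j + 1]
  # local effect of moving each observation across its interval
  diffs <- predict_fn(hi) - predict_fn(lo)
  # average local effect per interval; every boundary is an observed value,
  # so no interval should be empty, but guard against NA anyway
  local <- tapply(diffs, factor(j, levels = seq_len(length(z) - 1)), mean)
  local[is.na(local)] <- 0
  data.frame(grid = z, ale = c(0, cumsum(as.numeric(local))))
}

# usage: a linear model whose true effect of x has slope 2
set.seed(42)
d <- data.frame(x = c(rep(0, 30), runif(70)), w = rnorm(100))
f <- function(newdata) 2 * newdata$x + newdata$w
res <- ale_sketch(f, d, "x", grid.size = 20)
nrow(res)  # fewer than 21 grid points because 30% of x is tied at 0
```

Even with the reduced grid, the uncentered curve recovers the slope of 2 at every grid point, which is consistent with the view that fewer intervals for clustered features need not distort the effect itself.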

As for `type = 1`, I am not entirely sure why I set it that way.