Open giuseppec opened 4 years ago
The bug is an empty plot for the pdp of categorical feature (while it works for the FeatureEffect
method)
Thanks!
The issue is that after calculating PDP, integer features are numeric. Then a prediction to newdata with the changed feature is tried. However, the model was trained on an integer feature and should now predict to a numeric one and fails.
Internally, get.grid()
causes the type transformation here
and more specifically in get.grid.1D
:
https://github.com/christophM/iml/blob/dcabe538715588c5f2c4de50e2eb543c335d9a9b/R/utils.R#L186-L195
Here, get.grid()
used type = "equidist"
in every case and tries to create a sequence of equally spaced values. This does not work with integers.
I'd argue that in this case we should treat integers like factors since we quite often cannot create a sequence of integers given the "large" grid.sizes (with 20 as the default) for integer variables.
Speaking of the current state, calculation of PDP is not possible and AFAICS and I am wondering that this did not come up earlier.
@christophM What do you think? Should integers be treated as nominal features here?
Fun fact: In ?seq
I see:
seq.int and the default method of seq for numeric arguments return a vector of type "integer" or "double": programmers should not rely on which.
Integers are a bit tricky, because when you have very few values, treating them as factors might be desirable, but if you have some more values numerical might be more desirable. This concerns just the visualization.
Maybe having a special case for integers in get.grid would solve the issue. For integer features with < grid.size values we take all unique values. If type="equidistant" and the unique values count > grid.size, it's a bit more difficult. Maybe we compute the grid and round to the next integers?
I am also facing the same issue. Whether there is a fix for this problem as on date?
Minimal Example (see comments for the bug)