TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

PermutationImportance uses CV splitter indexes incorrectly #411

Open zeromh opened 3 years ago


The bug is in PermutationImportance.fit when cv is set to KFold or any other scikit-learn splitter.

scikit-learn splitters return the positional indices of the rows (the kind you pass to iloc), but PermutationImportance.fit treats them as index labels (the kind you pass to loc).
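The mismatch can be reproduced with pandas and scikit-learn alone, without eli5. This is a minimal illustration, not eli5's code: the two DataFrames, their values and the compare_selection helper are hypothetical. KFold yields positions 0..n-1, and feeding those integers to loc instead of iloc either picks different rows or fails.

```python
import pandas as pd
from sklearn.model_selection import KFold


def compare_selection(X, positions):
    """Select one fold's rows both ways: by position (iloc) and by label (loc).

    Returns (values via iloc, values via loc); the loc result is the string
    "KeyError" when the integers are not valid labels of X's index.
    """
    by_position = X.iloc[positions]["feature"].tolist()
    try:
        by_label = X.loc[positions]["feature"].tolist()
    except KeyError:
        by_label = "KeyError"
    return by_position, by_label


values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
# Labels are a permutation of 0..5, but label i is not at position i.
shuffled = pd.DataFrame({"feature": values}, index=[5, 4, 3, 2, 1, 0])
# Labels that do not overlap 0..5 at all.
offset = pd.DataFrame({"feature": values}, index=[100, 101, 102, 103, 104, 105])

kf = KFold(n_splits=3)  # no shuffling: test folds are positions [0, 1], [2, 3], [4, 5]
for train_pos, test_pos in kf.split(shuffled):
    print("test positions:", test_pos.tolist())
    print("  shuffled index:", compare_selection(shuffled, test_pos))
    print("  offset index:  ", compare_selection(offset, test_pos))
```

For the first fold, iloc returns [10.0, 11.0] in both cases, while loc silently returns [15.0, 14.0] for the shuffled index and raises KeyError for the offset index.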

This gives correct results only when every row's index label equals its position (e.g. with the default RangeIndex). With any other index, fit either evaluates the wrong rows in each split or raises a KeyError.
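The condition above can be checked directly. The sketch below is an assumption-laden illustration, not part of eli5: labels_equal_positions and loc_matches_iloc_for_all_folds are hypothetical helpers, and resetting the index is only a plausible workaround, assuming the label/position mismatch is the only problem. It shows that loc and iloc agree on every KFold test fold exactly when the index labels are 0..n-1 in order.

```python
import pandas as pd
from sklearn.model_selection import KFold


def labels_equal_positions(df):
    """True when each row's index label equals its position (e.g. a default
    RangeIndex): the only case where using splitter output with loc is safe."""
    return df.index.equals(pd.RangeIndex(len(df)))


def loc_matches_iloc_for_all_folds(df, n_splits=3):
    """Check whether label-based selection picks the same rows as positional
    selection for every KFold test fold."""
    for _, test_pos in KFold(n_splits=n_splits).split(df):
        try:
            if not df.loc[test_pos].equals(df.iloc[test_pos]):
                return False
        except KeyError:
            return False
    return True


frame = pd.DataFrame({"feature": range(6)}, index=[5, 4, 3, 2, 1, 0])
reset = frame.reset_index(drop=True)  # hypothetical workaround: RangeIndex 0..5

print(labels_equal_positions(frame), loc_matches_iloc_for_all_folds(frame))
print(labels_equal_positions(reset), loc_matches_iloc_for_all_folds(reset))
```

If y is also a pandas object indexed like X, it would need the same reset so that the two stay aligned.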