TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

PermutationImportance does not allow NaNs #380

Closed Matgrb closed 4 years ago

Matgrb commented 4 years ago

In the current version, eli5's PermutationImportance does not allow NaNs in X. However, some models handle NaNs natively, e.g. XGBClassifier or scikit-learn's HistGradientBoostingClassifier.

The following code reproduces the issue:

import numpy as np
from eli5.sklearn import PermutationImportance

from sklearn.datasets import load_iris
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = load_iris(return_X_y=True)
X[0, 0] = np.nan  # inject a single missing value

perm = PermutationImportance(HistGradientBoostingClassifier(), cv=5)
perm.fit(X, y)  # fails: check_array rejects the NaN before the estimator is fit

I think PermutationImportance should allow NaNs: if the wrapped classifier does not support them, it will raise the same error anyway.

I would change the following line in eli5.sklearn.permutation_importance:

197:   X = check_array(X)

into:

197:   X = check_array(X, force_all_finite='allow-nan')

This would require raising the minimum scikit-learn requirement to 0.20.0.

Alternatively, without changing the requirements:

197:   X = check_array(X, force_all_finite=False)
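The two options are not equivalent: 'allow-nan' lets NaNs through but still rejects infinities, while False disables the finiteness check entirely. A small sketch of the difference follows. It assumes, to my understanding, that recent scikit-learn releases rename `force_all_finite` to `ensure_all_finite`, so the keyword is chosen at runtime. `accepted` is a hypothetical helper used only for illustration.

```python
import inspect

import numpy as np
from sklearn.utils import check_array

# Assumption: newer scikit-learn versions use `ensure_all_finite` in place of
# `force_all_finite`; use whichever keyword the installed version accepts.
params = inspect.signature(check_array).parameters
FINITE_KW = "ensure_all_finite" if "ensure_all_finite" in params else "force_all_finite"


def accepted(X, **kwargs):
    """Return True if check_array accepts X with the given options."""
    try:
        check_array(X, **kwargs)
    except ValueError:
        return False
    return True


X_nan = np.array([[1.0, np.nan], [3.0, 4.0]])
X_inf = np.array([[1.0, np.inf], [3.0, 4.0]])

default_nan = accepted(X_nan)                                 # default: NaN rejected
allow_nan = accepted(X_nan, **{FINITE_KW: "allow-nan"})       # NaN accepted
allow_nan_inf = accepted(X_inf, **{FINITE_KW: "allow-nan"})   # inf still rejected
no_check_inf = accepted(X_inf, **{FINITE_KW: False})          # no finiteness check

print(default_nan, allow_nan, allow_nan_inf, no_check_inf)
```

With False, infinite values would also reach the estimator unchecked, which is the trade-off for not raising the version requirement.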

As part of this issue, I would also like to add a quick unit test.

Matgrb commented 4 years ago

I just found that this is a duplicate of #262.