TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

Using PermutationImportance with Sklearn RFECV #383

Open apptimise opened 4 years ago

apptimise commented 4 years ago

I would like to use RFECV to compare the R2 scores obtained with different numbers of features, where the features are ranked by their PermutationImportance importances. My solution is as follows:

from eli5.sklearn import PermutationImportance
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

splitter = KFold(n_splits=3) 

estimator = SVR(kernel="linear")
selector = RFECV(
    PermutationImportance(estimator,  scoring='r2', n_iter=10, random_state=42, cv=splitter),
    cv=splitter,
    scoring='r2',
    step=1
)
selector = selector.fit(X, y)
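# Note: grid_scores_ was removed in scikit-learn 1.2; on newer versions use selector.cv_results_["mean_test_score"].
print(selector.grid_scores_)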

  1. My understanding is that RFECV creates 3 folds and that PermutationImportance then creates 3 more folds inside each RFECV training fold. Is that the case? Ideally, PermutationImportance would reuse the split provided by RFECV: fit on the training data and compute the importances on the validation set. Or maybe that is exactly what already happens here. Can someone confirm this, please?

  2. Also, do I need to worry about the refit parameter of PermutationImportance? I'm not exactly sure what it does.
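
To make question 1 concrete, here is a minimal sketch of what "nested" splitting would look like if the same KFold object is applied at both levels. It uses only scikit-learn's KFold on row indices and does not call eli5 or RFECV, so it illustrates the suspected behaviour rather than confirming what the libraries do internally. The variable names (outer_train, inner_val, and so on) are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 50                      # same size as the make_friedman1 call above
indices = np.arange(n_samples)
splitter = KFold(n_splits=3)        # one splitter object reused at both levels

inner_split_count = 0
leaks = 0
for outer_train, outer_val in splitter.split(indices):
    # Outer level (the role RFECV would play): the wrapped estimator
    # is only ever handed the rows in outer_train.
    for inner_train, inner_val in splitter.split(outer_train):
        # Inner level (the role PermutationImportance would play with cv=splitter):
        # inner_* are positions within outer_train, so map them back to row ids.
        inner_val_rows = outer_train[inner_val]
        inner_split_count += 1
        # Count any inner validation row that also sits in the outer validation fold.
        leaks += len(np.intersect1d(inner_val_rows, outer_val))

print("inner splits per full pass:", inner_split_count)
print("rows shared between inner and outer validation folds:", leaks)
```

Under this assumption, each of the 3 outer training folds is split 3 more times, and the outer validation rows are never used when computing importances. Note that RFECV repeats the fit once per candidate number of features, so the real number of fits would be larger. On question 2, the eli5 documentation describes refit as controlling whether the estimator is fit again on the whole data passed to fit when cross-validation is used; that reading is worth verifying against the installed version.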

t10-13rocket commented 3 years ago

Can it be hidden in plain sight?