I would like to use the `RFECV` scores to compare R² scores for different numbers of features, where the features are selected based on `PermutationImportance` importances. My solution is as follows:
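```python
from eli5.sklearn import PermutationImportance
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
splitter = KFold(n_splits=3)
estimator = SVR(kernel="linear")

# RFECV ranks features by PermutationImportance's feature_importances_;
# both RFECV and PermutationImportance are given the same 3-fold splitter.
selector = RFECV(
    PermutationImportance(estimator, scoring='r2', n_iter=10, random_state=42, cv=splitter),
    cv=splitter,
    scoring='r2',
    step=1,
)
selector = selector.fit(X, y)

# Mean validation R² for each number of selected features.
# grid_scores_ was deprecated in scikit-learn 1.0 and removed in 1.2;
# cv_results_["mean_test_score"] replaces it.
print(selector.cv_results_["mean_test_score"])
```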
My understanding is that `RFECV` creates 3 folds and then `PermutationImportance` creates another 3 folds within each of those folds. Is that the case? Ideally, `PermutationImportance` would use the same split that `RFECV` provides: fit on the training data and compute the importances on the validation set. Or maybe this is exactly what is already happening here. Can someone confirm this, please?
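To check the nesting myself, the sketch below replaces `PermutationImportance` with a hypothetical wrapper, `InnerCVRecorder` (my own name, not part of eli5 or scikit-learn), that runs its own `KFold` over whatever rows its `fit` receives and logs the sizes. The log is meant to test the assumption that `RFECV` only ever hands the wrapped estimator an outer training fold, or the full data for its final refit.

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# RFECV clones the estimator, so instance attributes would be lost;
# this module-level list collects one entry per fit() call instead.
FIT_LOG = []


class InnerCVRecorder(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for PermutationImportance(cv=...): it only logs
    the inner splits it would use, then fits on all rows it received."""

    def __init__(self, estimator=None, cv=None):
        self.estimator = estimator
        self.cv = cv

    def fit(self, X, y):
        # (train size, validation size) for each inner split of the rows received.
        inner = [(len(tr), len(va)) for tr, va in self.cv.split(X)]
        FIT_LOG.append((X.shape[0], inner))
        self.estimator_ = clone(self.estimator).fit(X, y)
        # Expose coef_ so RFECV ranks features exactly as it would with a bare SVR.
        self.coef_ = self.estimator_.coef_
        return self

    def predict(self, X):
        return self.estimator_.predict(X)


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
splitter = KFold(n_splits=3)
selector = RFECV(
    InnerCVRecorder(SVR(kernel="linear"), cv=splitter),
    cv=splitter,
    scoring="r2",
    step=1,
)
selector.fit(X, y)

# One line per distinct (rows received, inner splits) combination.
for n_rows, inner in sorted({(n, tuple(s)) for n, s in FIT_LOG}):
    print(n_rows, "rows received -> inner (train, val) sizes:", list(inner))
```

With 50 samples, `KFold(n_splits=3)` yields outer training folds of 33 or 34 rows, so if the only logged sizes are 33, 34 and 50, the inner splits are carved out of the outer training folds and the outer validation folds are used only by `RFECV`'s own scoring.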
Also, do I need to worry about the `refit` parameter of `PermutationImportance`? I'm not sure what it does.
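Here is my guess at what `refit` means, written as a hypothetical scikit-learn-only wrapper. `CVPermutationImportance` is my own name, not eli5's class, and `sklearn.inspection.permutation_importance` stands in for eli5's permutation code. The assumption is that with `cv` set, importances are averaged over the inner folds, and `refit` decides whether one extra model is fit on all rows passed to `fit` so that `predict` works afterwards.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_friedman1
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold
from sklearn.svm import SVR


class CVPermutationImportance(RegressorMixin, BaseEstimator):
    """Hypothetical approximation of PermutationImportance(cv=..., refit=...)."""

    def __init__(self, estimator=None, cv=None, n_repeats=10, refit=True, random_state=None):
        self.estimator = estimator
        self.cv = cv
        self.n_repeats = n_repeats
        self.refit = refit
        self.random_state = random_state

    def fit(self, X, y):
        per_fold = []
        for train, val in self.cv.split(X, y):
            # Fit a fresh clone on the inner training part only...
            est = clone(self.estimator).fit(X[train], y[train])
            # ...and measure the R² drop from shuffling each column on the inner validation part.
            result = permutation_importance(
                est, X[val], y[val], scoring="r2",
                n_repeats=self.n_repeats, random_state=self.random_state,
            )
            per_fold.append(result.importances_mean)
        # Averaged over inner folds and exposed under the name RFECV can rank by.
        self.feature_importances_ = np.mean(per_fold, axis=0)
        if self.refit:
            # One extra fit on every row passed to fit(); predict() relies on it.
            self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        # Raises AttributeError when refit=False, because estimator_ was never set.
        return self.estimator_.predict(X)


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
svr = SVR(kernel="linear")
with_refit = CVPermutationImportance(svr, cv=KFold(n_splits=3), random_state=42).fit(X, y)
without_refit = CVPermutationImportance(
    svr, cv=KFold(n_splits=3), refit=False, random_state=42
).fit(X, y)

print(np.round(with_refit.feature_importances_, 3))
print(hasattr(with_refit, "estimator_"), hasattr(without_refit, "estimator_"))
```

In this sketch, `refit=False` leaves nothing fitted for `predict`, so `RFECV` could not score its outer validation fold; if eli5 behaves the same way, the default `refit=True` would be needed here.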