TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

Pairing RFECV and RandomForestRegressor with non-cv Permutation Importance returns NotFittedError #422

Open enesok opened 1 year ago

enesok commented 1 year ago

I'm trying to pair RFECV with permutation importance, and I would like to use the non-cv version of the permutation importance computation. I tried several approaches, but I always get a NotFittedError for `estimator_funct` when fitting the RFECV. Am I missing something? Everything works fine when I pass a cv instance to PermutationImportance, even without fitting the estimator beforehand. Help is greatly appreciated.


```python
estimator_funct = estimator_funct.fit(extract_relevant_features, choosen_target)
estimator_funct.fit(extract_relevant_features, choosen_target)

pi = PermutationImportance(estimator_funct,  scoring='r2', n_iter=10, random_state=1)
rfecv = RFECV(
    estimator=pi,
    step=1,
    cv=cv_func,
    scoring=score,
    min_features_to_select=min_features_to_select,
)

rfecv.fit(extract_relevant_features,
          choosen_target,
          groups=extract_relevant_features.index)
```

The exception:

```
...
    rfecv.fit(extract_relevant_features,
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\feature_selection\_rfe.py", line 723, in fit
    scores = parallel(
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\feature_selection\_rfe.py", line 724, in <genexpr>
    func(rfe, self.estimator, X, y, train, test, scorer)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\feature_selection\_rfe.py", line 37, in _rfe_single_fit
    return rfe._fit(
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\feature_selection\_rfe.py", line 296, in _fit
    estimator.fit(X[:, features], y, **fit_params)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\eli5\sklearn\permutation_importance.py", line 204, in fit
    si = self._non_cv_scores_importances(X, y)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\eli5\sklearn\permutation_importance.py", line 232, in _non_cv_scores_importances
    base_score, importances = self._get_score_importances(score_func, X, y)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\eli5\sklearn\permutation_importance.py", line 236, in _get_score_importances
    return get_score_importances(score_func, X, y, n_iter=self.n_iter,
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\eli5\permutation_importance.py", line 86, in get_score_importances
    base_score = score_func(X, y)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\metrics\_scorer.py", line 219, in __call__
    return self._score(
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\metrics\_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\metrics\_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\ensemble\_forest.py", line 989, in predict
    check_is_fitted(self)
  File "C:\Users\T450\anaconda3\envs\Master\lib\site-packages\sklearn\utils\validation.py", line 1345, in check_is_fitted
    raise NotFittedError(msg % {"name": type(estimator).__name__})
sklearn.exceptions.NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
```
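
A possible explanation for the traceback, offered as a sketch and not a confirmed diagnosis: scikit-learn meta-estimators such as RFE generally work on a `clone()` of the estimator they are given, and `clone()` copies hyperparameters only, including for estimators nested inside a wrapper. A forest fitted before being wrapped would therefore arrive unfitted inside each RFECV fold. The snippet below reproduces just that cloning behaviour with scikit-learn alone. The synthetic data, the variable names, and the use of `TransformedTargetRegressor` as a stand-in for the `PermutationImportance` wrapper are all assumptions made for illustration.

```python
from sklearn.base import clone
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import NotFittedError

# Small synthetic data set, purely for illustration.
X, y = make_regression(n_samples=50, n_features=5, random_state=0)

# Fit the forest up front, as in the snippet from the report.
fitted = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)
fitted.predict(X)  # works: this instance has been fitted

# clone() builds a new estimator with the same hyperparameters;
# the learned trees are not carried over.
cloned = clone(fitted)
try:
    cloned.predict(X)
    clone_is_fitted = True
except NotFittedError:
    clone_is_fitted = False

# The same applies to an estimator nested inside a wrapper.
# TransformedTargetRegressor is only a stand-in for a wrapper here.
wrapper = TransformedTargetRegressor(regressor=fitted)
cloned_wrapper = clone(wrapper)
nested_is_fitted = hasattr(cloned_wrapper.regressor, "estimators_")

print(clone_is_fitted, nested_is_fitted)
```

If this is what happens here, fitting `estimator_funct` beforehand cannot help, because RFECV never uses that fitted instance. The eli5 documentation describes `cv="prefit"` as the default for `PermutationImportance` and `cv=None` as the setting in which the wrapper fits the estimator itself on the data it receives and computes importances on that same data. Passing `cv=None` may therefore give the non-cv behaviour wanted here, though this has not been checked against the library versions in the traceback.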