david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License

Both cross_validate and GridSearchCV failed for 'Comparing Isolation Forest implementations' #66

Closed jrwells-research closed 1 month ago

jrwells-research commented 1 month ago

At the start of the experiments:

From 'In [4]:'

cv_res = cross_validate(IsolationForestIsoTree(), X, y, scoring="roc_auc",
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1))

and 'In [5]:'

cv_model = GridSearchCV(estimator=IsolationForestIsoTree(ntrees=100, ndim=2,
                                                         missing_action="fail",
                                                         random_seed=1),
                        param_grid=params_try,
                        scoring="roc_auc", refit=True,
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1))

Both failed with the following error:

ValueError: IsolationForest should either be a classifier to be used with response_method=decision_function or the response_method should be 'predict'. Got a regressor with response_method=decision_function instead.

I did find a workaround: changing 'scoring="roc_auc"' to 'scoring=roc_auc_fixed', where 'roc_auc_fixed' is a variable declared as follows:

roc_auc_fixed = make_scorer(roc_auc_score, response_method="predict")

Looking at the class declaration, I wonder whether it should use the following:

class IsolationForest(OutlierMixin, BaseEstimator):

or

class IsolationForest(ClassifierMixin, BaseEstimator):

instead of

class IsolationForest(BaseEstimator):
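To illustrate why the mixin matters, here is a small sketch using two hypothetical toy classes (PlainDetector and TaggedDetector, neither of which is isotree code). scikit-learn's helpers is_classifier and is_outlier_detector report how an estimator is categorized; an estimator that is neither appears to be handled as a regressor by the scorer, which would match the error message above. The exact behavior depends on the scikit-learn version, so the failing call is wrapped in try/except.

```python
import numpy as np
from sklearn.base import BaseEstimator, OutlierMixin, is_classifier, is_outlier_detector
from sklearn.metrics import get_scorer


class PlainDetector(BaseEstimator):
    """Hypothetical toy detector declared with BaseEstimator only."""

    def fit(self, X, y=None):
        self.center_ = np.median(np.asarray(X, dtype=float), axis=0)
        return self

    def decision_function(self, X):
        # Higher value = farther from the training median = more anomalous.
        return np.linalg.norm(np.asarray(X, dtype=float) - self.center_, axis=1)

    def predict(self, X):
        return self.decision_function(X)


class TaggedDetector(OutlierMixin, PlainDetector):
    """Same toy detector, but marked as an outlier detector via the mixin."""


# Synthetic data: 40 inliers around the origin, 10 outliers far away (label 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(40, 2)), rng.normal(10, 0.5, size=(10, 2))])
y = np.r_[np.zeros(40, dtype=int), np.ones(10, dtype=int)]

plain = PlainDetector().fit(X)
tagged = TaggedDetector().fit(X)

print("plain:  classifier?", is_classifier(plain), "outlier detector?", is_outlier_detector(plain))
print("tagged: classifier?", is_classifier(tagged), "outlier detector?", is_outlier_detector(tagged))

# With the mixin, the built-in "roc_auc" scorer can use decision_function.
auc_tagged = get_scorer("roc_auc")(tagged, X, y)
print("roc_auc with OutlierMixin:", auc_tagged)

# Without the mixin, recent scikit-learn versions are expected to reject the
# estimator as in the report; older versions may accept it instead.
try:
    print("roc_auc without mixin:", get_scorer("roc_auc")(plain, X, y))
except ValueError as err:
    print("plain estimator rejected:", err)
```

Of the two options, OutlierMixin seems the closer fit for an unsupervised detector, since ClassifierMixin implies a supervised fit with class labels.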

I am using PyCharm (Pro) as the IDE, running in standard 'console' mode (i.e., not using any notebooks).

Version:

david-cortes commented 1 month ago

Thanks for the bug report. Should be fixed now:

pip install -U isotree