flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License

Error when using sklearn StratifiedKFold in Evaluator CV #135

Open oberlage opened 3 years ago

oberlage commented 3 years ago

Hi there,

First of all, thanks for providing this nice library; it's really helpful in our project!

We are using the Evaluator class to run a grid search, but our data needs stratification. We were happy to read in the documentation that the Evaluator class also accepts "a KFold class that obeys the Scikit-learn API". This would allow us to pass a sklearn.model_selection.StratifiedKFold instance and easily stratify our data during cross-validation.

However, when implementing this, we get the following error:

[MLENS] backend: threading
Traceback (most recent call last):
  File "mlens_kfol_cv.py", line 29, in <module>
    n_iter=10
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 492, in fit
    self._fit(X, y, job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 180, in _fit
    manager.process(self, job, X, y)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/parallel/backend.py", line 855, in process
    caller.indexer.fit(self.job.predict_in, self.job.targets, self.job.job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/fold.py", line 147, in fit
    check_full_index(n, self.folds, self.raise_on_exception)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/_checks.py", line 19, in check_full_index
    "type(%s) was passed." % type(folds))
ValueError: 'folds' must be an integer. type(<class 'sklearn.model_selection._split.KFold'>) was passed.

The error seems to contradict the documentation of the Evaluator class.
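
Judging only from the traceback, the failing check appears to be a plain integer test on `folds`. The sketch below is a hypothetical stand-in written for illustration, not the mlens source: the function name `check_folds_is_integer` and the `DummySplitter` class are our own assumptions. It only shows why any splitter object, regardless of its API, would trigger the message above.

```python
from numbers import Integral


def check_folds_is_integer(folds):
    """Hypothetical stand-in for the check seen in the traceback (not mlens code)."""
    if not isinstance(folds, Integral):
        raise ValueError(
            "'folds' must be an integer. type(%s) was passed." % type(folds)
        )
    return True


class DummySplitter:
    """Placeholder for any splitter object, such as a scikit-learn KFold instance."""

    def split(self, X, y=None):
        # A real splitter would yield (train_idx, test_idx) pairs here.
        return iter(())


# An integer number of folds passes the check.
check_folds_is_integer(5)

# Any object, however well it follows the splitter API, is rejected.
try:
    check_folds_is_integer(DummySplitter())
except ValueError as exc:
    print(exc)
```

If the check really works this way, passing an integer to `cv` would run, but it would not give stratified folds.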

The error can be reproduced with the following (dummy) code:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import mean_absolute_error
from mlens.model_selection import Evaluator
from mlens.metrics import make_scorer
from sklearn.linear_model import Lasso
from scipy.stats import uniform

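# MAE is an error metric, so lower is better
scorer = make_scorer(mean_absolute_error, greater_is_better=False)
estimators = [('lasso', Lasso())]
param_dicts = {
    'lasso':
        # scipy's uniform(loc, scale) samples from [loc, loc + scale]
        {'alpha': uniform(1e-6, 1e-5)},
}

# Random dummy data: 10 samples, 1 feature
x_train = np.random.rand(10, 1)
y_train = np.random.rand(10)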

evl = Evaluator(
    scorer,
    cv=StratifiedKFold(),
    verbose=5,
)
evl.fit(
    x_train, y_train,
    estimators=estimators,
    param_dicts=param_dicts,
    n_iter=10
)

We're using Python 3.7.6 with the following library versions:

mlens==0.2.3
scikit-learn==0.22.1
numpy==1.18.1
scipy==1.4.1

Do you have any insights on how to get this solved?

agartland commented 2 years ago

Were you able to implement a KFold object with mlens? I'm hoping to be able to use stratified k-fold CV for the Evaluator as well as the SuperLearner. Workarounds would be OK too! Thanks!
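
One possible workaround, sketched here entirely outside mlens, is to run the stratified loop by hand with scikit-learn. This is an assumption-laden sketch, not a fix for the Evaluator or the SuperLearner: the data, the quartile binning, and the fixed `alpha` are made up for illustration. Because StratifiedKFold expects discrete class labels, the continuous target is binned first and the bins are used only for splitting.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import StratifiedKFold

# Dummy regression data (illustrative only)
rng = np.random.RandomState(0)
X = rng.rand(40, 1)
y = rng.rand(40)

# Bin the continuous target into quartiles so it can be used for stratification
edges = np.quantile(y, [0.25, 0.5, 0.75])
y_bins = np.digitize(y, edges)  # integer labels 0..3

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
test_sizes = []
for train_idx, test_idx in cv.split(X, y_bins):
    # Fit on the original continuous target; the bins only drive the split
    model = Lasso(alpha=1e-5)
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    fold_scores.append(mean_absolute_error(y[test_idx], preds))
    test_sizes.append(len(test_idx))

mean_mae = float(np.mean(fold_scores))
print("MAE per fold:", fold_scores)
print("Mean MAE:", mean_mae)
```

A hyperparameter search would wrap this loop in an outer loop over sampled `alpha` values; that part is left out to keep the sketch short.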