First of all, thanks for providing this nice library, it's really helpful in our project!
We are using the Evaluator class to do a grid search, but our data needs stratification. We were happy to read in the documentation that the Evaluator class also accepts "a KFold class that obeys the Scikit-learn API". This would allow us to use the sklearn.model_selection.StratifiedKFold class and easily stratify our data in the cross-validation.
However, when implementing this, we get the following error:
[MLENS] backend: threading
Traceback (most recent call last):
  File "mlens_kfol_cv.py", line 29, in <module>
    n_iter=10
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 492, in fit
    self._fit(X, y, job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/model_selection/model_selection.py", line 180, in _fit
    manager.process(self, job, X, y)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/parallel/backend.py", line 855, in process
    caller.indexer.fit(self.job.predict_in, self.job.targets, self.job.job)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/fold.py", line 147, in fit
    check_full_index(n, self.folds, self.raise_on_exception)
  File "/Users/user/.python-virtualenvs/some_env/lib/python3.7/site-packages/mlens/index/_checks.py", line 19, in check_full_index
    "type(%s) was passed." % type(folds))
ValueError: 'folds' must be an integer. type(<class 'sklearn.model_selection._split.KFold'>) was passed.
The error seems to contradict the documentation of the Evaluator class.
The error can be reproduced with the following (dummy) code:
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import StratifiedKFold

from mlens.metrics import make_scorer
from mlens.model_selection import Evaluator

# Lower MAE is better, so flag the metric as a loss.
scorer = make_scorer(mean_absolute_error, greater_is_better=False)

# One estimator with a sampling distribution for its hyperparameter.
estimators = [('lasso', Lasso())]
param_dicts = {
    'lasso': {'alpha': uniform(1e-6, 1e-5)},
}

# Dummy data: 10 samples, 1 feature.
x_train = np.random.rand(10, 1)
y_train = np.random.rand(10)

# Passing a scikit-learn splitter as `cv` is what triggers the error.
evl = Evaluator(
    scorer,
    cv=StratifiedKFold(),
    verbose=5,
)

evl.fit(
    x_train, y_train,
    estimators=estimators,
    param_dicts=param_dicts,
    n_iter=10,
)
We're using Python 3.7.6 with the following library versions:
Were you able to implement a KFold object with mlens? I'm hoping to be able to use stratified k-fold CV for the Evaluator as well as the SuperLearner. Workarounds would be OK too! Thanks!
Hi there,
Do you have any insights on how to get this solved?
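Since workarounds were asked for: one possible route, until the Evaluator accepts a splitter object, is to run the stratified search with scikit-learn alone. The sketch below does not fix or use mlens; it swaps the Evaluator for sklearn's RandomizedSearchCV, which accepts precomputed (train, test) index pairs as `cv`. Two assumptions are mine and not from the report above: the continuous target is binned into quartiles so that StratifiedKFold has discrete strata to work with, and the dummy data is enlarged to 60 rows so every stratum has enough members per fold.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Dummy regression data (assumption: 60 rows instead of 10, so each
# stratum has enough samples for 3 folds).
rng = np.random.RandomState(0)
x_train = rng.rand(60, 1)
y_train = rng.rand(60)

# StratifiedKFold needs discrete labels. Assumption: quartile bins of the
# target are an acceptable proxy for the strata you care about.
edges = np.quantile(y_train, [0.25, 0.5, 0.75])
strata = np.digitize(y_train, edges)  # integer labels 0..3

# Compute the stratified splits once, on the binned labels.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
splits = list(skf.split(x_train, strata))

# Hand the precomputed index pairs to the search; the model itself is
# still fitted and scored on the original continuous target.
search = RandomizedSearchCV(
    Lasso(),
    param_distributions={"alpha": uniform(1e-6, 1e-5)},
    n_iter=10,
    scoring="neg_mean_absolute_error",
    cv=splits,
    random_state=0,
)
search.fit(x_train, y_train)

print(search.best_params_, search.best_score_)
```

If your target is already categorical, the binning step can be dropped and the labels passed to `skf.split` directly. This only covers the hyperparameter search; it does not address stratified folds inside a SuperLearner.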