Error using TPOTClassifier in Google Colab: "No module named 'sklearn.metrics.scorer'"

koky462 commented 3 years ago

Hi all,

I have read the following to implement TPOT in Google Colab:

1. https://colab.research.google.com/gist/weixuanfu/7e58b6120929a10a53f034cfb2608e85/tpot_dask_check_colab.ipynb#scrollTo=Gz0BsqZki2t0
2. https://github.com/EpistasisLab/tpot/issues/1095

However, my code fails with this import error: "No module named 'sklearn.metrics.scorer'"

My code

!pip install TPOT
!pip install dask==2.20.0 dask-glm==0.2.0 dask-ml==1.0.0
!pip install tornado==5.0
!pip install distributed==2.2.0
!pip install xgboost==0.90
!pip install fsspec

from dask.distributed import Client
client = Client(processes=False) 

import time
from google.colab import files  # provides files.download() in Colab
from tpot import TPOTClassifier

start = time.time()

# Assign the values outlined to the inputs
number_generations = 4
population_size = 4
offspring_size = 3
scoring_function = 'roc_auc'

# Create the tpot classifier
tpot_clf = TPOTClassifier(generations=number_generations, population_size=population_size,
                          offspring_size=offspring_size, scoring=scoring_function,
                          verbosity=2, random_state=0,config_dict='TPOT light', cv=5, warm_start=True,use_dask=True)

tpot_clf.fit(X, y)

print(tpot_clf.fitted_pipeline_)

tpot_clf.export('tpot_exported_pipeline.py')  # export() writes the pipeline as Python source

files.download('tpot_exported_pipeline.py')

end = time.time()

print(end - start)

My error

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py:63: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  return f(*args, **kwargs)
Optimization Progress: 0%
0/16 [00:00<?, ?pipeline/s]
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in _wrapped_cross_val_score(sklearn_pipeline, features, target, cv, scoring_function, sample_weight, groups, use_dask)
    424         try:
--> 425             import dask_ml.model_selection  # noqa
    426             import dask  # noqa

12 frames
/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/__init__.py in <module>()
      5 """
----> 6 from ._hyperband import HyperbandSearchCV
      7 from ._incremental import IncrementalSearchCV

/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/_hyperband.py in <module>()
     10 
---> 11 from ._incremental import BaseIncrementalSearchCV
     12 from ._successive_halving import SuccessiveHalvingSearchCV

/usr/local/lib/python3.6/dist-packages/dask_ml/model_selection/_incremental.py in <module>()
     15 from sklearn.base import clone
---> 16 from sklearn.metrics.scorer import check_scoring
     17 from sklearn.model_selection import ParameterGrid, ParameterSampler

ModuleNotFoundError: No module named 'sklearn.metrics.scorer'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    827                     per_generation_function=self._check_periodic_pipeline,
--> 828                     log_file=self.log_file_,
    829                 )

/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function, log_file)
    227 
--> 228     population[:] = toolbox.evaluate(population)
    229 

/usr/local/lib/python3.6/dist-packages/tpot/base.py in _evaluate_individuals(self, population, features, target, sample_weight, groups)
   1552                             for sklearn_pipeline in sklearn_pipeline_list[
-> 1553                                 chunk_idx : chunk_idx + chunk_size
   1554                             ]

/usr/local/lib/python3.6/dist-packages/tpot/base.py in <listcomp>(.0)
   1551                             )
-> 1552                             for sklearn_pipeline in sklearn_pipeline_list[
   1553                                 chunk_idx : chunk_idx + chunk_size

/usr/local/lib/python3.6/dist-packages/stopit/utils.py in wrapper(*args, **kwargs)
    144                     # ``result`` may not be assigned below in case of timeout
--> 145                     result = func(*args, **kwargs)
    146                 return result

/usr/local/lib/python3.6/dist-packages/tpot/gp_deap.py in _wrapped_cross_val_score(sklearn_pipeline, features, target, cv, scoring_function, sample_weight, groups, use_dask)
    429             msg = "'use_dask' requires the optional dask and dask-ml depedencies.\n{}".format(e)
--> 430             raise ImportError(msg)
    431 

ImportError: 'use_dask' requires the optional dask and dask-ml depedencies.
No module named 'sklearn.metrics.scorer'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-201-63a9e7598fe2> in <module>()
     14                           verbosity=2, random_state=0,config_dict='TPOT light', cv=5, warm_start=True,use_dask=True)
     15 
---> 16 tpot_clf.fit(X, y)
     17 
     18 print(tpot_clf.fitted_pipeline_)

/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    861                     # raise the exception if it's our last attempt
    862                     if attempt == (attempts - 1):
--> 863                         raise e
    864             return self
    865 

/usr/local/lib/python3.6/dist-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    852                         self._pbar.close()
    853 
--> 854                     self._update_top_pipeline()
    855                     self._summary_of_best_pipeline(features, target)
    856                     # Delete the temporary cache before exiting

/usr/local/lib/python3.6/dist-packages/tpot/base.py in _update_top_pipeline(self)
    960             # need raise RuntimeError because no pipeline has been optimized
    961             raise RuntimeError(
--> 962                 "A pipeline has not yet been optimized. Please call fit() first."
    963             )
    964 

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.

I have also installed the optional dependencies of dask-ml:

pip install dask-ml[xgboost]    # also install xgboost and dask-xgboost
pip install dask-ml[complete]   # install all optional dependencies

https://ml.dask.org/install.html

But it still returns the same error. Please advise. Thanks!
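Because TPOT re-raises the dask-ml failure as its own ImportError ("'use_dask' requires the optional dask and dask-ml depedencies") and fit() then ends with an unrelated-looking RuntimeError, the root cause is easy to miss. A minimal sketch, assuming you only want to surface the first failing import directly: the helper first_import_error is hypothetical, and the module list mirrors the two imports that tpot/gp_deap.py attempts in the traceback above.

```python
import importlib


def first_import_error(module_names):
    """Import each module in order; return the first failure as text, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            return f"{name}: {type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Same modules TPOT tries to import when use_dask=True (see the traceback).
    problem = first_import_error(["dask_ml.model_selection", "dask"])
    print(problem or "dask and dask-ml import cleanly")
```

In the environment from the traceback, this should report the 'sklearn.metrics.scorer' error from dask_ml.model_selection instead of TPOT's wrapped message.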

rachitk commented 3 years ago

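This may be because scikit-learn 0.24 is installed. That release removed the 'sklearn.metrics.scorer' module (deprecated since 0.22), and the dask-ml you pinned (dask-ml==1.0.0) still imports check_scoring from that path, as the traceback shows. dask-ml needs to address this API change (see #1176).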
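For context, the failing line in dask_ml/model_selection/_incremental.py is "from sklearn.metrics.scorer import check_scoring". The sketch below shows the kind of version-tolerant import a library can use instead: try the public sklearn.metrics location first and only then the legacy path. The helper import_first and the list CHECK_SCORING_LOCATIONS are hypothetical names for illustration; this is not dask-ml's actual patch.

```python
import importlib
from typing import Any, List, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the given (module, name) pairs."""
    errors: List[str] = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Public location first; the legacy module path no longer exists in scikit-learn 0.24.
CHECK_SCORING_LOCATIONS = [
    ("sklearn.metrics", "check_scoring"),
    ("sklearn.metrics.scorer", "check_scoring"),
]

if __name__ == "__main__":
    try:
        check_scoring = import_first(CHECK_SCORING_LOCATIONS)
        print("check_scoring imported from", check_scoring.__module__)
    except ImportError as exc:
        print(exc)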

To fix this, you could upgrade dask-ml (later versions appear to fix this issue), or, although it is not recommended, add the line

!pip install 'scikit-learn>=0.22.0,<0.24.0' --force-reinstall

after your other pip installs to force an install of an older version of scikit-learn that doesn't have these changes.

EDIT: A previous version of this comment mistakenly attributed the error to TPOT. On closer reading, this is actually an issue in dask-ml, so you should check your dask-ml version and whether a newer one is available.
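To check the versions involved, here is a small sketch that reads installed package versions from metadata without importing the packages, so it still works when importing dask-ml itself fails. The helper installed_versions is hypothetical. It uses importlib.metadata on Python 3.8+ and falls back to pkg_resources (from setuptools) on older runtimes, such as the Python 3.6 image shown in the traceback.

```python
try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # older runtimes: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(dist_name):
        return pkg_resources.get_distribution(dist_name).version


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in dist_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["scikit-learn", "dask", "dask-ml", "distributed", "tpot", "xgboost"]
    for name, ver in installed_versions(packages).items():
        print(f"{name:12} {ver or 'not installed'}")
```

If this reports scikit-learn 0.24 or newer alongside dask-ml 1.0.0 (the version pinned in the original install line), that combination matches the failing import in the traceback.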

kb2010 commented 3 years ago

Having the same issue. I tried the force-reinstall and installed dask as described on the website. Downgrading scikit-learn just causes errors about a deprecated module. I'm a bit stuck on this. I'm using Anaconda on a local PC with Jupyter Notebook.

EpistasisLab / tpot