EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

ValueError: 'RMSLE' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options. #954

Closed lmsanch closed 5 years ago

lmsanch commented 5 years ago

I created a custom scoring function with sklearn metrics, which worked fine until I had to reinstall Anaconda and TPOT on my Mac. I am now using TPOT 0.9.1 and Python 3.7.5.

The function runs well on my Ubuntu machine, so I am not sure what the problem is.

Context of the issue

import numpy as np
from sklearn.metrics import make_scorer, mean_squared_error

def RMSLE(y, y_pred):
    """Root mean squared error (despite the name, no log transform is applied).

    Keyword arguments:
    y -- array containing true values
    y_pred -- predictions
    example - rmsle_loss = make_scorer(RMSLE, greater_is_better=False)
    """
    return np.sqrt(mean_squared_error(y, y_pred))

Then: rmsle_loss = make_scorer(RMSLE, greater_is_better=False)
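
A side note on naming, separate from the error reported here: the function is called RMSLE but returns the plain root mean squared error, since no logarithm is taken. The sketch below uses only the standard library to show how the two metrics differ. The helper names rmse and rmsle and the sample values are illustrative and not part of the original report.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: what the RMSLE function in this issue computes."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; all values must be greater than -1."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative values only.
y_true = [1.0, 10.0, 100.0]
y_pred = [2.0, 12.0, 90.0]

rmse_value = rmse(y_true, y_pred)
rmsle_value = rmsle(y_true, y_pred)
print(rmse_value, rmsle_value)
```

RMSE is dominated by the largest absolute error, while RMSLE weighs relative errors, so the two can rank models differently. Either one can be wrapped with make_scorer(..., greater_is_better=False), which negates the value so that a higher score is better.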

Then I pass these parameters to TPOT:

params ={'cv':5,
         'scoring': rmsle_loss,
         'generations':10,
         'random_state':0,
         'max_eval_time_mins':10}

I get these warnings:

[17:18:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[17:18:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[17:18:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[17:18:22] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

Then I get this error:

Traceback (most recent call last):
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/scorer.py", line 228, in get_scorer
    scorer = SCORERS[scoring]
KeyError: 'RMSLE'

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/scorer.py", line 228, in get_scorer
    scorer = SCORERS[scoring]
KeyError: 'RMSLE'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 600, in __call__
    return self.func(*args, **kwargs)
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/stopit/utils.py", line 145, in wrapper
    result = func(*args, **kwargs)
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/tpot/gp_deap.py", line 428, in _wrapped_cross_val_score
    scorer = check_scoring(sklearn_pipeline, scoring=scoring_function)
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/scorer.py", line 272, in check_scoring
    return get_scorer(scoring)
  File "/Users/luissanchez/opt/anaconda3/lib/python3.7/site-packages/sklearn/metrics/scorer.py", line 232, in get_scorer
    'to get valid options.' % (scoring))
ValueError: 'RMSLE' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
~/opt/anaconda3/lib/python3.7/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    622                     verbose=self.verbosity,
--> 623                     per_generation_function=self._check_periodic_pipeline
    624                 )

~/opt/anaconda3/lib/python3.7/site-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function)
    230 
--> 231     fitnesses = toolbox.evaluate(invalid_ind)
    232     for ind, fit in zip(invalid_ind, fitnesses):

~/opt/anaconda3/lib/python3.7/site-packages/tpot/base.py in _evaluate_individuals(self, individuals, features, target, sample_weight, groups)
   1142                 tmp_result_scores = parallel(delayed(partial_wrapped_cross_val_score)(sklearn_pipeline=sklearn_pipeline)
-> 1143                                              for sklearn_pipeline in sklearn_pipeline_list[chunk_idx:chunk_idx + self.n_jobs * 4])
   1144                 # update pbar

~/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1015             with self._backend.retrieval_context():
-> 1016                 self.retrieve()
   1017             # Make sure that we get a last message telling us we are done

~/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    907                 if getattr(self._backend, 'supports_timeout', False):
--> 908                     self._output.extend(job.get(timeout=self.timeout))
    909                 else:

~/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    553         try:
--> 554             return future.result(timeout=timeout)
    555         except LokyTimeoutError:

~/opt/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    434             elif self._state == FINISHED:
--> 435                 return self.__get_result()
    436             else:

~/opt/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    383         if self._exception:
--> 384             raise self._exception
    385         else:

ValueError: 'RMSLE' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.

During handling of the above exception, another exception occurred:
RuntimeError                              Traceback (most recent call last)
.
.
.

RuntimeError: A pipeline has not yet been optimized. Please call fit() first.

Regarding the first ValueError ('RMSLE' is not a valid scoring value), the error message suggests: Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.

If I do that, I see that my function IS among the valid scorers, so I don't know what the issue is:

sorted(sklearn.metrics.SCORERS.keys())
['RMSLE',
 'accuracy',
 'adjusted_mutual_info_score',
 'adjusted_rand_score',
 'average_precision',
 'balanced_accuracy',
 'brier_score_loss',
 'completeness_score',
 'explained_variance',
 'f1',
 'f1_macro',
 'f1_micro',
 'f1_samples',
 'f1_weighted',
 'fowlkes_mallows_score',
 'homogeneity_score',
 'jaccard',
 'jaccard_macro',
 'jaccard_micro',
 'jaccard_samples',
 'jaccard_weighted',
 'max_error',
 'mutual_info_score',
 'neg_log_loss',
 'neg_mean_absolute_error',
 'neg_mean_squared_error',
 'neg_mean_squared_log_error',
 'neg_median_absolute_error',
 'normalized_mutual_info_score',
 'precision',
 'precision_macro',
 'precision_micro',
 'precision_samples',
 'precision_weighted',
 'r2',
 'recall',
 'recall_macro',
 'recall_micro',
 'recall_samples',
 'recall_weighted',
 'roc_auc',
 'v_measure_score']

I had a similar issue before, and Weixuan Fu suggested installing a different version of TPOT to solve it:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/weixuanfu/tpot.git@scoring_api

I did this on my Mac (I don't remember doing it on Ubuntu), and I still have the problem.

I also tried:

from sklearn.metrics import make_scorer, SCORERS
SCORERS['rmsle_loss'] = make_scorer(RMSLE, greater_is_better=False)

And then passing the scorer's name in my dictionary of TPOT parameters:

params ={'cv':5,
         'scoring': 'rmsle_loss',
         'generations':10,
         'random_state':0,
         'max_eval_time_mins':10}

I get the same error: ValueError: 'rmsle_loss' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.

Process to reproduce the issue

  1. User creates TPOT instance
  2. User calls TPOT fit() function with training data and custom function
  3. TPOT crashes with a ValueError about 'RMSLE' at the beginning of the process

Expected result

TPOT should run without problems, as it does on my Ubuntu machine.

lmsanch commented 5 years ago

Updated TPOT to 0.11 via:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/EpistasisLab/tpot.git@development

Same issue with:

SCORERS['rmsle_loss'] = make_scorer(RMSLE, greater_is_better=False)
params ={'cv':5,
         'scoring': 'rmsle_loss',
         'generations':10,
         'random_state':0,
         'max_eval_time_mins':10}

I get:

ValueError: 'rmsle_loss' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.

lmsanch commented 5 years ago

I found a solution, but TPOT has a bug in its scorer handling: although the function shows up in the SCORERS dictionary, TPOT does not pick it up. Passing the scorer object directly, instead of its registered name, works:

#SCORERS['rmsle_loss'] = make_scorer(RMSLE, greater_is_better=False)
rmsle_loss= make_scorer(RMSLE, greater_is_better=False)
params ={'cv':5,
         'scoring': rmsle_loss,
         'generations':10,
         'random_state':0,
         'max_eval_time_mins':10}

stin7 commented 4 years ago

Hi, I'm running into the same issue. What was your solution?

stin7 commented 4 years ago

I found Issue #664, which advises using n_jobs=1. That works for me for now.
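
One plausible reading of why n_jobs=1 helps, offered as an assumption and not confirmed anywhere in this thread: the traceback above passes through joblib's loky process executor, so with parallel jobs the pipelines are scored in separate worker processes. A key added to a module-level dictionary such as SCORERS in the main process would not exist in a worker that imports the module afresh. The sketch below illustrates only that general mechanism. It uses the standard library's mimetypes.types_map as a stand-in for SCORERS and a fresh interpreter as a stand-in for a worker; it is not TPOT, sklearn, or joblib code.

```python
import mimetypes
import subprocess
import sys

# Stand-in for SCORERS['rmsle_loss'] = ...: mutate a module-level dict at runtime.
mimetypes.types_map[".rmsle_demo"] = "application/x-demo"
seen_in_parent = ".rmsle_demo" in mimetypes.types_map

# Stand-in for a worker process: a fresh interpreter re-imports the module
# and therefore sees the unmodified dictionary.
child_code = "import mimetypes; print('.rmsle_demo' in mimetypes.types_map)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
seen_in_child = result.stdout.strip() == "True"

print(seen_in_parent, seen_in_child)
```

If this is what happens, it would match the observation that the scorer appears in sorted(sklearn.metrics.SCORERS.keys()) in the main session while the lookup still fails during fit().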

Chandrima31 commented 4 years ago

Thank you @lmsanch! Your suggestion saved my day!