DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

learning curve visualizer for catboost automl using Pipelines #1283

Closed: dbrami closed this issue 2 years ago

dbrami commented 2 years ago

Describe the issue

I'm getting "TypeError: ContribEstimator.__init__() got an unexpected keyword argument 'memory'" while trying to plot a learning curve.

I emailed via the listserv first, but there is no code formatting there, so the question was unreadable (like my markdown here :/).

@DistrictDataLabs/team-oz-maintainers

The following code works:

from yellowbrick.classifier import ROCAUC
from yellowbrick.contrib.wrapper import wrap

# Wrap the FLAML-generated scikit-learn pipeline so Yellowbrick
# treats it as a third-party estimator
model = wrap(pipeline)

visualizer = ROCAUC(model)
visualizer.fit(X_train, y_train)     # Fit the visualizer on the training split
visualizer.score(X_test, y_test)     # Score the test split to draw the curves
visualizer.show()                    # Finalize and render the figure

But the following does not:

import numpy as np

from yellowbrick.model_selection import LearningCurve

# Create the learning curve visualizer
# cv = StratifiedKFold(n_splits=12)
sizes = np.linspace(0.3, 1.0, 10)

# Instantiate the classification model and visualizer
# model = MultinomialNB()
visualizer = LearningCurve(
    model, scoring='f1_weighted', train_sizes=sizes)

visualizer.fit(X, y)        # Fit the data to the visualizer
visualizer.show()           # Finalize and render the figure

with visualizer.fit (and therefore show) raising the following:


---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
File ~/miniconda3/envs/ML/lib/python3.10/site-packages/joblib/parallel.py:862, in Parallel.dispatch_one_batch(self, iterator)
    861 try:
--> 862     tasks = self._ready_batches.get(block=False)
    863 except queue.Empty:
    864     # slice the iterator n_jobs * batchsize items at a time. If the
    865     # slice returns less than that, then the current batchsize puts
   (...)
    868     # accordingly to distribute evenly the last items between all
    869     # workers.

File ~/miniconda3/envs/ML/lib/python3.10/queue.py:168, in Queue.get(self, block, timeout)
    167     if not self._qsize():
--> 168         raise Empty
    169 elif timeout is None:

Empty: 

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In [49], line 1
----> 1 visualizer.fit(X, y)        # Fit the data to the visualizer
      2 visualizer.show()

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/yellowbrick/model_selection/learning_curve.py:249, in LearningCurve.fit(self, X, y)
    233 sklc_kwargs = {
    234     key: self.get_params()[key]
    235     for key in (
   (...)
    245     )
    246 }
    248 # compute the learning curve and store the scores on the estimator
--> 249 curve = sk_learning_curve(self.estimator, X, y, **sklc_kwargs)
    250 self.train_sizes_, self.train_scores_, self.test_scores_ = curve
    252 # compute the mean and standard deviation of the training data

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:1558, in learning_curve(estimator, X, y, groups, train_sizes, cv, scoring, exploit_incremental_learning, n_jobs, pre_dispatch, verbose, shuffle, random_state, error_score, return_times, fit_params)
   1555     for n_train_samples in train_sizes_abs:
   1556         train_test_proportions.append((train[:n_train_samples], test))
-> 1558 results = parallel(
   1559     delayed(_fit_and_score)(
   1560         clone(estimator),
   1561         X,
   1562         y,
   1563         scorer,
   1564         train,
   1565         test,
   1566         verbose,
   1567         parameters=None,
   1568         fit_params=fit_params,
   1569         return_train_score=True,
   1570         error_score=error_score,
   1571         return_times=return_times,
   1572     )
   1573     for train, test in train_test_proportions
   1574 )
   1575 results = _aggregate_score_dicts(results)
   1576 train_scores = results["train_scores"].reshape(-1, n_unique_ticks).T

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/joblib/parallel.py:1085, in Parallel.__call__(self, iterable)
   1076 try:
   1077     # Only set self._iterating to True if at least a batch
   1078     # was dispatched. In particular this covers the edge
   (...)
   1082     # was very quick and its callback already dispatched all the
   1083     # remaining jobs.
   1084     self._iterating = False
-> 1085     if self.dispatch_one_batch(iterator):
   1086         self._iterating = self._original_iterator is not None
   1088     while self.dispatch_one_batch(iterator):

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/joblib/parallel.py:873, in Parallel.dispatch_one_batch(self, iterator)
    870 n_jobs = self._cached_effective_n_jobs
    871 big_batch_size = batch_size * n_jobs
--> 873 islice = list(itertools.islice(iterator, big_batch_size))
    874 if len(islice) == 0:
    875     return False

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:1560, in <genexpr>(.0)
   1555     for n_train_samples in train_sizes_abs:
   1556         train_test_proportions.append((train[:n_train_samples], test))
   1558 results = parallel(
   1559     delayed(_fit_and_score)(
-> 1560         clone(estimator),
   1561         X,
   1562         y,
   1563         scorer,
   1564         train,
   1565         test,
   1566         verbose,
   1567         parameters=None,
   1568         fit_params=fit_params,
   1569         return_train_score=True,
   1570         error_score=error_score,
   1571         return_times=return_times,
   1572     )
   1573     for train, test in train_test_proportions
   1574 )
   1575 results = _aggregate_score_dicts(results)
   1576 train_scores = results["train_scores"].reshape(-1, n_unique_ticks).T

File ~/miniconda3/envs/ML/lib/python3.10/site-packages/sklearn/base.py:88, in clone(estimator, safe)
     86 for name, param in new_object_params.items():
     87     new_object_params[name] = clone(param, safe=False)
---> 88 new_object = klass(**new_object_params)
     89 params_set = new_object.get_params(deep=False)
     91 # quick sanity check of the parameters of the clone

TypeError: ContribEstimator.__init__() got an unexpected keyword argument 'memory'

My model is a scikit-learn pipeline generated by FLAML. I'm doing multi-class classification, and the best estimator found is CatBoost.
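If I'm reading the traceback correctly, clone() asks the wrapper for its parameters, but ContribEstimator delegates get_params() to the wrapped Pipeline, so clone() gets back Pipeline parameters like memory and then fails re-instantiating ContribEstimator with them. Since the FLAML output is already a scikit-learn Pipeline (and Pipeline is itself a valid estimator), one workaround I may try is to skip wrap() entirely for the learning curve. A minimal sketch, untested on my side:

import numpy as np
from yellowbrick.model_selection import LearningCurve

sizes = np.linspace(0.3, 1.0, 10)

# Pass the unwrapped pipeline directly: clone() can re-instantiate a
# Pipeline from its own get_params() output, which the contrib wrapper
# cannot do.
visualizer = LearningCurve(pipeline, scoring='f1_weighted', train_sizes=sizes)
visualizer.fit(X, y)
visualizer.show()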

rebeccabilbro commented 2 years ago

Hello @dbrami and thank you for reaching out!

To help you, we ask that you provide:

  1. A reproducible example of the visualizer error with Python code that we can run locally to reproduce the error. For instance, you might use one of the yellowbrick datasets so that we have the same values for X and y as you (see the sketch at the end of this comment). Please be sure to include all the required import statements (e.g. import numpy as np) and double-check the commented-out lines of code -- some of those lines seem like they should be uncommented.
  2. Information about your operating system [e.g. Windows, macOS], your Python version [e.g. 2.7, 3.6, miniconda], and your Yellowbrick version [e.g. 0.7].

Note: as for the markdown formatting, it looks like there were only 2 backticks (instead of the needed 3) in a few places, which caused your rendering error. Hope this helps!
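For example, a self-contained skeleton along these lines would let us run your case directly (the MultinomialNB model is just a stand-in; substitute your wrapped FLAML/CatBoost pipeline):

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from yellowbrick.datasets import load_occupancy
from yellowbrick.model_selection import LearningCurve

# Use a shared dataset so we all see the same X and y
X, y = load_occupancy()

# Stand-in model; replace with your wrapped FLAML/CatBoost pipeline
model = MultinomialNB()

sizes = np.linspace(0.3, 1.0, 10)
visualizer = LearningCurve(model, scoring='f1_weighted', train_sizes=sizes)
visualizer.fit(X, y)
visualizer.show()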

rebeccabilbro commented 2 years ago

Hi there - we're going to close this issue out as it has gone stale. Feel free to reopen if there are updates!