ValueError: y_true and y_pred contain different number of classes 2, 3.

PGijsbers commented 3 years ago

A TPOT.fit call may fail when there are outlier minority classes (with certain metrics).

Context of the issue

When running the benchmark we encountered this issue sometimes, for instance with evaluations on wine-quality-white: python runbenchmark.py TPOT openml/t/359974 1h8c -f 6. Because of TPOT internals, the small minority classes may cause an error when optimizing towards log loss. I reduced the issue to a minimal example:

from tpot import TPOTClassifier
import numpy as np

x, y = np.random.random((151, 4)), np.asarray([0] * 75 + [1] * 75 + [2])
t = TPOTClassifier(max_time_mins=1, scoring="neg_log_loss")
t.fit(x,y)
t.predict(x)

Expected result

I expect a pipeline to be fit regardless, and be able to produce predictions for every class (even if that means with a probability of zero and receiving a warning about it).

Current result

Running the MWE:

(venv) root@69ae78289b07:/bench# python mwe.py
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py:668: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  % (min_groups, self.n_splits)), UserWarning)
/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
  warnings.warn(label_encoder_deprecation_msg, UserWarning)
Traceback (most recent call last):
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 828, in fit
    log_file=self.log_file_,
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/gp_deap.py", line 281, in eaMuPlusLambda
    per_generation_function(gen)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 1176, in _check_periodic_pipeline
    self._update_top_pipeline()
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 931, in _update_top_pipeline
    error_score="raise",
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 450, in cross_val_score
    error_score=error_score)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 256, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 625, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, error_score)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 288, in _score
    return self._sign * self._score_func(y, y_pred, **self._kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2275, in log_loss
    lb.classes_))
ValueError: y_true and y_pred contain different number of classes 2, 3. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mwe.py", line 6, in <module>
    t.fit(x,y)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 863, in fit
    raise e
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 854, in fit
    self._update_top_pipeline()
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/tpot/base.py", line 931, in _update_top_pipeline
    error_score="raise",
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 450, in cross_val_score
    error_score=error_score)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 256, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 625, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, error_score)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 88, in __call__
    *args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_scorer.py", line 288, in _score
    return self._sign * self._score_func(y, y_pred, **self._kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/bench/frameworks/TPOT/venv/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2275, in log_loss
    lb.classes_))
ValueError: y_true and y_pred contain different number of classes 2, 3. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1]

Possible fix

Depends on the level you want to fix it on, options include:

training a separate model for outlier detection of the super-small minority classes and fitting a pipeline for the remainder of the data
- ignoring the data for the problematic classes altogether while emitting an explicit warning about it (which is different from the current scenario where there are only scikit-learn warnings, and also lead to the error)

rachitk commented 3 years ago

@PGijsbers Thank you for submitting this issue, for the detail, and for the minimally reproducible example.

It seems that this is an issue when a class is unobserved in any of the cross-validation folds that TPOT generated (by default, it uses StratifiedKFold with 5 folds to generate the cross-validation splits). sklearn's log_loss metric will then be passed an array that is missing data for one or more of the classes.

You can reduce the number of folds performed by TPOT so that it is less than the number of instances of the smallest class or create your own cross-fold generator that ensures at least one of each class exists in the data passed when fitting the pipeline for scoring (both of these use the cv argument when instantiating TPOT). This is irrelevant with non-probability-based metrics (and TPOT will only broadcast the sklearn warnings from StratifiedKFold) as those handle missing classes appropriately.

This is an issue that occurs in native sklearn and is due to how log_loss handles missing classes (or a lack of handling thereof): see https://github.com/scikit-learn/scikit-learn/issues/11777 and https://github.com/scikit-learn/scikit-learn/issues/15389. We can attempt to write code to handle this on our end, but I'm of the opinion that it's better to leave it up to sklearn to correct these issues and to handle this within their own scoring functions.

In theory, we could eliminate/ignore sparsely-populated classes either in preprocessing or when evaluating pipelines, but as TPOT can otherwise handle cases like this and properly construct and mutate pipelines with most other metrics (for example, if you use the basic accuracy metric), this doesn't seem like the best approach to take without user input and may be something better left to the user to do, as the approach to removing outliers or handling classes with few instances will likely differ significantly based on the meta-features of the input dataset.

It is possible to handle this and use a larger number of folds without modifying the functionality of TPOT or sklearn and maintain the use of the log_loss metric. One option is to write a custom log_loss metric that essentially pads the reported probabilities with probabilities of 0 for missing classes before passing them to sklearn's log_loss. I've written a demo of this below:

from tpot import TPOTClassifier
import numpy as np
from sklearn.metrics import log_loss, make_scorer

x, y = np.random.random((151, 4)), np.asarray([0] * 75 + [1] * 75 + [2])
labels = np.unique(y)

def mod_log_loss(y_true, y_pred, labels):
    class_diff = len(labels) - len(y_pred[0])

    if(class_diff > 0):
        y_pred_pad = np.array([np.pad(x, pad_width=(0,class_diff)) for x in y_pred])
    else:
        y_pred_pad = y_pred

    return(log_loss(y_true, y_pred_pad, labels=labels))

mod_neg_log_loss = make_scorer(mod_log_loss, greater_is_better=False, labels=labels, needs_proba=True)

t = TPOTClassifier(max_time_mins=1, scoring=mod_neg_log_loss)
t.fit(x,y)
t.predict(x)

Note that this demo assumes that the missing classes are the last classes (as it pads at the end of the probability vectors). In theory, you could instead determine which classes are present in y_true that are missing from labels (as y_true will be passed from the cross-validation scoring) and pad in those locations instead if the sparsely-populated classes are not the last classes in the dataset, though I have not tested this.

Let us know if you have any thoughts or questions!

PGijsbers commented 3 years ago

Thank you very much for the elaborate response. I was aware of the underlying issue, but I wasn't aware it was a design decision not to address it within TPOT. I understand the decision, feel free to close the issue if desired.

rachitk commented 3 years ago

Thank you very much for the elaborate response. I was aware of the underlying issue, but I wasn't aware it was a design decision not to address it within TPOT. I understand the decision, feel free to close the issue if desired.

Admittedly, I'm not sure if we should or shouldn't handle this case as opposed to relying on the user to know the drawbacks of imbalanced data and/or the limits of the metrics they choose. My logic is that processing the data in any way that isn't fully transparent to the user and/or consistent across all cases will be problematic and that it's better to leave it up to the user as to how they want to handle the situation (since there are many options and the best one will likely depend on what the user knows about their data and the importance of the outlier class - for example, in biomedical data, imbalances are common but usually highly important, like in cases where you have extraordinarily rare diseases with few cases against a large number of "control" cases).

That being said, I'll have to talk with the rest of the lab that supports TPOT to see what the best choice might be. Thank you for raising the issue! We'll keep it open for now while we think about the best way to handle this - we may need to be clearer about this in the documentation or keep it in mind for future TPOT extensions/modificiations.

PGijsbers commented 3 years ago

Yes I think it depends entirely on how hands-off you want the automl experience to be and what the expected data science experience of the user is.

EpistasisLab / tpot