EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Title: ValueError: Invalid classes inferred from unique values of y in TPOT with XGBoost #1351

Open kirane61 opened 3 weeks ago

kirane61 commented 3 weeks ago

Context of the issue

I'm encountering an issue while using TPOTClassifier with XGBoost where I receive a ValueError indicating invalid classes inferred from the unique values of y. The expected classes are [0 1 2 3 4 5 6 7 8 9], but the classes I have are [0 1 2 3 4 5 6 7 9 10]. Despite using stratified K-fold splits for cross-validation, one of the classes is missing.

I am using TPOT version 0.12.1. The dataset has 10 classes. Since XGBoost requires the labels to be encoded, I have labeled them from 0 to 9. When I pass this data to TPOT, I get the error described above. It appears only with larger datasets; with a smaller number of rows (~2k), it works fine.

Is there any possible workaround to overcome this issue?

perib commented 1 week ago

The list of classes that you have is missing the number 8. You said you labeled them from 0 to 9, but it looks like there is also a 10 in the list. Are the classes correctly labeled before they are passed into TPOT?
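
A quick way to check this is to compare the unique label values against a contiguous range starting at 0, and to re-encode them if they differ. The snippet below is only a sketch: the array `y` is made-up data that reproduces the label list from the report, not the actual dataset, and scikit-learn's `LabelEncoder` is one possible way to do the re-encoding.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels mimicking the report: 8 is absent and 10 is present.
y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 9, 10] * 3)

observed = np.unique(y)
expected = np.arange(len(observed))
print("observed labels:", observed)
print("contiguous from 0:", np.array_equal(observed, expected))

# Re-encode so the labels become 0..n_classes-1 without gaps.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
print("re-encoded labels:", np.unique(y_encoded))

# encoder.inverse_transform(...) maps predictions back to the original labels.
```

If the check prints `False`, the labels were not contiguous to begin with, and the re-encoded `y_encoded` is what would be passed to TPOT instead.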

If they are, one possibility is that some classes don't have enough samples to be split across the different cross-validation folds, leaving some folds without a class that XGBoost expects.
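
To test this second possibility, the number of samples per class can be compared with the number of folds. The sketch below uses only the standard library; the helper name `classes_rarer_than_folds`, the example labels, and the fold count of 5 are all assumptions chosen for illustration, so substitute your own labels and `cv` setting.

```python
from collections import Counter

def classes_rarer_than_folds(y, n_splits):
    """Return {label: count} for classes with fewer samples than folds.

    Hypothetical helper: such classes cannot appear in every fold.
    """
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_splits}

# Made-up labels: class 2 has only 3 samples.
y = [0] * 50 + [1] * 40 + [2] * 3

# Assumed fold count; use the value you pass as cv.
rare = classes_rarer_than_folds(y, n_splits=5)
print("classes with fewer samples than folds:", rare)
```

A non-empty result means at least one class cannot be represented in every fold, which would be consistent with the missing-class explanation. An empty result points back to the label encoding instead.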