Closed: jaideep11061982 closed this issue 2 years ago
XGBoost requires that the training set contain examples from every class label. If you are using K-fold cross-validation, you should use stratified sampling. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html
@hcho3 Is this a hard requirement, or is there a performance gain behind it? As I mentioned, it will not always be the case that the train and validation sets contain at least one instance of each class.
This is a limitation of the current algorithm and we have no intention to change this. The reason is that we fit separate trees for every class, and it's not possible to fit a tree on an empty set.
As I mentioned in my earlier comment, there are ways to create train and validation sets so that every class is represented in every set.
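For illustration, a minimal sketch of this approach using `StratifiedKFold` (the toy data shapes and model parameters here are assumptions, not from this issue):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 5)               # toy features (assumption)
y = np.random.randint(0, 3, size=100)    # toy labels in {0, 1, 2}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, valid_idx in skf.split(X, y):
    # Stratification preserves class proportions, so each train/valid
    # split contains examples of every class (given enough samples per class).
    clf = xgb.XGBClassifier(n_estimators=10)
    clf.fit(X[train_idx], y[train_idx])
    print(clf.score(X[valid_idx], y[valid_idx]))
```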
We get an error message on `xgb.fit`:

```
Invalid classes inferred from unique values of y. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]
```

It comes from this check:

```python
self.classes_ = np.unique(np.asarray(y))
self.n_classes_ = len(self.classes_)
expected_classes = np.arange(self.n_classes_)
```
I don't think we need this step. The training data will not always contain all the classes, whether because of data availability issues for a given class or because that class only appears in the validation/test set.
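As a workaround for the specific error above (labels starting at 1 rather than 0), the labels can be re-encoded into the contiguous range `0..n_classes-1` before calling `fit`. A minimal sketch, assuming scikit-learn's `LabelEncoder` (the variable names and toy data are illustrative, not from this thread):

```python
import numpy as np
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder

X = np.random.rand(60, 4)                # toy features (assumption)
y = np.random.randint(1, 7, size=60)     # labels in {1..6}, not {0..5}

le = LabelEncoder()
y_enc = le.fit_transform(y)              # maps {1..6} -> {0..5}

clf = xgb.XGBClassifier(n_estimators=10)
clf.fit(X, y_enc)                        # satisfies the expected_classes check

# Map predictions back to the original label space.
preds = le.inverse_transform(clf.predict(X))
```

Note that this only fixes the label-range mismatch; it does not remove the separate requirement that every class be present in the training set.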