dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.22k stars 8.72k forks

objectives and metrics for multi-class multi-output (aka multi-label) xgboost classifier #8289

Open adkinsty opened 2 years ago

adkinsty commented 2 years ago

This tutorial states that XGBoost 1.6 has experimental support for multi-output classification. However, it also states that there is limited support from objectives and metrics. In the multi-class single-output scenario, I use multi:softprob as the objective and mlogloss as a metric. Could I use the same objective and metric for the multi-class multi-output scenario? Or are there alternative objectives and metrics that I should use instead?

trivialfis commented 2 years ago

Unfortunately, multi-class multi-output is not supported.

adkinsty commented 2 years ago

Thanks for your reply. But ah, that's too bad. Do you know of a viable workaround for this? For example, could I pass the XGBClassifier to sklearn's MultiOutputClassifier(), like so:

from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from xgboost import XGBClassifier

clf = MultiOutputClassifier(XGBClassifier())

# binarize the label sets into a multi-label indicator matrix
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(y)

clf.fit(X, y)

Forgive me if this is outside of your scope of support.

trivialfis commented 2 years ago

Feel free to try it. :-) We haven't added any tests with that meta-estimator; you are more than welcome to add one.