JuliaAI / MLJXGBoostInterface.jl

MIT License

XGBoostClassifier is broken for binary classification #7

Closed. pgagarinov closed 3 years ago

pgagarinov commented 3 years ago

XGBoostClassifier doesn't work correctly when y is specified as a categorical array with two values:

>y=
5273-element CategoricalArray{Int64,1,UInt32}:
 0
 0
 1
 0
 0
 0
 ⋮

In such cases the following code breaks:

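using MLJ  # provides @load, @pipeline, machine and fit!

Booster = @load XGBoostClassifier
booster = Booster(max_depth=5)
pipe = @pipeline ContinuousEncoder booster
mach = machine(pipe, X, y)  # X: feature table, y: the categorical vector shown above
fit!(mach)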

with the following message:

TaskFailedException

    nested task error: Call to XGBoost C function XGBoosterEvalOneIter failed: [20:35:04] /workspace/srcdir/xgboost/src/metric/multiclass_metric.cu:32: Check failed: label_error >= 0 && label_error < static_cast<int32_t>(n_class): MultiClassEvaluation: label must be in [0, num_class), num_class=1 but found 1 in label
    Stack trace:
      [bt] (0) /home/peter/.julia/artifacts/51760a35c00d03a1b2ee373a49af64d08700a64a/lib/libxgboost.so(xgboost::metric::EvalMClassBase<xgboost::metric::EvalMultiLogLoss>::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0x16e8) [0x7fc24354d5d8]

I suspect that the problem is in these lines, which do not work correctly with newer versions of XGBoost:

    # An idiosyncrasy of xgboost is that num_class=1 for binary case.
    if(num_class==2)
        objective="binary:logistic"
        y_plain = convert(Array{Bool}, y_plain)
        num_class = 1
    else
        objective="multi:softprob"
    end
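
The stack trace above shows a multiclass metric (EvalMultiLogLoss) being evaluated while num_class was 1, so one possible reading is that the objective, num_class and the evaluation metric were not chosen together. The following is only a minimal sketch of keeping them consistent, not the package's actual fix. The helper names xgb_classification_params and zero_based_codes are hypothetical; the parameter strings ("binary:logistic", "multi:softprob", "logloss", "mlogloss") are standard XGBoost values.

```julia
# Hypothetical helper, not part of MLJXGBoostInterface.jl.
# Returns objective, eval_metric and num_class that agree with each other.
function xgb_classification_params(n_classes::Integer)
    n_classes >= 2 || throw(ArgumentError("need at least two classes, got $n_classes"))
    if n_classes == 2
        # Binary case: one probability per row, so use the binary metric
        # and do not pass num_class at all (represented here as `nothing`).
        return (objective = "binary:logistic", eval_metric = "logloss", num_class = nothing)
    else
        # Multiclass case: num_class must equal the number of distinct labels.
        return (objective = "multi:softprob", eval_metric = "mlogloss", num_class = n_classes)
    end
end

# Hypothetical helper: map arbitrary labels to the 0-based integer codes
# that XGBoost expects, and return the sorted classes for decoding later.
function zero_based_codes(y::AbstractVector)
    classes = sort(unique(y))
    lookup = Dict(c => i - 1 for (i, c) in enumerate(classes))
    return [lookup[v] for v in y], classes
end

# Usage with a two-valued target like the one in this report.
codes, classes = zero_based_codes([0, 0, 1, 0, 0, 0])
params = xgb_classification_params(length(classes))
```

With a two-class target this selects the binary objective together with a binary metric, so no multiclass evaluation is attempted with num_class=1. Whether this matches the behaviour of the newer XGBoost versions mentioned above would need to be confirmed against the wrapper actually in use.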

ablaom commented 3 years ago

Sorry this dropped off my radar. Maintaining over a dozen packages at the moment.

@pgagarinov Any chance you could make a PR for this? You're probably more familiar with the current XGBoost interface than I am. 🙏

ablaom commented 3 years ago

Thanks by the way for reporting and for your diagnosis, which is already helpful.