dmlc / XGBoost.jl

XGBoost Julia Package

Quick question on Custom objective #178

Closed Roh-codeur closed 1 year ago

Roh-codeur commented 1 year ago

Hi,

I have the code below to predict binary outcomes:

boost = xgboost((X, Y);
                    num_round=numberOfRounds,
                    eta=learningRate,
                    tree_method=treeMethod,
                    XGBoost.classification(objective="binary:logistic", eval_metric=["auc", "aucpr", "logloss"])...)

I then tried running it with custom gradient and Hessian callbacks:

boost = xgboost((X, Y), WeightedLossGradient, WeightedLossHessian;
                    num_round=numberOfRounds,
                    eta=learningRate,
                    tree_method=treeMethod,
                    XGBoost.classification(objective="binary:logistic", eval_metric=["auc", "aucpr", "logloss"])...)

Got:

value 0 for Parameter num_class should be greater equal to 1
num_class: Number of output class in the multi-class classification.

Tried:

xgboost((X, Y), WeightedLossGradient, WeightedLossHessian;
        num_class=2,
        XGBoost.classification(eval_metric=["auc", "aucpr", "logloss"])...)

Got:

ERROR: XGBoostError: (caller: XGBoosterBoostOneIter)
[22:48:42] /workspace/srcdir/xgboost/src/gbm/gbtree.cc:372: Check failed: gpair->Size() == p_fmat->Info().num_row_ (340 vs. 680) : Mismatching size between number of rows from input data and size of gradient vector.

How do I set a custom objective in this call, please?

thanks, Roh

Roh-codeur commented 1 year ago

The closest I could get this to work is shown below:

boost = xgboost((X, Y), WeightedLossGradient, WeightedLossHessian;
                    num_round=numberOfRounds,
                    eta=learningRate,
                    tree_method=treeMethod,
                    XGBoost.classification(objective="binary:logistic", eval_metric=["auc", "aucpr", "logloss"])...)

ExpandingMan commented 1 year ago

That error is telling you that you need to provide num_class=n (I think in your case n == 2) as an argument to xgboost. We probably should make XGBoost.classification handle that more elegantly.
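
Spelled out, the suggestion is just num_class passed as a keyword to xgboost; a minimal sketch, reusing the (user-defined, not shown in this thread) WeightedLossGradient/WeightedLossHessian callbacks:

boost = xgboost((X, Y), WeightedLossGradient, WeightedLossHessian;
                    num_round=numberOfRounds,
                    eta=learningRate,
                    tree_method=treeMethod,
                    num_class=2,
                    XGBoost.classification(eval_metric=["auc", "aucpr", "logloss"])...)

Note this mirrors the num_class=2 attempt above, which then failed the size check in gbtree.cc: whatever the callbacks produce has to line up with one gradient entry per training row.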

Roh-codeur commented 1 year ago

thanks @ExpandingMan, I got it to work as below:

using DataFrames, XGBoost

df = DataFrame(a=rand(Int, 10), b=randn(10), y=BitArray(rand(Bool, 10)))

# Grad and Hess are the custom gradient/Hessian callbacks (see the sketch below)
boost = xgboost((df[!, [:a, :b]], df.y), Grad, Hess;
                    num_round=100,
                    eta=0.3,
                    tree_method="hist",
                    XGBoost.classification(objective="binary:logistic", eval_metric=["auc", "aucpr", "logloss"])...)
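
For reference, the thread never shows Grad and Hess. A minimal sketch of what they might look like for a plain (unweighted) binary logistic loss, assuming XGBoost.jl broadcasts the two callbacks elementwise over the raw predictions ŷ and the labels y:

# Hypothetical stand-ins for the Grad/Hess callbacks above;
# ŷ is the raw (pre-sigmoid) margin and y ∈ {0, 1}.
σ(x) = 1 / (1 + exp(-x))
Grad(ŷ, y) = σ(ŷ) - y             # first derivative of the logistic loss in ŷ
Hess(ŷ, y) = σ(ŷ) * (1 - σ(ŷ))    # second derivative

A weighted variant (like the WeightedLossGradient/WeightedLossHessian earlier in the thread) would presumably multiply both expressions by a per-sample weight.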