dmlc / XGBoost.jl

XGBoost Julia Package

possible issue with predict and objective binary:logistic #146

Closed: bobaronoff closed this issue 1 year ago

bobaronoff commented 1 year ago

I am trying to track down some unexpected behavior. I am testing with a prior dataset (0/1 classification). When I run xgboost in R, the test data gives an AUC of 0.6. When I run the same data with the same parameters in Julia, the AUC comes back as 0.3. That is far worse than random, so something is up and I am trying to sort it out. Is it possible that the changes made to predict in issue #143 could be reordering values when the return has a single dimension? I am not sure how this would happen, but I have been at this for a while and am not making progress.

ExpandingMan commented 1 year ago

Simple experiments don't reveal any problems for me. When I run the following:

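    using MLJ, XGBoost

    # synthetic data: 1000 points, 3 features, 2 clusters
    (X, y) = make_blobs(1000, 3; centers=2, rng=999)
    y = Int.(int.(y)) .- 1   # categorical levels -> integer codes 1/2 -> labels 0/1
    X = MLJ.matrix(X)

    dm = DMatrix(X, y)

    b = xgboost(dm, num_round=5, objective="binary:logistic")

    # with binary:logistic, predict returns probabilities of class 1
    ŷraw = XGBoost.predict(b, X)
    ŷ = map(ζ -> round(Int, ζ), ŷraw)   # round probabilities to hard 0/1 labels

    # fraction of training points classified correctly
    f = sum(ŷ .== y)/length(y)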

I get f == 1.0.

Can you please list all parameters you are passing when you are having a problem?

bobaronoff commented 1 year ago

It seems the problem is on my end. I ran the model with the same train/test split of the dataset in both R (xgboost) and Julia (XGBoost.jl). The test predictions correlate fairly well. I computed ROC/AUC in R for both sets of test predictions, and the results are similar (and much different from what I had gotten in Julia). So the issue appears to be how I am creating the ROC curve (MLBase.jl) and not XGBoost.jl. I will sort out the ROC issue. Thank you!

I apologize for the runaround.
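
An AUC well below 0.5, like the 0.3 reported at the top of this thread, means the scores rank negatives above positives more often than not. That often points to swapped labels or inverted scores in the ROC computation rather than to the model itself. The sketch below is not MLBase.jl's API and not the poster's code; it is a minimal rank-based AUC (Mann-Whitney U) using only Julia Base, with a made-up function name `auc` and made-up toy data, that could serve as an independent sanity check on predictions.

```julia
# Rank-based AUC (Mann-Whitney U statistic), Julia Base only.
# `auc` is a hypothetical helper name, not part of XGBoost.jl or MLBase.jl.
# `scores`: predicted probability of the positive class; `labels`: 0/1.
function auc(scores::AbstractVector{<:Real}, labels::AbstractVector{<:Integer})
    length(scores) == length(labels) || throw(ArgumentError("length mismatch"))
    n = length(scores)
    order = sortperm(scores)          # indices of scores in ascending order
    ranks = zeros(Float64, n)
    i = 1
    while i <= n
        j = i
        # extend j over a run of tied scores
        while j < n && scores[order[j + 1]] == scores[order[i]]
            j += 1
        end
        avg = (i + j) / 2             # tied scores share their average rank
        for k in i:j
            ranks[order[k]] = avg
        end
        i = j + 1
    end
    npos = count(==(1), labels)
    nneg = n - npos
    (npos > 0 && nneg > 0) || throw(ArgumentError("need both classes"))
    rank_sum = sum(ranks[labels .== 1])
    return (rank_sum - npos * (npos + 1) / 2) / (npos * nneg)
end

# Toy data (illustrative only)
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

auc(scores, labels)        # 0.75
auc(scores, 1 .- labels)   # 0.25: flipping the labels mirrors the AUC around 0.5
```

If a check like this gives roughly `1 - AUC` of what another tool reports, the positive class or the score direction is probably reversed somewhere in the ROC step.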