dmlc / XGBoost.jl

XGBoost Julia Package

Classification: Support multiple metrics #163

Closed Roh-codeur closed 1 year ago

Roh-codeur commented 1 year ago

hi,

thanks for the awesome work on XGBoost. I have been using it as below:

    boost = xgboost(
        dtrain,
        numberOfRounds,
        eta = learningRate,
        metrics = ["aucpr", "logloss"],
        obj = CustomObjFunction,
        tree_method = "hist"
    )

I am now trying to set multiple metrics as below, but it seems to fail:

bst = xgboost((X, Y); XGBoost.classification(objective="binary:logistic", eval_metric= ["auc", "logloss"])...)

Can you please help with this?

thanks Rohit

ExpandingMan commented 1 year ago

It was initially not clear to me how this used to work, since the underlying C API only provides a method for setting a parameter to a single string value, and eval_metric doesn't seem to support string separators. After some digging, I've discovered that the API expects you to set certain parameters, e.g. eval_metric, via multiple separate calls. You can already do this via setparam!. In this PR I have added the ability to set multiple values via an AbstractVector or Tuple, so your example should work after that PR is merged.
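A minimal sketch of the two approaches described above (the vector form assumes the PR is merged; `X` and `y` are placeholder data, not from the thread):

```julia
using XGBoost

X, y = randn(100, 4), Int.(randn(100) .> 0)

# With the PR: a vector value issues one underlying C call per element
bst = xgboost((X, y); num_round=5,
              objective="binary:logistic",
              eval_metric=["auc", "logloss"])

# Equivalent without the PR: repeated setparam! calls, one per metric
bst2 = Booster(DMatrix(X, y))
XGBoost.setparam!(bst2, "eval_metric", "auc")
XGBoost.setparam!(bst2, "eval_metric", "logloss")
```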

Roh-codeur commented 1 year ago

@ExpandingMan: you are awesome, mate! I will try this out when I am home. Quick question: it looks like it will only work with a Booster object. Does Booster accept DataFrames? I particularly like how you support DataFrames in this API now.

Thanks!

ExpandingMan commented 1 year ago

Yes, DataFrames are supported implicitly via the Tables.jl interface. Anything you see in the documentation that refers to a "table" also applies to a DataFrame.

Roh-codeur commented 1 year ago

Awesome! Thanks a lot! The v2.0 is great and thanks for the documentation too!

Roh-codeur commented 1 year ago

@ExpandingMan

hey mate, I am afraid I couldn't get it to work :(

        bst = xgboost((view(df, :, [:col1, :col2]), view(trainView, :, col3)); num_round=10, XGBoost.classification(objective="binary:logistic", eval_metric= "auc")...)
        XGBoost.setparam!(bst, "eval_metric", "aucpr")
        XGBoost.setparam!(bst, "eval_metric", "logloss")

Output:

│         eval_metric               auc                                                │
│                                                                                      │
│          objective          binary:logistic                                          │
  1. It looks like setparam! overrides the eval_metric instead of appending to it.
  2. I cannot seem to get Booster to take DataFrames. From the doc below, it seems it can only take matrices.

https://dmlc.github.io/XGBoost.jl/dev/api/#XGBoost.Booster

thanks a lot for your help with this!

ExpandingMan commented 1 year ago

looks like setparam! overrides instead of appending to the eval_metric

It's definitely not going to display both values properly until the PR is merged. Can you check whether it's actually working despite the display? I was able to get it to work this way. Once that PR is merged, the display should be fixed as well.

I cannot seem to get Booster to take DataFrames. from the doc below, it seems it can only take Matrices

It takes something called a DMatrix, which is just an internal xgboost wrapper for the data, but a DataFrame should automatically get wrapped. For example:

using XGBoost, DataFrames

df = DataFrame(a=randn(10), b=randn(10), y=randn(10))

Booster((df[:, [:a, :b]], df.y))
# or, alternatively
Booster(DMatrix(df, :y))

There are lots of different ways to pass the data (arguably too many); again, see the docs.
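To illustrate, here is a sketch of a few of the equivalent forms (the column names are placeholders, and the exact set of accepted forms should be checked against the docs):

```julia
using XGBoost, DataFrames

df = DataFrame(a=randn(10), b=randn(10), y=randn(10))

# Several equivalent ways to construct the training data:
dm1 = DMatrix(df, :y)                          # table plus label column name
dm2 = DMatrix((df[:, [:a, :b]], df.y))         # (features, labels) tuple
dm3 = DMatrix(Matrix(df[:, [:a, :b]]), df.y)   # raw matrix plus label vector

# Any of these can then be passed to Booster or xgboost directly
bst = xgboost(dm1; num_round=5)
```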

Roh-codeur commented 1 year ago

@ExpandingMan: thanks mate, you are a star! I tried the below and can confirm that all the metrics are now evaluated. At the risk of sounding pedantic, though, the Booster display doesn't show all the metrics, just the last one.

thanks a lot, I appreciate your help and such a quick turnaround with this!

df = DataFrame(a=randn(10), b=randn(10), y=randn(10))
bst = xgboost((df[!, [:a, :b]], df.y),eval_metric=["rmse", "mae", "mape"])

Output:

[ Info: XGBoost: starting training.
[ Info: [1]     train-rmse:1.05386554469353189  train-mae:0.85910702943801875   train-mape:3.26047090739011747
[ Info: [2]     train-rmse:0.84753207030907596  train-mae:0.69181514382362364   train-mape:2.73415630133822551

The Booster display below only shows "mape"; I would have thought it would show all the metrics, i.e. "rmse", "mae", and "mape":

╭──── XGBoost.Booster ─────────────────────────────────────────────────────────────────╮
│  Features: ["a", "b"]                                                                │
│                                                                                      │
│          Parameter          Value                                                    │
│   ─────────────────────────────────                                                  │
│         eval_metric         mape                                                     │
│   ─────────────────────────────────                                                  │

ExpandingMan commented 1 year ago

At the risk of sounding pedantic, the XGBoost output doesn't have all metrics, just the last one.

This does indeed need to be fixed. Those parameters are displayed from a "hacked together" dict, because they are not accessible via the xgboost API, so we have to store them separately.

Roh-codeur commented 1 year ago

This does indeed need to be fixed. Those are displayed with a "hacked together" dict because the parameters are not accessible via the xgboost API, so we have to store them separately.

Got it, thanks mate! This library is awesome; thanks again for all your work on it!