Closed Roh-codeur closed 1 year ago
It was initially not clear to me how this used to work, since the underlying C API only provides a method for setting a param to a single string value and eval_metric doesn't seem to support string separators. After some digging, I've discovered that it seems to expect you to set certain params, e.g. eval_metric, via multiple separate calls. You can already do this via setparam!. In this PR I have added the ability to set multiple params via an AbstractVector or Tuple, so your example should work after that PR.
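To make the "multiple separate calls" idea concrete, here is a minimal sketch in plain Julia. It is not the code from the PR: set_single! is a hypothetical stand-in for the single-string C API setter (it only records what it was asked to set), and expand_param! is an illustrative name, not part of XGBoost.jl.

```julia
# Hypothetical stand-in for the single-string C API setter; it only records calls.
const CALLS = Tuple{String,String}[]
set_single!(name::AbstractString, value) = push!(CALLS, (String(name), string(value)))

# A scalar value becomes exactly one call.
expand_param!(name, value) = (set_single!(name, value); nothing)

# A vector or tuple becomes one call per element, under the same parameter name.
function expand_param!(name, values::Union{AbstractVector,Tuple})
    for v in values
        set_single!(name, v)
    end
    return nothing
end

expand_param!("objective", "binary:logistic")
expand_param!("eval_metric", ["auc", "logloss"])
```

After these two calls, CALLS holds one entry for objective and two for eval_metric, which mirrors calling setparam! once per metric by hand.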
@ExpandingMan: you are awesome mate! I will try this out when I am home. Quick question: it looks like it will only work with a Booster object. Does Booster accept DataFrames? I particularly like how you support DataFrames in this API now.
Thanks!
Yes, DataFrames are supported implicitly via the Tables.jl interface. Anything you see in the documentation that refers to a "table" also applies to a DataFrame.
Awesome! Thanks a lot! The v2.0 is great and thanks for the documentation too!
@ExpandingMan hey mate, I am afraid I couldn't get it to work :(
bst = xgboost((view(df, :, [:col1, :col2]), view(trainView, :, col3)); num_round=10, XGBoost.classification(objective="binary:logistic", eval_metric= "auc")...)
XGBoost.setparam!(bst, "eval_metric", "aucpr")
XGBoost.setparam!(bst, "eval_metric", "logloss")
Output:
│ eval_metric auc │
│ │
│ objective binary:logistic │
https://dmlc.github.io/XGBoost.jl/dev/api/#XGBoost.Booster
thanks a lot for your help with this!
It looks like setparam! overrides eval_metric instead of appending to it.
It's definitely not going to properly display both values until the PR is merged. Can you check if it's working? I was able to get it to work this way. When that PR is merged the display should be fixed.
I cannot seem to get Booster to take DataFrames. From the Booster docs (https://dmlc.github.io/XGBoost.jl/dev/api/#XGBoost.Booster), it seems it can only take matrices.
It takes something called a DMatrix, which is just an internal xgboost wrapper for the data, but a DataFrame should automatically get wrapped. For example:
using XGBoost, DataFrames
df = DataFrame(a=randn(10), b=randn(10), y=randn(10))
Booster((df[:, [:a, :b]], df.y))
# or, alternatively
Booster(DMatrix(df, :y))
There are lots of different ways to pass the data (arguably too many), again, see the docs.
@ExpandingMan: thanks mate, you are a Star! I tried the code below and can confirm that training reports all of the metrics listed. At the risk of sounding pedantic, the XGBoost output doesn't have all metrics, just the last one.
thanks a lot, I appreciate your help and such a quick turnaround with this!
df = DataFrame(a=randn(10), b=randn(10), y=randn(10))
bst = xgboost((df[!, [:a, :b]], df.y); eval_metric=["rmse", "mae", "mape"])
Output:
[ Info: XGBoost: starting training.
[ Info: [1] train-rmse:1.05386554469353189 train-mae:0.85910702943801875 train-mape:3.26047090739011747
[ Info: [2] train-rmse:0.84753207030907596 train-mae:0.69181514382362364 train-mape:2.73415630133822551
The Booster display only shows "mape"; I would have thought it would show all the metrics, i.e. "rmse", "mae", "mape".
╭──── XGBoost.Booster ─────────────────────────────────────────────────────────────────╮
│ Features: ["a", "b"] │
│ │
│ Parameter Value │
│ ───────────────────────────────── │
│ eval_metric mape │
│ ───────────────────────────────── │
At the risk of sounding pedantic, the XGBoost output doesn't have all metrics, just the last one.
This does indeed need to be fixed. Those are displayed with a "hacked together" dict because the parameters are not accessible via the xgboost API, so we have to store them separately.
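For illustration only, here is one way a separately stored parameter dict could keep every eval_metric value instead of just the last one. This is a sketch in plain Julia under stated assumptions, not XGBoost.jl's actual implementation: record_param! and MULTI_VALUED are hypothetical names, and the assumption that only eval_metric accumulates values is mine.

```julia
# Assumption: eval_metric is the only parameter that may hold several values.
const MULTI_VALUED = Set(["eval_metric"])

# Hypothetical display store: each parameter name maps to the values set for it.
function record_param!(store::Dict{String,Vector{String}}, name::AbstractString, value)
    key = String(name)
    vals = get!(store, key, String[])
    # Single-valued params are overwritten; multi-valued ones keep each distinct value.
    key in MULTI_VALUED || empty!(vals)
    string(value) in vals || push!(vals, string(value))
    return store
end

store = Dict{String,Vector{String}}()
record_param!(store, "objective", "reg:squarederror")
for m in ("rmse", "mae", "mape")
    record_param!(store, "eval_metric", m)
end

# What a display row for eval_metric could show.
shown = join(store["eval_metric"], ", ")
```

With a store like this, the eval_metric row would list rmse, mae and mape together, while setting objective a second time would still replace the earlier value.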
Got it, thanks mate! this library is awesome, thanks again for all your work on this!
hi,
thanks for the awesome work on XGBoost. I have been using it as below:
I am now trying to set multiple metrics as below, but it seems to fail:
bst = xgboost((X, Y); XGBoost.classification(objective="binary:logistic", eval_metric= ["auc", "logloss"])...)
Can you please help with this?
Thanks, Rohit