dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Possible bug when supplying a list to eval_metric #5340

Closed (jborchma closed this 4 years ago)

jborchma commented 4 years ago

Hey XGBoost team,

I recently upgraded to the latest version, 1.0.1, and stumbled upon a possible bug in a piece of code that ran fine with version 0.9. It seems that when one supplies a list of evaluation metrics to eval_metric, the fit method fails.

A minimal example would be:

import pandas as pd
from sklearn import datasets
import xgboost as xgb

n_features = 13
random_state = 75
n_samples = 10000
n_classes = 2
n_informative = 3
X, y = datasets.make_classification(
    n_features=n_features, 
    n_redundant=3, 
    n_repeated=0, 
    n_informative=n_informative, 
    flip_y=0, 
    n_clusters_per_class=2,
    n_classes=n_classes,
    n_samples=n_samples,
    random_state=random_state
)
train_data = pd.DataFrame(X)
train_data["target"] = y
train_data.columns = train_data.columns.astype(str)

y = train_data["target"]
X = train_data[[column for column in train_data.columns if column != "target"]]
params = {"eval_metric": ["auc", "logloss"]}
model = xgb.XGBClassifier(**params)

model.fit(X, y)

The error trace reads:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-4b7d131df230> in <module>
      4 model = xgb.XGBClassifier(**params)
      5 
----> 6 model.fit(X, y)

~/miniconda3/envs/r2train/lib/python3.7/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
    821                               evals_result=evals_result, obj=obj, feval=feval,
    822                               verbose_eval=verbose, xgb_model=xgb_model,
--> 823                               callbacks=callbacks)
    824 
    825         self.objective = xgb_options["objective"]

~/miniconda3/envs/r2train/lib/python3.7/site-packages/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks)
    207                            evals=evals,
    208                            obj=obj, feval=feval,
--> 209                            xgb_model=xgb_model, callbacks=callbacks)
    210 
    211 

~/miniconda3/envs/r2train/lib/python3.7/site-packages/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     40 
     41     if 'num_parallel_tree' in _params and params[
---> 42             'num_parallel_tree'] is not None:
     43         num_parallel_tree = _params['num_parallel_tree']
     44         nboost //= num_parallel_tree

TypeError: list indices must be integers or slices, not str
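
Looking at the traceback, the failing check tests membership against _params (a dict) but then indexes params, which in this code path is evidently a list, since indexing it with a string is what raises. A minimal sketch of that mismatch (the exact contents of params here are my guess, assuming a list-of-tuples shape, but the TypeError is the same):

# Guessed shape: with a list of metrics, params seems to arrive in
# _train_internal as (key, value) tuples rather than a dict.
params = [("eval_metric", "auc"), ("eval_metric", "logloss"), ("num_parallel_tree", 1)]
_params = dict(params)

# The membership test against the dict passes...
assert "num_parallel_tree" in _params

# ...but the lookup then indexes the list with a string, which raises:
# TypeError: list indices must be integers or slices, not str
value = params["num_parallel_tree"]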

Also, just to confirm, this works perfectly fine:

y = train_data["target"]
X = train_data[[column for column in train_data.columns if column != "target"]]
params = {"eval_metric": "auc"}
model = xgb.XGBClassifier(**params)

model.fit(X, y)

Is this an intended change? The docs still imply that one can supply a list of evaluation metrics. I checked this on both OS X and a Linux server, so it seems to be independent of the operating system.

trivialfis commented 4 years ago

Oh no ... It's a bug in training.

hcho3 commented 4 years ago

This is an interesting bug that only surfaces when eval_metric is a list. See #5341 for the fix.
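
For context, the parameters only become a list when eval_metric is a list: a single string metric keeps them as a dict, while a list gets expanded into repeated (key, value) pairs so the key can occur more than once, and only that second shape breaks string indexing. A rough sketch of that shape change (illustrative only, with a hypothetical flatten_params helper, not the library's actual code):

# Hypothetical helper showing the assumed conversion; only the shape
# change matters here.
def flatten_params(params):
    if isinstance(params.get("eval_metric"), list):
        pairs = [(k, v) for k, v in params.items() if k != "eval_metric"]
        pairs += [("eval_metric", m) for m in params["eval_metric"]]
        return pairs  # a list now: params["num_parallel_tree"] would raise
    return params  # still a dict: string indexing keeps working

print(flatten_params({"eval_metric": "auc"}))
print(flatten_params({"eval_metric": ["auc", "logloss"], "num_parallel_tree": 1}))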

jborchma commented 4 years ago

Wow, fast turnaround! Thanks for looking into this!

hcho3 commented 4 years ago

@jborchma The issue is fixed in the 1.0.2 release.
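
For anyone landing on this thread later, a quick way to confirm the installed version includes the fix before re-running the snippet above:

import xgboost as xgb

# The fix shipped in 1.0.2, so this should print 1.0.2 or later.
print(xgb.__version__)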

jborchma commented 4 years ago

Thanks so much for the help! Looking forward to using 1.0.2!