PacktPublishing / Learn-Amazon-SageMaker

wrong metric reported in sdkv2/ch7/xgb/xgb-dm.py #9

Open dipetkov opened 3 years ago

Chapter 7 shows how to use the XGBoost framework to train an xgb.XGBClassifier by optimizing AUC.

After training, the script prints out the AUC score on the validation data.

auc = cls.score(x_val, y_val)
print("AUC ", auc)

[Snippet on lines 49-50 in sdkv2/ch7/xgb/xgb-dm.py.]

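However, xgb.XGBClassifier.score returns the mean accuracy (the method is inherited from scikit-learn's ClassifierMixin), not the metric set in eval_metric.

So instead of cls.score, it's better to use sklearn.metrics.roc_auc_score on the predicted probabilities.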

Here is a complete reproducible example:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

x_trn, x_val, y_trn, y_val = train_test_split(
    cancer.data, cancer.target,
    test_size=0.2,
    random_state=1,
)

model = xgb.XGBClassifier(
    objective="binary:logistic",
    eval_metric="auc",
    max_depth=2,
    random_state=2,
)
model.fit(
    x_trn, y_trn,
    verbose=False,
)

# Probability of the positive class (column 1).
p_val = model.predict_proba(x_val)[:, 1]
print("AUC     ", roc_auc_score(y_val, p_val))  # 0.9861

# Returns the mean accuracy, not the evaluation metric.
print("accuracy", model.score(x_val, y_val))  # 0.9561
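
To illustrate why the two numbers above are not interchangeable, here is a small dependency-free sketch. The labels and probabilities are made up for illustration, and the `accuracy` and `roc_auc` helpers are hand-rolled stand-ins (assuming a 0.5 decision threshold), not the scikit-learn or XGBoost implementations. AUC depends only on how the scores rank positives against negatives, while accuracy depends on where the threshold falls.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of correct labels after thresholding the probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and predicted probabilities.
# Every positive is scored above every negative, but two positives
# fall below the 0.5 threshold.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.4, 0.9]

print(f"accuracy = {accuracy(y_true, y_prob):.3f}")  # accuracy = 0.667
print(f"AUC      = {roc_auc(y_true, y_prob):.3f}")   # AUC      = 1.000
```

With a perfect ranking the AUC is 1.0 even though a third of the thresholded predictions are wrong, so printing the result of score under the label "AUC" can be off in either direction.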