dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

XGBoost eval_metric and prediction results are different #8774

Closed · aParsecFromFuture closed this 1 year ago

aParsecFromFuture commented 1 year ago

```python
from sklearn.metrics import accuracy_score
import xgboost as xgb

# Custom eval metric: with the sklearn interface and binary:logistic,
# y_pred holds predicted probabilities, so round them to class labels.
def acc(y_true, y_pred):
    return accuracy_score(y_true, y_pred.round())

params = {
    'objective': 'binary:logistic',
    'enable_categorical': True,
    'tree_method': 'hist',
    'eval_metric': acc
}

model = xgb.XGBClassifier(**params)

# X_train/X_valid come from the competition dataset (not shown).
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)])

y_pred = model.predict(X_valid)
score = accuracy_score(y_valid, y_pred)
print(score)
```

The accuracy score should be the same as the score from the last iteration, but I get 73% and 69%. The validation data has 10k rows. Any idea?

trivialfis commented 1 year ago

I can't reproduce using the latest xgboost. Could you please share a reproducible example?
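
A sketch of what such a reproducible example could look like, with synthetic data standing in for the private dataset (the `acc` metric mirrors the first comment; the data sizes are illustrative, nothing here is taken from the actual competition):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data.
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

def acc(y_true, y_pred):
    # With binary:logistic, y_pred holds probabilities; round to labels.
    return accuracy_score(y_true, y_pred.round())

model = xgb.XGBClassifier(objective='binary:logistic', tree_method='hist',
                          eval_metric=acc)
# verbose=True prints the custom metric at every boosting round.
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=True)

# This should match the metric printed for the last round.
print(accuracy_score(y_valid, model.predict(X_valid)))
```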

aParsecFromFuture commented 1 year ago

> I can't reproduce using the latest xgboost. Could you please share a reproducible example?

It's from a Kaggle competition and I can't make the notebook public. Do you have a Kaggle account that I can share the notebook with?

I have used the xgboost library a lot; I've only run into this problem with this dataset.

aParsecFromFuture commented 1 year ago

I get the same behavior with the built-in `error` metric: (1 - error) is not equal to the accuracy score.

Class distribution: (0: 32496, 1: 20523)
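
One way to check this directly (a sketch, assuming the model was fit with the built-in metric via `eval_metric='error'` and the same validation split as above):

```python
from sklearn.metrics import accuracy_score

# 'error' is XGBoost's binary classification error at a 0.5 threshold,
# so (1 - error) from the last iteration should equal the accuracy below.
final_error = model.evals_result()['validation_0']['error'][-1]
print(1.0 - final_error)
print(accuracy_score(y_valid, model.predict(X_valid)))
```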

trivialfis commented 1 year ago

Let me get back to this tomorrow. I have a Kaggle account under the same email address as the one on my GitHub profile.

aParsecFromFuture commented 1 year ago

> Let me get back to this tomorrow. I have a Kaggle account under the same email address as the one on my GitHub profile.

I shared the notebook with the user trivialfis. Good luck.

trivialfis commented 1 year ago

This is a limited-participation competition. Only invited users may participate.
aParsecFromFuture commented 1 year ago

> This is a limited-participation competition. Only invited users may participate.

I'm going to open it publicly for today. Let me know when your review is done. Link: https://www.kaggle.com/greysky/temporary-notebook-it-will-be-deleted

trivialfis commented 1 year ago

Thank you for sharing. I can see the notebook; it's the dataset that I can't access. I need to join the competition before I can see the dataset, but the competition is invite-only.

aParsecFromFuture commented 1 year ago

> Thank you for sharing. I can see the notebook; it's the dataset that I can't access. I need to join the competition before I can see the dataset, but the competition is invite-only.

I have created the same notebook with a copy of the dataset and invited you. Could you try again?

trivialfis commented 1 year ago

Hi, I think it's possible this is caused by `verbose=20` in the `fit` method: the latest evaluation result is not displayed because the model only prints every 20 iterations.

You can obtain the list of evaluation results with `clf.evals_result()`.
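
For example, with the custom metric from the first comment (a sketch; with a callable metric the results should be recorded under the function's name, here `'acc'`, and under `'validation_0'` for the first `eval_set` entry):

```python
history = clf.evals_result()
print(history['validation_0']['acc'])       # one value per boosting round
print(history['validation_0']['acc'][-1])   # the last iteration's value
```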

trivialfis commented 1 year ago

Never mind, the later iterations do change; still looking.

aParsecFromFuture commented 1 year ago

> Hi, I think it's possible this is caused by `verbose=20` in the `fit` method: the latest evaluation result is not displayed because the model only prints every 20 iterations.
>
> You can obtain the list of evaluation results with `clf.evals_result()`.

There is no iteration whose reported metric matches the output accuracy. I noticed early on that even when we do hyperparameter optimization (e.g. with Optuna), the output accuracy stays at 0.61 while the eval_metric output approaches 0.70. I thought this happened because the logloss value keeps increasing, but the `disable_default_eval_metric` option didn't prevent the strange behavior.
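
As a further diagnostic (a sketch, not something from the thread): since the custom metric rounds probabilities while the final score uses `predict`, checking whether `predict` agrees with thresholding `predict_proba` would separate a metric-computation issue from a prediction issue:

```python
import numpy as np

proba = model.predict_proba(X_valid)[:, 1]   # P(y == 1) per row
labels = model.predict(X_valid)              # hard labels from predict
# If this prints anything below 1.0, eval-time rounding and predict-time
# thresholding are effectively seeing different predictions.
print(np.mean((proba > 0.5).astype(int) == labels))
```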

trivialfis commented 1 year ago

Thank you for raising the issue and for the helpful assistance! I have reproduced the error using the dataset from the competition.