dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Performance regression in fit method with evaluation sets #10793

Open ldesreumaux opened 2 months ago

ldesreumaux commented 2 months ago

I have observed a significant performance regression in XGBoost version 1.7 when using the fit method with evaluation sets in sklearn estimators. The issue appears to have been introduced by this commit, which defaults to using QuantileDMatrix for both training and evaluation sets.

While the optimization of prediction with QuantileDMatrix has been addressed in https://github.com/dmlc/xgboost/issues/9013, there remains a significant performance gap when using QuantileDMatrix for evaluation sets compared to DMatrix.

Here is a sample code to reproduce the issue:

import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import time

n_samples = 1000000
n_features = 100
seed = 42

np.random.seed(seed)

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, 2, size=n_samples)

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=seed)
X_eval1, X_eval2, y_eval1, y_eval2 = train_test_split(X_temp, y_temp, test_size=0.5, random_state=seed)

model = XGBClassifier(
    tree_method='hist',
    max_depth=6,
    n_estimators=500,
    eval_metric='logloss',
    random_state=seed
)

start_time = time.time()

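# With the 'hist' tree method, the sklearn wrapper builds a QuantileDMatrix for
# each eval_set entry, and every boosting round runs prediction on them to
# compute the logloss metric.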
model.fit(X_train, y_train, eval_set=[(X_eval1, y_eval1), (X_eval2, y_eval2)], verbose=True)

end_time = time.time()
execution_time = end_time - start_time

y_pred_eval1 = model.predict(X_eval1)
y_pred_eval2 = model.predict(X_eval2)

accuracy_eval1 = accuracy_score(y_eval1, y_pred_eval1)
accuracy_eval2 = accuracy_score(y_eval2, y_pred_eval2)

print(f"Accuracy on Evaluation Set 1: {accuracy_eval1:.4f}")
print(f"Accuracy on Evaluation Set 2: {accuracy_eval2:.4f}")

print(f"Execution Time: {execution_time:.2f} seconds")

Performance comparison (with current master branch):

Here are profiling graphs for the two cases:

The graphs clearly show that the performance degradation is linked to the prediction step with QuantileDMatrix for evaluation sets.

This sample code uses synthetic data, but I have observed the same order of magnitude of performance degradation with a real-world dataset.
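To isolate the prediction cost from the rest of training, a small standalone benchmark along these lines (assuming Booster.predict accepts a QuantileDMatrix, which https://github.com/dmlc/xgboost/issues/9013 indicates is supported) can time predict on the same data wrapped in each container type:

import time

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X_tr = rng.random((500_000, 100))
y_tr = rng.integers(0, 2, size=X_tr.shape[0])
X_ev = rng.random((200_000, 100))

# Train once on a QuantileDMatrix so the eval-side QuantileDMatrix below can
# reference its quantile cuts, mirroring what the sklearn wrapper does internally.
dtrain = xgb.QuantileDMatrix(X_tr, label=y_tr)
booster = xgb.train(
    {"tree_method": "hist", "objective": "binary:logistic", "max_depth": 6},
    dtrain,
    num_boost_round=100,
)

# Time prediction on the same evaluation data held in the two container types.
for name, dmat in [
    ("DMatrix", xgb.DMatrix(X_ev)),
    ("QuantileDMatrix", xgb.QuantileDMatrix(X_ev, ref=dtrain)),
]:
    start = time.time()
    booster.predict(dmat)
    print(f"predict on {name}: {time.time() - start:.2f} seconds")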

If no further optimization is possible, I would suggest changing the default behavior to use a plain DMatrix for the evaluation sets.
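In the meantime, a possible workaround (a sketch, assuming one is willing to drop to the native xgboost.train API) is to keep only the training data as a QuantileDMatrix and build the evaluation sets as plain DMatrix objects; this reuses X_train, y_train, X_eval1, etc. from the reproduction script above:

import xgboost as xgb

# Training data compressed as QuantileDMatrix (as the sklearn wrapper does);
# evaluation sets as plain DMatrix to avoid the slow lookup during prediction.
dtrain = xgb.QuantileDMatrix(X_train, label=y_train)
deval1 = xgb.DMatrix(X_eval1, label=y_eval1)
deval2 = xgb.DMatrix(X_eval2, label=y_eval2)

params = {
    "tree_method": "hist",
    "max_depth": 6,
    "objective": "binary:logistic",
    "eval_metric": "logloss",
    "seed": seed,
}

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=500,
    evals=[(deval1, "eval1"), (deval2, "eval2")],
    verbose_eval=True,
)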

trivialfis commented 2 months ago

I agree that the gap is unexpectedly large. QuantileDMatrix (QDM) is chosen for reduced memory usage, as it compresses the data, but there is a cost in data lookup during prediction. I will see what can be done there: maybe use in-place predict, maybe optimize the value lookup a bit more.
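For reference, the existing user-facing in-place prediction path already skips DMatrix/QuantileDMatrix construction entirely; whether the internal evaluation loop can take a similar route is a separate question. A rough sketch of that existing API (not the internal change being considered here):

import numpy as np
import xgboost as xgb

X = np.random.rand(100_000, 100)
y = np.random.randint(0, 2, size=X.shape[0])

booster = xgb.train(
    {"tree_method": "hist", "objective": "binary:logistic"},
    xgb.DMatrix(X, label=y),
    num_boost_round=10,
)

# Regular path: wrap the array in a DMatrix before predicting.
preds_dmatrix = booster.predict(xgb.DMatrix(X))

# In-place path: predict straight from the NumPy array; no DMatrix is built.
preds_inplace = booster.inplace_predict(X)

print(np.allclose(preds_dmatrix, preds_inplace, atol=1e-6))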