dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[pyspark] Add tracker_on_driver to decide where the tracker will be launched #10281

Open wbo4958 opened 2 weeks ago

wbo4958 commented 2 weeks ago
from pyspark.ml.linalg import Vectors

from xgboost.callback import EvaluationMonitor
from xgboost.spark import SparkXGBRegressor

# Assumes an active SparkSession bound to `spark`.
# Build a small training DataFrame with a validation-indicator column.
df_train = spark.createDataFrame(
    [
        (Vectors.dense(1.0, 2.0, 3.0), 0, False, 1.0),
        (Vectors.sparse(3, {1: 1.0, 2: 5.5}), 1, False, 2.0),
        (Vectors.dense(4.0, 5.0, 6.0), 0, True, 1.0),
        (Vectors.sparse(3, {1: 6.0, 2: 7.5}), 1, True, 2.0),
    ]
    * 100,
    ["features", "label", "isVal", "weight"],
)

# Launch the tracker on the driver so the EvaluationMonitor output is
# printed there instead of on the executors.
xgb_regressor = SparkXGBRegressor(
    num_workers=5,
    callbacks=[EvaluationMonitor()],
    tracker_on_driver=True,
    validation_indicator_col="isVal",
)
xgb_reg_model = xgb_regressor.fit(df_train)

With the above test code, the log below is printed on the driver; otherwise, it would be printed on the executor side.

[0] training-rmse:0.35149   validation-rmse:0.35149
[0] training-rmse:0.35149   validation-rmse:0.35149
[1] training-rmse:0.24708   validation-rmse:0.24708
[1] training-rmse:0.24708   validation-rmse:0.24708
[2] training-rmse:0.17369   validation-rmse:0.17369
[2] training-rmse:0.17369   validation-rmse:0.17369
[3] training-rmse:0.12210   validation-rmse:0.12210
[3] training-rmse:0.12210   validation-rmse:0.12210
[4] training-rmse:0.08583   validation-rmse:0.08583
[4] training-rmse:0.08583   validation-rmse:0.08583
[5] training-rmse:0.06034   validation-rmse:0.06034
[5] training-rmse:0.06034   validation-rmse:0.06034
[6] training-rmse:0.04242   validation-rmse:0.04242
...
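
For comparison, a minimal sketch of the same pipeline with the tracker left on the executor side. This assumes tracker_on_driver defaults to False when omitted, in which case the EvaluationMonitor output appears in the executor logs rather than on the driver.

xgb_regressor_default = SparkXGBRegressor(
    num_workers=5,
    callbacks=[EvaluationMonitor()],
    # tracker_on_driver left unset (assumed to default to False), so the
    # tracker and the evaluation log stay on the executor side.
    validation_indicator_col="isVal",
)
xgb_reg_model_default = xgb_regressor_default.fit(df_train)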