Adds a local_evals dictionary to each worker's xgb.train call. Once all workers have returned and their results are aggregated, the evals_result dict passed into the dask-xgboost train method is updated with the resulting evaluation history.
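The aggregation step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the helper name `merge_worker_history`, the idea that non-reporting workers return `None` or an empty dict, and the sample metric values are all hypothetical.

```python
from typing import Dict, List, Optional

# Nested history in the shape xgboost uses for evals_result:
# {eval_set_name: {metric_name: [value_per_iteration, ...]}}
EvalHistory = Dict[str, Dict[str, List[float]]]


def merge_worker_history(
    worker_histories: List[Optional[EvalHistory]],
    evals_result: Optional[EvalHistory],
) -> Optional[EvalHistory]:
    """Copy the first non-empty worker history into the caller's dict.

    Hypothetical helper: assumes workers that do not report a history
    return None or an empty dict.
    """
    if evals_result is None:
        return None  # the caller did not ask for the history
    for history in worker_histories:
        if history:
            # Update in place so the dict the caller passed in sees the result.
            evals_result.update(history)
            break
    return evals_result


# Simulated return values from three workers; only one carries a history.
# The rmse values are made up for illustration.
local_evals = {"validation_0": {"rmse": [0.48, 0.45, 0.43]}}
user_dict: EvalHistory = {}
merge_worker_history([None, local_evals, {}], user_dict)
print(user_dict["validation_0"]["rmse"])
```

Updating the caller's dict in place mirrors how `evals_result` behaves in `xgb.train`, where the caller passes in an empty dict and reads it after training.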
Why
It is useful to be able to recall the evaluation metric at each training iteration after the model has been trained. dmlc/xgboost already offers this feature, and it would be nice to have it in dask-xgboost as well.
Test
import dask.array as da
import pandas as pd
from dask.distributed import Client, LocalCluster
import dask_xgboost as dxgb
import xgboost as xgb
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "y": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]}
)
labels = pd.Series([1, 0, 1, 0, 1, 0, 1, 1, 1, 1])
X = df.values
y = labels.values
X2 = da.from_array(X, 5)
y2 = da.from_array(y, 5)
cluster = LocalCluster()
# Named "client" so it is not shadowed by the regressor assigned to "c" below.
client = Client(cluster)
a = dxgb.XGBRegressor(eval_metric="rmse", random_state=1, seed=1, verbosity=0)
a.fit(X2, y2, eval_set=[(X, y)])
b = xgb.XGBRegressor(eval_metric="rmse", random_state=1, seed=1, verbosity=0)
b.fit(X, y, eval_set=[(X, y)])
c = xgb.dask.DaskXGBRegressor(eval_metric='rmse', random_state=1, seed=1, verbosity=0)
c.fit(X2, y2, eval_set=[(X2, y2)])
assert a.evals_result() == b.evals_result()
assert a.evals_result() == c.evals_result()
assert b.evals_result() == c.evals_result()
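The assertions above compare the histories with exact equality. If the distributed and local runs ever differ in the last few floating-point digits, a tolerance-based comparison is one alternative. The sketch below is an assumption on my part and not something this PR includes: `histories_close` is a hypothetical helper, and the default tolerances are arbitrary.

```python
import math
from typing import Dict, List

# Same nested shape as evals_result():
# {eval_set_name: {metric_name: [value_per_iteration, ...]}}
EvalHistory = Dict[str, Dict[str, List[float]]]


def histories_close(
    left: EvalHistory,
    right: EvalHistory,
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> bool:
    """Return True if both histories have the same keys and near-equal values."""
    if left.keys() != right.keys():
        return False
    for eval_name in left:
        if left[eval_name].keys() != right[eval_name].keys():
            return False
        for metric in left[eval_name]:
            a = left[eval_name][metric]
            b = right[eval_name][metric]
            # Histories must cover the same number of boosting rounds.
            if len(a) != len(b):
                return False
            if not all(
                math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                for x, y in zip(a, b)
            ):
                return False
    return True
```

With this helper, a check such as `assert histories_close(a.evals_result(), b.evals_result())` could stand in for the exact comparison.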
What
Closes https://github.com/dask/dask-xgboost/issues/59