dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License

[Feature Request] Support evals_result #59

Closed kylejn27 closed 4 years ago

kylejn27 commented 4 years ago

Hello,

Currently the dask-xgboost package train result does not return evals_result.

I'm thinking it can be implemented in a similar way to https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/dask.py#L348

I'd be happy to open a PR with this change myself, but I'd like feedback on the implementation first. Having the existing train method return a dictionary rather than the booster object would be a breaking change for current users of this library. If this package is moving to dmlc/xgboost anyway then maybe that's acceptable; otherwise there's probably a cleaner way to return evals_result to the user.
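For concreteness, one backwards-compatible option would be an opt-in flag, roughly like the sketch below. The return_evals_result flag and _train_impl helper are hypothetical names for illustration, not existing dask-xgboost API; the dict shape mirrors what upstream xgboost.dask.train returns.

# Hypothetical sketch: return_evals_result and _train_impl are
# illustrative names, not existing dask-xgboost API.
def train(client, params, data, labels, return_evals_result=False, **kwargs):
    booster, evals_result = _train_impl(client, params, data, labels, **kwargs)
    if return_evals_result:
        # mirrors the {'booster': ..., 'history': ...} dict returned
        # by the upstream xgboost.dask.train
        return {'booster': booster, 'history': evals_result}
    return booster  # current behavior preserved for existing users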

TomAugspurger commented 4 years ago

Can you give an example of what evals_result does? I don't understand their docs. It's a parameter that's passed into train and mutated in place?

kylejn27 commented 4 years ago

It's a history of the evaluations at each iteration. From what I understand, on every iteration of the train step the resulting evaluation metric is appended to this evals_result dictionary.
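For reference, here's how plain single-machine xgboost fills the dict in place (standard xgboost.train usage, nothing dask-specific):

import numpy as np
import xgboost as xgb

# small random binary-classification problem
rng = np.random.RandomState(0)
dtrain = xgb.DMatrix(rng.rand(100, 5), label=rng.randint(2, size=100))
dtest = xgb.DMatrix(rng.rand(50, 5), label=rng.randint(2, size=50))

evals_result = {}  # xgb.train mutates this dict in place
bst = xgb.train(
    {'objective': 'binary:logistic', 'eval_metric': 'logloss'},
    dtrain,
    num_boost_round=5,
    evals=[(dtest, 'validation_0')],
    evals_result=evals_result,
)
# evals_result is now {'validation_0': {'logloss': [... one entry per round]}}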

evals_result is added to a record_evaluation callback here https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/training.py#L207

Here's the callback code in dmlc/xgboost: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/callback.py#L60
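Roughly, record_evaluation is a closure over the user-supplied dict; a simplified paraphrase (not the exact upstream code) looks like:

def record_evaluation(eval_result):
    def callback(env):
        # env.evaluation_result_list holds pairs like
        # ('validation_0-logloss', 0.636035)
        for key, value in env.evaluation_result_list:
            data_name, metric_name = key.split('-', 1)
            eval_result.setdefault(data_name, {}) \
                       .setdefault(metric_name, []) \
                       .append(value)
    return callback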

Here's an example of what should work with dask-xgboost but isn't currently implemented:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import dask.dataframe as dd
from dask.distributed import Client
import dask_xgboost as dxgb

# Make Client
client = Client()

# Data setup
data = make_classification(
    n_samples=1000,
    n_features=20
)
X = pd.DataFrame(data[0])
X.columns = [f'var{i}' for i in range(20)]
y = pd.DataFrame(data[1])
y.columns = ['target']
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Convert train set to dask dataframes
X_train = dd.from_pandas(X_train, npartitions=1)
y_train = dd.from_pandas(y_train, npartitions=1)

# Model train
model = dxgb.XGBClassifier()
eval_set = [(X_test, y_test)]
eval_metric = "logloss"
model.fit(
    X_train,
    y_train,
    classes=[0, 1],
    early_stopping_rounds=4,
    eval_set=eval_set,
    eval_metric=eval_metric
)
>>> print(model.evals_result())
{'validation_0': {'logloss': [0.636035, 0.588901, 0.550328, 0.520794, 0.490704, 0.466473, 0.444285, 0.424012, 0.407814, 0.392962, 0.382754, 0.374211, 0.36438, 0.35752, 0.353132, 0.349372, 0.343062, 0.338768, 0.336939, 0.334325, 0.330798, 0.330391, 0.329221, 0.329147, 0.326469, 0.325981, 0.325691, 0.32589, 0.326658, 0.326615]}}

TomAugspurger commented 4 years ago

Thanks for the clear example.

My main questions now are: where are the evals evaluated, and does the order matter? If they're evaluated on the workers as part of distributed training, then I don't think we can make any guarantee about the order of these results as they come in (I could be misunderstanding what happens though).

kylejn27 commented 4 years ago

I'm still learning how xgboost (and distributed xgboost) works, so I could be incorrect, but I'll try to explain this to the best of my ability.

where are the evals evaluated

The evaluations are triggered on each worker: after each bst.update call, bst.eval_set is called on the evaluation data. This happens inside the boosting-round loop.
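In simplified pseudocode (paraphrased from python-package/xgboost/training.py, not the exact upstream code):

# one pass per boosting round: update first, then evaluate
def train_loop(bst, dtrain, evals, num_boost_round):
    for i in range(num_boost_round):
        bst.update(dtrain, i)             # one boosting iteration
        if evals:
            msg = bst.eval_set(evals, i)  # e.g. '[0]\tvalidation_0-logloss:0.636035'
            # callbacks (including record_evaluation) receive the parsed
            # metrics and append them to evals_result each round
    return bst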

does the order matter? If they're evaluated on the workers as part of distributed training, then I don't think we can make any guarantee about the order of these results as they come in

I believe order matters: each value in the eval list represents one iteration of the train portion of the algorithm. You'd want to see how the model progressed over iterations; jumbling that up would make the result unusable.

I don't fully understand the underlying distributed xgboost algorithm, but if the model is synchronized across workers after each training iteration (so that between rounds the models are identical), then the evals results should be identical across all of the workers. I can't point to a spot in the code that proves this, but in my testing the results have been deterministic and in the right order.
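One informal way to check that (assuming the example setup above and the proposed evals_result support) is to train twice on the same data and compare the recorded histories; if evaluation order were jumbled across workers, these would differ from run to run:

model_a = dxgb.XGBClassifier()
model_a.fit(X_train, y_train, classes=[0, 1],
            eval_set=eval_set, eval_metric="logloss")

model_b = dxgb.XGBClassifier()
model_b.fit(X_train, y_train, classes=[0, 1],
            eval_set=eval_set, eval_metric="logloss")

# informal check, not a proof of determinism
assert model_a.evals_result() == model_b.evals_result()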