With xgboost 1.2 and current master of dask-ml, passing a `DaskXGBClassifier()` to `RandomizedSearchCV` fails because `fit_and_score` receives numpy arrays instead of dask arrays.

I'm still trying to track down where the errant compute happens (it may well be on the xgboost side; I just wanted to raise this here for awareness). Before `fit_and_score` is called, the graph is updated with the keys for `X` and `y`, but when the graph is executed, numpy arrays are retrieved instead. (Could be an errant `client.get` on the xgboost side.)
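One way to narrow down where the conversion happens is to swap in an estimator that logs the concrete types its `fit()` receives. A minimal, library-agnostic sketch (the `DummyEstimator` below is a hypothetical stand-in for `DaskXGBClassifier`, so this runs without dask or xgboost installed):

```python
# Hypothetical stand-in for xgboost.dask.DaskXGBClassifier, so the
# sketch runs without dask/xgboost installed.
class DummyEstimator:
    def fit(self, X, y):
        return self


seen_types = []


class TypeLoggingEstimator(DummyEstimator):
    """Record the concrete types fit() receives, then delegate.

    Passing an instance of this to RandomizedSearchCV in place of the
    real estimator would show whether fit_and_score hands it dask
    collections or already-materialized numpy arrays.
    """

    def fit(self, X, y):
        seen_types.append((type(X).__name__, type(y).__name__))
        return super().fit(X, y)


est = TypeLoggingEstimator()
est.fit([[1.0, 2.0]], [0])
print(seen_types)  # [('list', 'list')] for these plain-Python inputs
```

With the real search, the expectation would be to see dask array type names here rather than `ndarray`.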
What happened:
In the notebook, a bunch of "key has failed" messages; in the terminal, this error repeats:
What you expected to happen:
I expect to get back a trained booster.
Minimal Complete Verifiable Example:
```python
from distributed.client import Client
from dask_ml.model_selection import RandomizedSearchCV
from dask_ml.datasets import make_classification
from xgboost.dask import DaskXGBClassifier
import xgboost
import dask_ml

print(f"{xgboost.__version__=}")
print(f"{dask_ml.__version__=}")

c = Client()

param_distributions = {
    "max_depth": [5],
    "min_child_weight": [10],
    "learning_rate": [0.05],
}

X, y = make_classification(n_samples=1000, n_features=20, chunks=(100, 20))

estimator = DaskXGBClassifier()
clf = RandomizedSearchCV(estimator, param_distributions)
clf.fit(X, y)
```
```
xgboost.__version__='1.2.0'
dask_ml.__version__='1.7.1.dev3+gc55c1898.d20201119'
/Users/vjs275/miniforge3/envs/msrm2/lib/python3.8/site-packages/sklearn/model_selection/_search.py:278: UserWarning: The total space of parameters 1 is smaller than n_iter=10. Running 1 iterations. For exhaustive searches, use GridSearchCV.
  warnings.warn(
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 0) has failed... retrying
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 0) has failed... retrying
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 0) has failed... retrying
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 0) has failed... retrying
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 2) has failed... retrying
('daskxgbclassifier-fit-score-61fb6731d022ad077e150ddca561ff0e', 0, 0) has failed... retrying
```
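Side note on the `UserWarning` above: it's expected with this grid, since every parameter has a single value, so the total search space is exactly one combination. A stdlib-only sketch of the exhaustive enumeration an exhaustive grid search would perform over the same grid (names mirror the MCVE, not dask-ml internals):

```python
from itertools import product

# Same one-point grid as in the MCVE above.
param_distributions = {
    "max_depth": [5],
    "min_child_weight": [10],
    "learning_rate": [0.05],
}

# Exhaustive enumeration: the cartesian product of all parameter value
# lists, which is what an exhaustive grid search iterates over.
keys = sorted(param_distributions)
combos = [dict(zip(keys, values))
          for values in product(*(param_distributions[k] for k in keys))]

print(len(combos))  # 1 -- hence "The total space of parameters 1" and
                    # the suggestion to use GridSearchCV instead
```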
Anything else we need to know?:
I first ran into this with xgboost 1.3 snapshots, but ran this MCVE with xgboost 1.2 to confirm I hadn't done anything weird with the snapshot install.
I think I got caught in an XY problem here: `sklearn.model_selection.GridSearchCV` works in this setup, which I think resolves this particular issue, but I'll run some more tests.
Environment:
- Install method: conda