Open hayesgb opened 3 years ago
What happened: When running a hyperparameter search with sklearn's `RandomizedSearchCV` and `DaskXGBClassifier`, I get the following error:

If you call

```python
X = X.compute_chunk_sizes()
```

before calling `.fit()`, then the search proceeds as expected. This becomes particularly hard to troubleshoot when you use `dask_ml.preprocessing.OneHotEncoder()`, either alone or in a pipeline.
What you expected to happen: The search returns a `best_estimator_` after completing the grid search.
An alternative approach would be to use `dask_ml.model_selection.RandomizedSearchCV()`, which results in the issue reported in #758. My understanding from the documentation is that the `DaskXGBClassifier` object passes the Dask Array to a `DMatrix` for parallel training, so parallel search in conjunction with distributed training is a challenge to implement.
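For context, here is a minimal sketch of how unknown chunk sizes arise and how `compute_chunk_sizes()` resolves them. This uses plain `dask.array` (not XGBoost or dask-ml) purely to illustrate the mechanism; the filenames and values are made up:

```python
import numpy as np
import dask.array as da

# A Dask array with known chunk sizes.
x = da.from_array(np.arange(20), chunks=5)

# Boolean indexing produces an array whose chunk sizes cannot be known
# without computing: the length shows up as nan.
y = x[x % 2 == 0]
print(y.shape)  # (nan,)

# compute_chunk_sizes() evaluates just enough to materialize the chunk
# sizes, after which shape-dependent operations (like a CV split inside
# a hyperparameter search) can proceed.
y = y.compute_chunk_sizes()
print(y.shape)  # (10,)
```

Transformers such as `OneHotEncoder` can similarly yield arrays with unknown chunk sizes mid-pipeline, which is why the failure is hard to trace back to its origin.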
Minimal Complete Verifiable Example:
Anything else we need to know?:
Environment: