What happened:
When using RandomizedSearchCV (either from dask-ml or from sklearn with dask's backend) with xgboost 1.2.0, the script crashes in most runs. Occasionally it finishes successfully with the same code and data, which makes the issue harder to diagnose. There is very little diagnostic output: the kernel dies, and sometimes Windows reports the error “Instruction at Referenced Memory Could Not Be Read”.
Runs:
sklearn RandomizedSearchCV + xgboost - successful
sklearn RandomizedSearchCV with dask backend + xgboost - crash (sometimes successful)
dask-ml RandomizedSearchCV with dask backend + xgboost - crash (sometimes successful)
dask-ml RandomizedSearchCV with dask-xgboost - crash
dask-xgboost - fails with an error that numpy.array has no "to_delayed" method, even though only dask DataFrames or dask Arrays were passed in
What you expected to happen:
To be able to run RandomizedSearchCV from dask with xgboost without crashes.
Minimal Complete Verifiable Example:
I've attached the PoC Jupyter notebook I used during testing. The folder also contains some screenshots.
https://www.webcargo.net/l/17cuoKPByt/
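For readers who cannot open the notebook, the second run in the list above (sklearn RandomizedSearchCV with dask's joblib backend + xgboost) can be sketched roughly as follows. This is a minimal sketch and not a copy of the notebook: the synthetic dataset, the search space in PARAM_DISTRIBUTIONS, the in-process Client(processes=False), and the helper names missing_packages and run_search are all assumptions made for illustration.

```python
import importlib.util

# Top-level packages the sketch relies on (dask.distributed lives in "distributed").
REQUIRED = ("sklearn", "xgboost", "joblib", "dask", "distributed")

# Hypothetical search space; the values are illustrative, not taken from the notebook.
PARAM_DISTRIBUTIONS = {
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.3],
}


def missing_packages():
    """Return the names in REQUIRED that are not importable in this environment."""
    return [name for name in REQUIRED if importlib.util.find_spec(name) is None]


def run_search(use_dask_backend, n_iter=5):
    """Fit RandomizedSearchCV over an XGBClassifier, optionally on joblib's dask backend."""
    # Imports are local so the module can be loaded even when a package is missing.
    import joblib
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV
    from xgboost import XGBClassifier

    # Synthetic stand-in data (assumption); the notebook uses its own dataset.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    search = RandomizedSearchCV(
        XGBClassifier(n_jobs=1),  # one thread per fit; parallelism comes from the search
        PARAM_DISTRIBUTIONS,
        n_iter=n_iter,
        cv=3,
        n_jobs=-1,
        random_state=0,
    )

    if use_dask_backend:
        from dask.distributed import Client

        # Local in-process cluster (assumption); the dask backend needs an active Client.
        with Client(processes=False), joblib.parallel_backend("dask"):
            search.fit(X, y)
    else:
        search.fit(X, y)  # plain sklearn run, the configuration reported as successful

    return search.best_params_
```

Calling run_search(False) corresponds to the plain sklearn run, and run_search(True) corresponds to the run with dask's backend; comparing the two in the same environment is one way to check whether the crash depends on the backend alone.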
Anything else we need to know?:
I wasn't able to use xgboost==0.90 because RandomizedSearchCV fails with the error "XGBoostError: need to call fit or load_model beforehand".
Environment:
Dask version: 2.21.0
Python version: 3.8.3
Operating System: Windows 10 Enterprise ver. 1809 compilation 17763.1339
Install method (conda, pip, source): all runs in a separate conda env, packages installed with pip - requirements.txt attached - https://www.webcargo.net/l/17cuoKPByt/