Open sarahyurick opened 1 year ago
Should also look into whether `model_class` failures on the GPU with `xgboost.XGBClassifier` and `xgboost.dask.DaskXGBClassifier` are related to this issue.
Update: Opened https://github.com/dask-contrib/dask-sql/issues/1020
After some investigation, it seems like the issue runs pretty deep. Assuming that we can make the necessary changes on the scikit-learn side, quite a few errors still pop up on the Dask and cuML sides as well.
In #886, we removed all dependencies on Dask-ML in favor of scikit-learn, cuML, and our own classes (`ParallelPostFit` and `Incremental`). Previously, when creating an experiment, `experiment_class` was expected to be a path to a `dask_ml` class, but `sklearn` classes were also found to be compatible. However, I couldn't get it to work with `cuml`, such as with `cuml.model_selection.GridSearchCV`. For example:

errors with:
Using `model_class = 'xgboost.XGBClassifier'` or `model_class = 'xgboost.dask.XGBClassifier'` results in the same error as above.

When I try it with a `model_class` from cuML, more errors arise. For example, if I try it with `model_class = 'cuml.dask.ensemble.RandomForestClassifier'` (cuML has no `GradientBoostingClassifier`), scikit-learn raises an error.

I tried a couple of different changes on the Dask-SQL side but have yet to find a solution. It's possible that this will require changes on the Dask and/or cuML side of things.
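
For anyone trying to narrow this down, here is a minimal sketch of how a dotted path like the `model_class` and `experiment_class` values above could be resolved and sanity-checked outside of Dask-SQL. The helper names `resolve_class` and `missing_estimator_methods` are hypothetical and are not Dask-SQL's actual implementation. The list of required methods (`fit`, `get_params`, `set_params`) is my assumption about what a scikit-learn search class needs from an estimator, so treat a clean result as a hint rather than proof of compatibility.

```python
import importlib


def resolve_class(dotted_path: str) -> type:
    """Hypothetical helper: import 'pkg.module.ClassName' and return the class."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"module {module_path!r} has no attribute {class_name!r}"
        ) from exc


def missing_estimator_methods(cls, required=("fit", "get_params", "set_params")):
    """Return the names in `required` that `cls` does not expose as callables.

    The default `required` tuple is an assumption, not an official contract.
    """
    return [name for name in required if not callable(getattr(cls, name, None))]
```

With scikit-learn installed, `missing_estimator_methods(resolve_class("sklearn.ensemble.GradientBoostingClassifier"))` should come back empty. Running the same check against the cuML and XGBoost paths from this issue might help show whether the failure is in the estimator interface or somewhere deeper in the stack.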