Closed jrbourbeau closed 2 years ago
Thank you for opening the issue. I will work on some tests for `sparse` and `scipy.sparse` with Dask.
I'm encountering the same issue as @jrbourbeau with the following package versions: xgboost 1.5.1, dask 2022.02.0, distributed 2022.02.0.
The example code snippet above returns the same error: "AttributeError: divisions not found"
@trivialfis -- were your changes merged into 1.5.1?
@trivialfis - any update on this? I am still encountering this issue while running xgboost 1.5.1
@rrpelgrim Please update to the latest XGBoost 1.6.1
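For anyone comparing environments before and after upgrading, here is a minimal sketch that reports the installed versions of the packages mentioned in this thread. It uses only the standard library; the helper name `installed_versions` is hypothetical and not part of XGBoost or Dask.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["xgboost", "dask", "distributed"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in the same environment as the Dask workers (not only the client) may help rule out a version mismatch between the two.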
I came across a use case where attempting to fit a `DaskXGBClassifier` on a Dask Array whose partitions are `scipy.sparse.csr_matrix`s (as is returned by Dask-ML's `HashingVectorizer`) results in an `AttributeError: divisions not found` error (full traceback included below).

From doing some initial debugging it appears the underlying issue is that during the fitting process we end up passing a `list` of sparse matrices to Dask's `dd.multi.concat` here: https://github.com/dmlc/xgboost/blob/d33854af1b4f783c5230bb21aff7234b16f409f7/python-package/xgboost/dask.py#L207

However, `dd.multi.concat` expects a `list` of Dask DataFrames, which is where the `AttributeError: divisions not found` is coming from (Dask DataFrames have a `.divisions` attribute which `dd.multi.concat` assumes exists).

Here's an example code snippet which should reproduce the issue when using the latest `xgboost` (1.5.0) and `dask` (2021.11.2) / `distributed` (2021.11.2) releases:

Full traceback:
```
Traceback (most recent call last):
  File "/Users/james/projects/coiled/evangelism-private/mongodb-with-coiled/test.py", line 28, in