Open DoDzilla-ai opened 4 years ago
scipy.sparse matrices don't support the ndarray interface, so many dask.array methods don't work with them. A simpler example:
In [10]: X = scipy.sparse.eye(10, format='csr')
In [11]: dX = da.from_array(X)
In [12]: dX.sum()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-fa1407bd43c5> in <module>
----> 1 dX.sum()
~/sandbox/dask/dask/array/core.py in sum(self, axis, dtype, keepdims, split_every, out)
1960 keepdims=keepdims,
1961 split_every=split_every,
-> 1962 out=out,
1963 )
1964
~/sandbox/dask/dask/array/reductions.py in sum(a, axis, dtype, keepdims, split_every, out)
338 dtype=dtype,
339 split_every=split_every,
--> 340 out=out,
341 )
342 return result
~/sandbox/dask/dask/array/reductions.py in reduction(x, chunk, aggregate, axis, keepdims, dtype, split_every, combine, name, out, concatenate, output_size, meta)
155 # The dtype of `tmp` doesn't actually matter, and may be incorrect.
156 tmp = blockwise(
--> 157 chunk, inds, x, inds, axis=axis, keepdims=True, dtype=dtype or float
158 )
159 tmp._chunks = tuple(
~/sandbox/dask/dask/array/blockwise.py in blockwise(func, out_ind, name, token, dtype, adjust_chunks, new_axes, align_arrays, concatenate, meta, *args, **kwargs)
231 from .utils import compute_meta
232
--> 233 meta = compute_meta(func, dtype, *args[::2], **kwargs)
234 if meta is not None:
235 return Array(graph, out, chunks, meta=meta)
~/sandbox/dask/dask/array/utils.py in compute_meta(func, _dtype, *args, **kwargs)
125 if has_keyword(func, "computing_meta"):
126 kwargs_meta["computing_meta"] = True
--> 127 meta = func(*args_meta, **kwargs_meta)
128 except TypeError as e:
129 if (
<__array_function__ internals> in sum(*args, **kwargs)
~/Envs/dask-dev/lib/python3.7/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
2227
2228 return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
-> 2229 initial=initial, where=where)
2230
2231
~/Envs/dask-dev/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
84 # support a dtype.
85 if dtype is not None:
---> 86 return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
87 else:
88 return reduction(axis=axis, out=out, **passkwargs)
TypeError: sum() got an unexpected keyword argument 'keepdims'
You might be able to use the pydata/sparse library. I don't think there's anything for dask-ml to do here.
Well, at least the error message can be a little bit informative. There is already an error implemented for the case where the input data is a scipy.sparse.csr_matrix: TypeError: Cannot fit PCA on sparse 'X'. I spent 1-2 hours trying to figure out the TypeError: sum() got an unexpected keyword argument 'keepdims'.
PS: I am a newbie. Sorry...
Well, at least the error message can be a little bit informative
Perhaps, though I don't recall if we can always distinguish between a Dask Array backed by scipy.sparse matrices and a Dask Array backed by a sparse ndarray. Is this something you're interested in investigating further? The Array._meta attribute may have the information we need.
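One way such a check could look, as a rough sketch only: the helper names `is_scipy_sparse_backed` and `check_supported` are hypothetical and not part of dask or dask-ml, and `_meta` is a private attribute. The check is also only as reliable as the array's metadata; for example, an array built with dask.array.from_delayed without an explicit meta argument may not reflect the real chunk type.

```python
import dask.array as da
import numpy as np
import scipy.sparse


def is_scipy_sparse_backed(arr):
    """Hypothetical helper: guess from `_meta` whether chunks are scipy.sparse."""
    meta = getattr(arr, "_meta", None)
    return scipy.sparse.issparse(meta)


def check_supported(arr):
    """Hypothetical helper: fail early with a clearer message."""
    if is_scipy_sparse_backed(arr):
        raise TypeError(
            "Dask Array appears to be backed by scipy.sparse matrices, "
            "which do not support the ndarray interface; consider "
            "converting the chunks to pydata/sparse arrays first."
        )
    return arr


# A NumPy-backed array passes the check unchanged.
dense = da.from_array(np.eye(10), chunks=(5, 5))
check_supported(dense)
```

Raising early like this would replace the opaque `keepdims` error with a message that names the actual cause, at the cost of depending on private metadata.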
My data is actually a scipy.sparse.csr_matrix. In order to convert this to a dask.array, I am sending the data with client.scatter and then using dask.array.from_delayed. Lastly, I am calling fit, which returns this error: TypeError: sum() got an unexpected keyword argument 'keepdims'. Below you can find the information about the variables used in the code, the code itself, and the full traceback. I will try to add a minimal working example.
Variable Information:
X before conversion to dask array:
X_distributed:
X after conversion to dask array:
Code
Full-traceback