mrocklin opened 6 years ago
Copying the scikit-learn implementation seems fine. I was probably just trying to reduce maintenance, not realizing that the delayed(svd_flip) would create a 1-block dask array.
Calling a delayed function on any dask collection will eventually force that collection into a concrete version of itself (a numpy array or pandas dataframe) so that it can call the function on it.
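A minimal sketch of that behavior (the arrays and functions here are illustrative, not from dask-ml):

```python
import numpy as np
import dask
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))   # a 2x2 grid of blocks
print(x.numblocks)                   # (2, 2)

# Passing the dask array to a delayed function materializes it as a
# concrete numpy array inside that single task...
result = dask.delayed(np.negative)(x)

# ...so wrapping the result back into a dask array yields one block.
y = da.from_delayed(result, shape=x.shape, dtype=x.dtype)
print(y.numblocks)                   # (1, 1)
```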
On Thu, Oct 11, 2018 at 4:20 PM Tom Augspurger notifications@github.com wrote:
Tried this briefly.
signs = da.sign(u[max_abs_cols, list(range(u.shape[1]))])
fails with
1241 if any(isinstance(i, Array) and i.dtype.kind in 'iu' for i in index2):
-> 1242 self, index2 = slice_with_int_dask_array(self, index2)
1243 if any(isinstance(i, Array) and i.dtype == bool for i in index2):
1244 self, index2 = slice_with_bool_dask_array(self, index2)
~/sandbox/dask/dask/array/slicing.py in slice_with_int_dask_array(x, index)
890 ]
891 if sum(fancy_indexes) > 1:
--> 892 raise NotImplementedError("Don't yet support nd fancy indexing)")
893
894 out_index = []
NotImplementedError: Don't yet support nd fancy indexing)
Haven't tried looking for alternative ways of doing this.
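For reference, a small standalone repro of that error (assuming a dask version with the same limitation; the arrays are stand-ins for the discussion above):

```python
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))
rows = da.from_array([0, 2], chunks=2)  # stand-in for max_abs_cols

raised = False
try:
    # Two "fancy" indexes at once, one of them a dask array:
    x[rows, [0, 1]]
except NotImplementedError:
    raised = True
print(raised)  # True on dask versions with this limitation
```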
You can probably replace this with
signs = da.sign(u[max_abs_cols, :u.shape[1]])
Is this called "pointwise indexing"?
ipdb> x[max_abs_cols, [0, 1]]
array([ 0.11476366, -0.09881881])
ipdb> x[max_abs_cols, :2]
array([[ 0.11476366, -0.00279077],
[ 0.03486335, -0.09881881]])
We want the first kind, which should be equivalent to slicing and then taking the diagonal, I think:
ipdb> np.diag(x[max_abs_cols])
array([ 0.11476366, -0.09881881])
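In numpy that equivalence holds because max_abs_cols has one row index per column, so the diagonal of x[max_abs_cols] picks out element (max_abs_cols[j], j). A quick check with synthetic data (variable names borrowed from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))
max_abs_cols = np.argmax(np.abs(x), axis=0)   # one row index per column

pointwise = x[max_abs_cols, np.arange(x.shape[1])]  # element (rows[j], j)
via_diag = np.diag(x[max_abs_cols])                 # same values
print(np.allclose(pointwise, via_diag))  # True
```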
You might want x.vindex[...]
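x.vindex does exactly that kind of pointwise selection, at least when the index arrays are concrete (numpy) arrays:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
rows = np.array([1, 3])
cols = np.array([0, 2])

# vindex applies numpy-style pointwise ("vectorized") indexing,
# selecting x[1, 0] and x[3, 2]:
print(x.vindex[rows, cols].compute())  # [ 4 14]
```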
Currently we use this function on the outputs of svd. This forces potentially large (I think?) multi-chunked arrays into a single chunk.
I wonder if the actual sklearn function might work on its own. It doesn't appear to use any functionality outside of dask.array (assuming that functions like np.sign and np.argmax behave as ufuncs).
Or maybe this is unnecessary?
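A rough sketch of what a dask-native version could look like (hypothetical and untested against dask-ml; it mirrors the u_based_decision=True branch of scikit-learn's svd_flip, and computes the small argmax result eagerly to sidestep the nd fancy-indexing limitation shown in the traceback above):

```python
import numpy as np
import dask.array as da

def svd_flip_sketch(u, v):
    # Hypothetical dask analogue of sklearn.utils.extmath.svd_flip
    # (u_based_decision=True).  The argmax is computed eagerly: it is
    # only one small integer per column, and concrete indices let us
    # use vindex for the pointwise selection.
    max_abs_cols = da.argmax(da.fabs(u), axis=0).compute()
    signs = da.sign(u.vindex[max_abs_cols, np.arange(u.shape[1])])
    u = u * signs                 # flip columns of u
    v = v * signs[:, np.newaxis]  # flip the matching rows of v
    return u, v
```

Since each sign is applied to both a column of u and the matching row of v, the product u @ v is unchanged; only the sign convention of the factors is fixed.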