dask / dask-glm

BSD 3-Clause "New" or "Revised" License

NEP-18: CuPy backend requires unimplemented algorithms #73

Open pentschev opened 5 years ago

pentschev commented 5 years ago

For the past few weeks, I've been working on issues related to NEP-18 support for Dask (and ultimately, Dask-GLM as well) to allow CuPy to be used as a backend. I've made some progress, which I'll soon share with more details.

Among the issues I have now is the usage of the following Dask-GLM algorithms with CuPy:

  1. newton: requires numpy.linalg.lstsq
  2. admm and lbfgs: require scipy.optimize.fmin_l_bfgs_b

Neither of these functions is implemented in CuPy. So I'm wondering if anybody knows of an open, reliable CUDA implementation of either function that could be integrated into CuPy, or whether we could implement them reasonably easily on top of already supported CuPy functionality?
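For reference, a minimal NumPy sketch of the kind of lstsq call the newton solver relies on (the variable names here are illustrative, not Dask-GLM's):

```python
import numpy as np

# Solve the least-squares problem A @ beta = b, the kind of subproblem
# newton solves when updating its parameter estimate each iteration.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([6.0, 8.0, 10.0])

beta, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
# beta == [4., 2.]: the line y = 4 + 2x fits these points exactly.
```

This is exactly the call that has no CuPy counterpart at the moment.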

mrocklin commented 5 years ago

cc @cicdw

mrocklin commented 5 years ago

My guess is that in both cases the resulting array (the input of lstsq or the gradient returned by compute_loss_grad) is probably very small, and so one approach would be to just always convert it into a NumPy array. My guess is that from a performance perspective this would have very little impact and would be easy to accomplish.

The function dask.array.utils.normalize_to_array was intended for this purpose (though it currently supports only the CuPy case). We might consider renaming it to normalize_to_numpy.
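As a sketch, such a coercion helper might look like the following (illustrative only; the actual normalize_to_array in Dask may differ):

```python
import numpy as np

def to_numpy(x):
    """Hypothetical helper: coerce a device array to a NumPy array,
    passing NumPy arrays through unchanged.

    CuPy arrays expose .get() (equivalently cupy.asnumpy()), which
    copies device memory back to the host.
    """
    if isinstance(x, np.ndarray):
        return x
    if hasattr(x, "get"):  # e.g. cupy.ndarray
        return x.get()
    return np.asarray(x)

# Per-iteration outputs (gradients, parameter vectors) are small, so
# always converting them to host memory should cost very little.
```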

cicdw commented 5 years ago

I'm not familiar with the CuPy / CUDA ecosystem, but in terms of implementing the two methods you mention: I'd suspect that some form of lstsq would be doable, but fmin_l_bfgs_b would probably be incredibly difficult.

mrocklin commented 5 years ago

So my proposal above isn't that we implement lstsq and fmin_l_bfgs_b in CUDA; it's that we just always coerce the necessary arrays to NumPy. My understanding is that the inputs to lstsq and the outputs of compute_loss_grad are on the order of the number of parameters, which is typically under 100 or so, and so probably better suited to the CPU anyway. Is my understanding correct here @cicdw ?

cicdw commented 5 years ago

Ah, I see that now; yeah, that's all correct.

pentschev commented 5 years ago

The idea to use normalize_to_array() sounds good to me. I'm now wondering about the case where we need to convert these outputs back to arrays of their original type. We can use the *_like() functions, but only with predefined fill values. Do we have any ideas for the reverse of the normalize_to_array() operation? I thought introducing a copy_like() function in NumPy might be useful for that, but perhaps there's already a better alternative that I don't know about.

mrocklin commented 5 years ago

I think that it's probably OK if we always return numpy arrays. These outputs are likely to be small.

pentschev commented 5 years ago

It's not OK here: https://github.com/dask/dask-glm/blob/master/dask_glm/algorithms.py#L347

We pass beta0, X and y to scipy.optimize.fmin_l_bfgs_b. beta0 has to be copied back to the CPU because it gets modified here: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb.py#L326

Later on, x (formerly beta0) is passed here: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb.py#L335, which actually wraps https://github.com/dask/dask-glm/blob/master/dask_glm/algorithms.py#L340. So here, X and y remain on the GPU, but beta is on the CPU, and this loop runs many times. If we copy X and y back to the CPU, we defeat the purpose of using CuPy in the first place.
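To make the call pattern concrete, here is a minimal CPU-only sketch of the loop in question (the names are illustrative, not Dask-GLM's). scipy drives the objective closure with a host-resident beta; in the GPU case, X and y below would be CuPy arrays, and the X @ beta interaction inside the closure is where the type mismatch bites:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Stand-ins for the data Dask-GLM holds; with CuPy these would live
# on the GPU.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

def loss(beta):
    # scipy always hands us a host (NumPy) beta; if X and y were CuPy
    # arrays, X @ beta would fail unless beta is moved to the device.
    r = X @ beta - y
    return float(r @ r)

beta0 = np.zeros(2)
beta_opt, fmin, info = fmin_l_bfgs_b(loss, beta0, approx_grad=True)
# loss() is called many times, once per function evaluation, so beta
# would bounce between host and device on every call.
```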

mrocklin commented 5 years ago

Ah, my hope was that when X, y, and beta interacted that the GPU arrays would pull the CPU array to the GPU, rather than the other way around. One sec, I'm going to play with this briefly.

mrocklin commented 5 years ago

So I was hoping that things like the following would work

In [1]: import numpy as np

In [2]: import cupy

In [3]: x = np.arange(100)

In [4]: y = cupy.arange(100)

In [5]: z = x + y
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-8c8f78e0676d> in <module>
----> 1 z = x + y

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.__array_ufunc__()
   1734                     for x in inputs
   1735                 ])
-> 1736             return cp_ufunc(*inputs, **kwargs)
   1737         # Don't use for now, interface uncertain
   1738         # elif method =='at' and name == 'add':

~/cupy/cupy/core/_kernel.pyx in cupy.core._kernel.ufunc.__call__()
    790             raise TypeError('Wrong number of arguments for %s' % self.name)
    791
--> 792         args = _preprocess_args(args)
    793         if out is None:
    794             in_args = args[:self.nin]

~/cupy/cupy/core/_kernel.pyx in cupy.core._kernel._preprocess_args()
     84             arg = _scalar.convert_scalar(arg, use_c_scalar)
     85             if arg is None:
---> 86                 raise TypeError('Unsupported type %s' % typ)
     87         ret.append(arg)
     88     return ret

TypeError: Unsupported type <class 'numpy.ndarray'>

mrocklin commented 5 years ago

So, if we had empty_like(..., shape=), maybe that would allow us to do copy_like (without needing to actually add that to the NumPy API). Something like

if type(X) != type(beta):
    beta2 = np.empty_like(X, shape=beta.shape)
    beta2[:] = beta
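A runnable NumPy-only sketch of that idea (copy_like is a hypothetical name; the shape= keyword to empty_like requires NumPy >= 1.17, and with a CuPy prototype the call would dispatch through NEP-18):

```python
import numpy as np

def copy_like(prototype, values):
    """Hypothetical helper: allocate an array of the same type as
    `prototype` (via empty_like, which NEP-18 would dispatch to the
    prototype's library) and copy `values` into it."""
    out = np.empty_like(prototype, shape=np.shape(values))
    out[:] = values
    return out

X = np.ones((3, 2))
beta = [1.0, 2.0]
beta2 = copy_like(X, beta)
# beta2 is a float64 NumPy array [1., 2.]; with a CuPy X, the
# allocation would land on the GPU, and the cross-library assignment
# is the part shown failing below.
```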

I wanted to verify that assignment worked between numpy and cupy, but found that it didn't:

In [6]: y[:] = x
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-2b3b50afef58> in <module>
----> 1 y[:] = x

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.__setitem__()
   1682
   1683         """
-> 1684         _scatter_op(self, slices, value, 'update')
   1685
   1686     def scatter_add(self, slices, value):

~/cupy/cupy/core/core.pyx in cupy.core.core._scatter_op()
   3608     if op == 'update':
   3609         if not isinstance(value, ndarray):
-> 3610             y.fill(value)
   3611             return
   3612         x = value

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.fill()
    520         if isinstance(value, numpy.ndarray):
    521             if value.shape != ():
--> 522                 raise ValueError(
    523                     'non-scalar numpy.ndarray cannot be used for fill')
    524             value = value.item()

ValueError: non-scalar numpy.ndarray cannot be used for fill

This seems like the kind of thing that CuPy might accept as a PR. I was surprised to learn that this didn't work.

pentschev commented 5 years ago

I actually had the exact same idea with the y[:] = x assignment before, but I assumed the lack of automatic conversion was intentional. If we can make that work, I think it would be a good solution, without even needing to touch NumPy, which seems better.

I will see if I can write a PR for that, hopefully it won't be too complicated.

mrocklin commented 5 years ago

It may be intentional, I'm not sure. You might raise a CuPy issue to ask about it; I just checked their issue tracker and didn't see anything about it yet.


pentschev commented 5 years ago

I found two open issues related to what we just discussed here:

https://github.com/cupy/cupy/issues/593
https://github.com/cupy/cupy/issues/589