jakirkham opened this issue 4 years ago:
Here's what convolution would look like if someone rolled their own today using CuPy 7.0.0 and Dask 2.9.1. There are a few rough edges, but we could improve on this over time both in dask-image and upstream.
import cupy as cp
import cupyx.scipy.ndimage.filters
import numpy as np

import dask.array as da
import dask_image.ndfilters._utils

s = (100, 110)
a = da.from_array(cp.arange(int(np.prod(s)), dtype=cp.float32).reshape(s), chunks=10)
w = cp.ones(a.ndim * (3,), dtype=cp.float32)

# Reuse dask-image's internal helpers to compute the overlap depth and
# boundary handling required by the filter footprint.
origin = dask_image.ndfilters._utils._get_origin(w.shape, 0)
depth = dask_image.ndfilters._utils._get_depth(w.shape, origin)
depth, boundary = dask_image.ndfilters._utils._get_depth_boundary(a.ndim, depth, "none")

# Apply CuPy's convolve over each (overlapped) chunk.
r = a.map_overlap(cupyx.scipy.ndimage.filters.convolve,
                  depth=depth,
                  boundary=boundary,
                  dtype=a.dtype,
                  weights=w,
                  mode="reflect",
                  cval=0.0,
                  origin=origin)
# Workaround: patch `_meta` so the result is recognized as CuPy-backed.
r._meta = cp.empty(r.ndim * (0,), dtype=r.dtype)
r.compute()
Linking to this blog post from Matt about mixing dask and cupy arrays, since I found it helpful to read and it's relevant for anyone who comes across this issue: https://matthewrocklin.com/blog/work/2019/01/03/dask-array-gpus-first-steps
Also, we may be able to use __array_function__ to dispatch over different operations (like convolve here). There is some discussion along these lines in https://github.com/scipy/scipy/issues/10204.
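For anyone following along, here is a minimal sketch of how NEP 18 (__array_function__) dispatch works; the MyDuckArray class is purely illustrative:

import numpy as np

class MyDuckArray:
    # Illustrative wrapper demonstrating NEP 18 dispatch.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy invokes this hook instead of its own implementation
        # whenever a MyDuckArray appears among the arguments.
        if func is np.sum:
            return MyDuckArray(np.sum(self.data, *args[1:], **kwargs))
        return NotImplemented

x = MyDuckArray([1, 2, 3])
print(np.sum(x).data)  # dispatches through __array_function__ -> 6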
I haven't had a chance to look much into cupyimg (BSD-3 licensed), but that could also be useful to look at here: https://github.com/mritools/cupyimg
Yeah, I've seen that. There's also some work to add equivalent functions to CuPy.
The long term solution would be to use __array_function__ to dispatch. Maybe a short to medium term solution would be to provide our own dispatching mechanism.
I've been thinking about implementing some sort of dispatching solution. Maybe that gives us something useful in the near term to medium term. I'll try to put a POC together when I have a chance so we can take a look.
@jakirkham is it worth us having a quick discussion about the still-to-be-decided dispatch mechanism? (I've been trying and failing to get my thoughts coherent enough for text for a couple of months now.)
I'm interested in putting in some of my own effort into helping things along. I've been doing a lot of reading around NEP 18, 37, etc. but I'm still pretty unclear about what things should look like for us here. Presumably you're a lot clearer on this in your own mind.
Yeah I think that is a great idea! Was thinking about suggesting that as well. Am on US Pacific timezone now so should be easier to come up with a time that works. Should we try to schedule something for next week?
Sounds good @jakirkham I've just sent you an email, so we don't clog up the thread here with conversation about scheduling.
So it seems GPU support for our ndfourier and ndmeasure modules will be significantly more complicated to implement. The extra difficulty is that these modules don't follow the same map_overlap implementation pattern we have in ndfilters etc.
One of the main blockers as I understand it is that there isn't currently any support for switching array backends for functions like dask.array.ones(), dask.array.arange(), etc. These wrap numpy directly (see here for an example) and get used a lot, both in our code and in dask functions like dask.fft.fftfreq.
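To illustrate the problem concretely (output abbreviated, from the Dask versions discussed here):

import dask.array as da

# `da.arange` creates its chunks with numpy directly, so the result is
# always numpy-backed; there's no way to request cupy-backed chunks here.
x = da.arange(10, chunks=5)
print(x)  # dask.array<arange, shape=(10,), ..., chunktype=numpy.ndarray>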
Worth noting: there is an open issue at https://github.com/dask/dask/issues/6118 suggesting some problems with dask fourier transforms of cupy arrays might be solved by the cupy 8.0.0 release this September. However, this won't fix all our problems, because dask-image uses dask.fft.fftfreq, which also relies on dask arange and so suffers from the problem described in the previous paragraph.
We also use some numpy features that don't have equivalents in cupy. In particular, we use the numpy.ndindex iterator for label comprehensions (which the entire ndmeasure module relies on), but there is no cupy.ndindex.
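(For context, numpy.ndindex is a pure-Python iterator over index tuples and never touches array memory, e.g.:)

import numpy as np

# `ndindex` yields index tuples for a shape without allocating array data,
# so the iteration works the same wherever the arrays themselves live.
for idx in np.ndindex(2, 3):
    print(idx)  # (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)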
Thanks for looking into these!
Yeah the general array creation problem is a known issue, which is what Peter has been working on with PR ( https://github.com/numpy/numpy/pull/16935 ). I think we will still need some updates to Dask. Though there have been a few people poking at that problem too.
There may be other ways to work around this shorter term. Happy to think about that as well.
Given what NumPy's ndindex is doing, I don't think we need a CuPy implementation per se. Though feel free to point out the issues you are seeing and would be happy to take a look.
> Given what NumPy's ndindex is doing, I don't think we need a CuPy implementation per se. […]
I agree, all we need is a way to iterate over the thing, so that should be doable.
> There may be other ways to work around this shorter term. Happy to think about that as well.
Maybe we should plan to catch up in two weeks or so, after your break? Synchronous or asynchronous. At the moment I don't have a clear plan for GPU ndfourier and ndmeasure.
That sounds like a good plan.
One trick that has been handy for these has been using da.ones_like where appropriate and building off of that. Here's one for arange. Though this could then be extended to handle other details like offsets, scalings, or other operations that can be constructed from a linear scaling.
import cupy as cp
import numpy as np
import dask.array as da
a = da.from_array(cp.random.random((10,)), chunks=2)
r = da.cumsum(da.ones_like(a)) - 1 # `da.arange` now with the type of `a`
Note: If we need other flexibility, like overriding the shape in da.ones_like, that is possible as long as we have NumPy 1.17.0+ (https://github.com/numpy/numpy/pull/13046) (probably a pretty reasonable requirement at this point) and Dask 2.19.0+ (https://github.com/dask/dask/pull/6064).
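For example, something along these lines (an untested sketch assuming the shape= keyword from those two PRs):

import cupy as cp
import dask.array as da

a = da.from_array(cp.random.random((10,)), chunks=2)
# Sketch: with NumPy 1.17.0+ and Dask 2.19.0+, `ones_like` should accept a
# `shape=` override, giving a differently-shaped array from the same template.
b = da.ones_like(a, shape=(4,))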
It turns out that the binary_dilation and binary_erosion cupy functions do not exist right now. We'll need those for GPU support of ndmorph (the actual implementation will be pretty straightforward once those functions exist).
There are cupy functions for grey_dilation and grey_erosion, which do exist but have slightly different function signatures.
I went ahead and ported the binary morphology operations from cupyimg over to CuPy in cupy/cupy#3907. I don't think these will appear in CuPy 8.0, though, as 8.0.0rc1 was already released.
That's super helpful, thanks @grlee77!
> One trick that has been handy for these has been using da.ones_like where appropriate and building off of that. Here's one for arange. […]
> Note: If we need other flexibility like overriding the shape in da.ones_like, that is possible as long as we have NumPy 1.17.0+ (numpy/numpy#13046) (probably a pretty reasonable requirement at this point) and Dask 2.19.0+ (dask/dask#6064).
I think ones_like is only good for specifying the datatype (int or float) but not good at specifying the array type (numpy or cupy). It's possible I misunderstand something here, but I don't get a cupy dask array result when I try that example:
In [1]: import cupy as cp
In [2]: cp.__version__
Out[2]: '8.0.0rc1'
In [3]: import numpy as np
In [4]: np.__version__
Out[4]: '1.19.1'
In [5]: import dask
In [6]: dask.__version__
Out[6]: '2.25.0+6.g0ca1607a'
In [7]: import dask.array as da
In [8]: a = da.from_array(cp.random.random((10,)), chunks=2)
In [9]: a
Out[9]: dask.array<array, shape=(10,), dtype=float64, chunksize=(2,), chunktype=cupy.ndarray>
In [10]: type(a.compute())
Out[10]: cupy.core.core.ndarray
In [11]: ones = da.ones_like(a)
In [12]: ones
Out[12]: dask.array<ones, shape=(10,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
In [13]: type(ones.compute())
Out[13]: numpy.ndarray
cc @pentschev (in case you have thoughts on the issue above)
I know that some support for da.ones_like was added in https://github.com/dask/dask/pull/6064, but I didn't have a chance to review or test that, so I can't really comment on whether it supports what's needed here. Maybe we will need to review that and fix it if necessary; I won't have time for that in the next 2-3 weeks at least, though. If someone wants to pick it up, I can certainly try to help by reviewing or guiding on some details if necessary.
No, that support doesn't extend quite as far as we need here (the example in my comment above uses a version of dask after https://github.com/dask/dask/pull/6064 was merged).
Peter has been doing some work recently implementing NEP 35 in PR (https://github.com/dask/dask/pull/6738), which handles some of these tricky array creation cases. It requires NumPy 1.20 to work. Do you think this is ready for people to try, Peter? Would it help with the case Genevieve highlighted above?
Absolutely, it would be great to get people testing that. I checked the case mentioned in https://github.com/dask/dask-image/issues/133#issuecomment-685404678 and it works even without https://github.com/dask/dask/pull/6738, which I guess was solved by https://github.com/dask/dask/pull/6064. Hopefully the NEP-35 support PR will solve other issues. Just to be clear, though, it's still not a complete effort; there's plenty of code left to be supported, which I hope to work on in the next few months. So if you hit any other unhandled cases, please let me know.
> Absolutely, it would be great to get people testing that. I checked the case mentioned in #133 (comment) and it works
Was there anything in particular you wanted other people to test for you, @pentschev?
> Was there anything in particular you wanted other people to test for you, @pentschev?
Nothing really specific. I would just generally like feedback on whether NEP-35 unblocked people's use cases. The more common cases that should now work are in https://github.com/dask/dask/pull/6738/files#diff-73bb3e9861272d85e4078b47f67a474426627a0e01fb9734c9b6d194c62bfd37 , so it would be great to hear from people who actually needed any of those and can now (hopefully) move further along. But no worries if none of them are necessary in ongoing work. :)
Just to double check here, which functions are not covered currently?
Pretty much any function that uses np.*array or da.*array; there's a bunch of them in https://github.com/dask/dask/blob/master/dask/array/routines.py, for example.
Ah sorry, I should have been clearer. That question was directed to Genevieve.
Edit: trying to figure out which dask-image functions are not supported on GPUs currently.
GPU support with cupy:
No GPU support:
To be clear, ndfourier and ndmeasure are still left to do because they are a bit harder. They don't have that nice, easy pattern with map_overlap like all the ndfilters functions. It's probably worth sitting down and working out what's best to tackle next.
Thanks for the update, Genevieve!
Yeah, I'd need to look at ndfourier again. It may be that Peter's recent work unblocks these (namely, we have better tools for array creation).
With ndmeasure, having label in particular would be really useful. There are other things in there that would be useful too, but they may be trickier to port.
I think we've talked about the value of regionprops. Though that's kind of its own issue, outside of this one.
Yeah, if we can get the fourier utils working with cupy, I think that's the bulk of the work there done. There are a few numpy array creation calls, so this is a good chance to try out some of Peter's new work. Labels also has a bunch of similar stuff going on.
Yeah, I think there we need to get da.arange to use the like= parameter. Don't think this is covered yet. This would allow us to use like= in fftfreq. Not too bad, but some work. Though this is just in Dask.
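Roughly what that could look like once like= support lands in da.arange (hypothetical until it does):

import cupy as cp
import dask.array as da

# Hypothetical until `like=` is supported by `da.arange`: the zero-size
# cupy array is only there to select the backend of the output chunks.
freqs = da.arange(10, chunks=5, like=cp.empty(0))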
Yeah, label is a bit more complicated, as it also uses sparse matrices. There is some support for this in CuPy and Dask. I don't think we will get __array_function__ into SciPy, though (https://github.com/scipy/scipy/issues/10362), so we might need to do some dispatching here ourselves. It's possible more recent iterations of array dispatching would be able to handle this somehow. We also need some things like scipy.sparse.csgraph.connected_components, which don't exist in CuPy yet but may in the future (https://github.com/cupy/cupy/pull/4054).
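For reference, this is roughly the pattern involved, sketched with scipy on the CPU (a GPU version would want the cupyx.scipy.sparse equivalents):

import numpy as np
import scipy.sparse
import scipy.sparse.csgraph

# Sketch of the label-merging step: an adjacency matrix records which
# provisional labels touch across chunk boundaries, then
# connected_components collapses them into a final relabeling.
adj = scipy.sparse.csr_matrix(np.array([[1, 1, 0],
                                        [1, 1, 0],
                                        [0, 0, 1]]))
n, relabeling = scipy.sparse.csgraph.connected_components(adj, directed=False)
print(n, relabeling)  # 2 [0 0 1]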
> Yeah I think there we need to get da.arange to use the like= parameter. Don't think this is covered yet. This would allow us to use like= in fftfreq. […]
Is that something that @pentschev is working on?
> Is that something that @pentschev is working on?
I'm not right now, but I hope I can do that later in March. If you have a list of functions that would immediately benefit from it, I can certainly prioritize that. Do you see anything besides da.arange at the moment?
I'm not sure how compatible or not things already are, but to give you a sense of what our fourier module is using, we have:
dask.array.meshgrid and dask.array.fft.fftfreq in the _utils
... and also used here are:
dask.array.tensordot
dask.array.exp
dask.array.sinc
dask.array.prod
Hopefully some of these will be fine without any changes (fingers crossed)
Yeah, the rest of those functions should already be fine, as they take arrays as input rather than creating new ones. So it sounds like it's just arange. This comes up in other places as well, Peter.
Idk to what extent people use other things (like linspace) with Dask, but we might want to look at those as well. Though these can usually be generated from arange.
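(e.g. linspace can be expressed as a linear rescaling of arange; a rough sketch:)

import numpy as np

# Sketch: `linspace` built from `arange`, so a backend-aware `arange`
# would give a backend-aware `linspace` essentially for free.
def linspace_via_arange(start, stop, num):
    step = (stop - start) / (num - 1)
    return start + step * np.arange(num)

print(linspace_via_arange(0.0, 1.0, 5))  # [0.   0.25 0.5  0.75 1.  ]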
Labels rely a lot on sparse matrices.
I found this table explaining what sparse matrix support cupy has: https://docs.cupy.dev/en/stable/reference/sparse.html
I don't see scipy.sparse.csgraph.connected_components in that list.
https://github.com/dask/dask-image/blob/91c4955a88055e98d1f97b01b7841dcc2c90b73e/dask_image/ndmeasure/_utils/_label.py#L23
That's the only thing that stands out to me there.
Yep I think that was covered in the latter half of this comment ( https://github.com/dask/dask-image/issues/133#issuecomment-788584782 ). Happy to follow up on anything from that which may have been missed, needs more info/discussion, etc.
> ... and also used here are:
> dask.array.tensordot
> dask.array.exp
> dask.array.sinc
> dask.array.prod
It seems that all those functions, with the exception of exp, are already tested in https://github.com/dask/dask/blob/master/dask/array/tests/test_cupy.py, so they are probably working fine. If you know you're using some functions that aren't covered there, I would appreciate tests being added to that file too: just copy the existing Dask tests for NumPy-backed arrays with a few modifications to use CuPy, and that's it. If they're not working yet today, marking them as xfail is also helpful, so that we know they've been tested but aren't working today.
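For anyone picking that up, a sketch of what such a test might look like (names illustrative, following the existing patterns in test_cupy.py):

import cupy as cp
import dask.array as da

def test_sinc_cupy():
    # Compare a CuPy-backed dask computation against plain CuPy.
    x = cp.random.random((10,))
    d = da.from_array(x, chunks=2)
    cp.testing.assert_array_almost_equal(da.sinc(d).compute(), cp.sinc(x))

# For functions known not to work with CuPy-backed arrays yet, decorate the
# test with `@pytest.mark.xfail(reason="...")` so the gap stays tracked.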
> I don't see scipy.sparse.csgraph.connected_components in that list
Besides what John already said, there's some exploratory work in plugging cuGraph into CuPy in https://github.com/cupy/cupy/issues/4219 .
> > ... and also used here are: dask.array.tensordot, dask.array.exp, dask.array.sinc, dask.array.prod
>
> It seems that all those functions, with the exception of exp, are already tested in https://github.com/dask/dask/blob/master/dask/array/tests/test_cupy.py, so they are probably working fine.
Included exp in those tests with PR (https://github.com/dask/dask/pull/7322).
FYI: connected components has been merged into the cupy master branch (it should be included in the cupy 10.0.0 release): https://github.com/cupy/cupy/pull/4940. This might be something we need for GPU support of label.
Just FYI, it's also being backported to CuPy 9.0 (planned for release April 22nd) in https://github.com/cupy/cupy/pull/5113. Assuming https://github.com/conda-forge/staged-recipes/pull/14546 gets merged by then, it may be usable just after the CuPy 9 release by installing libcugraph from conda-forge.
Yeah, on the cuGraph point, this is supported in the CuPy 9 build in conda-forge.
Would be useful to support these operations on GPUs as well. One way to do this might be to dispatch using CuPy.