jakirkham opened this issue 4 years ago:
Here's what convolution would look like if someone rolled their own today using CuPy 7.0.0 and Dask 2.9.1. There are a few rough edges, but we could improve on this over time both in dask-image and upstream.
import cupy as cp
import cupyx.scipy.ndimage.filters
import numpy as np

import dask.array as da
import dask_image.ndfilters._utils

s = (100, 110)
a = da.from_array(cp.arange(int(np.prod(s)), dtype=cp.float32).reshape(s), chunks=10)
w = cp.ones(a.ndim * (3,), dtype=cp.float32)

# Reuse dask-image's internal helpers to compute the overlap depth and
# boundary handling required by the filter footprint.
origin = dask_image.ndfilters._utils._get_origin(w.shape, 0)
depth = dask_image.ndfilters._utils._get_depth(w.shape, origin)
depth, boundary = dask_image.ndfilters._utils._get_depth_boundary(a.ndim, depth, "none")

# Apply CuPy's convolve over each (overlapped) chunk.
r = a.map_overlap(cupyx.scipy.ndimage.filters.convolve,
                  depth=depth,
                  boundary=boundary,
                  dtype=a.dtype,
                  weights=w,
                  mode="reflect",
                  cval=0.0,
                  origin=origin)
# Workaround: patch `_meta` so the result is recognized as CuPy-backed.
r._meta = cp.empty(r.ndim * (0,), dtype=r.dtype)
r.compute()
Linking to this blog post from Matt about mixing dask and cupy arrays, since I found it helpful to read and it's relevant for anyone who comes across this issue: https://matthewrocklin.com/blog/work/2019/01/03/dask-array-gpus-first-steps
Also, we may be able to use __array_function__ to dispatch over different operations (like convolve here). There is some discussion along these lines in https://github.com/scipy/scipy/issues/10204.
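For anyone following along, here is a minimal sketch of how NEP 18 (__array_function__) dispatch works; the MyDuckArray class is purely illustrative:

import numpy as np

class MyDuckArray:
    # Illustrative wrapper demonstrating NEP 18 dispatch.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy invokes this hook instead of its own implementation
        # whenever a MyDuckArray appears among the arguments.
        if func is np.sum:
            return MyDuckArray(np.sum(self.data, *args[1:], **kwargs))
        return NotImplemented

x = MyDuckArray([1, 2, 3])
print(np.sum(x).data)  # dispatches through __array_function__ -> 6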
I haven't had a chance to look much into cupyimg (BSD-3 licensed), but that could also be useful to look at here: https://github.com/mritools/cupyimg
Yeah, I've seen that. There's also some work to add equivalent functions to CuPy.
The long term solution would be to use __array_function__ to dispatch. Maybe a short to medium term solution would be to provide our own dispatching mechanism.
I've been thinking about implementing some sort of dispatching solution. Maybe that gives us something useful in the near term to medium term. I'll try to put a POC together when I have a chance so we can take a look.
@jakirkham is it worth us having a quick discussion about the still-to-be-decided dispatch mechanism? (I've been trying and failing to get my thoughts coherent enough for text for a couple of months now.)
I'm interested in putting in some of my own effort into helping things along. I've been doing a lot of reading around NEP 18, 37, etc. but I'm still pretty unclear about what things should look like for us here. Presumably you're a lot clearer on this in your own mind.
Yeah I think that is a great idea! Was thinking about suggesting that as well. Am on US Pacific timezone now so should be easier to come up with a time that works. Should we try to schedule something for next week?
Sounds good @jakirkham I've just sent you an email, so we don't clog up the thread here with conversation about scheduling.
So it seems GPU support for our ndfourier and ndmeasure modules will be significantly more complicated to implement. The extra difficulty is that these modules don't follow the same map_overlap implementation pattern we have in ndfilters etc.
One of the main blockers as I understand it is that there isn't currently any support for switching array backends for functions like dask.array.ones(), dask.array.arange(), etc. These wrap numpy directly (see here for an example) and get used a lot, both in our code and in dask functions like dask.fft.fftfreq.
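To illustrate the problem concretely (output abbreviated, from the Dask versions discussed here):

import dask.array as da

# `da.arange` creates its chunks with numpy directly, so the result is
# always numpy-backed; there's no way to request cupy-backed chunks here.
x = da.arange(10, chunks=5)
print(x)  # dask.array<arange, shape=(10,), ..., chunktype=numpy.ndarray>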
Worth noting: there is an open issue at https://github.com/dask/dask/issues/6118 suggesting some problems with dask fourier transforms of cupy arrays might be solved by the cupy 8.0.0 release this September. However, this won't fix all our problems, because dask-image uses dask.fft.fftfreq, which also relies on dask arange and so suffers from the problem described in the previous paragraph.
We also use some numpy features that don't have equivalents in cupy. In particular, we use the numpy.ndindex iterator for label comprehensions (which the entire ndmeasure module relies on), but there is no cupy.ndindex.
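(For context, numpy.ndindex is a pure-Python iterator over index tuples and never touches array memory, e.g.:)

import numpy as np

# `ndindex` yields index tuples for a shape without allocating array data,
# so the iteration works the same wherever the arrays themselves live.
for idx in np.ndindex(2, 3):
    print(idx)  # (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)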
Thanks for looking into these!
Yeah the general array creation problem is a known issue, which is what Peter has been working on with PR ( https://github.com/numpy/numpy/pull/16935 ). I think we will still need some updates to Dask. Though there have been a few people poking at that problem too.
There may be other ways to work around this shorter term. Happy to think about that as well.
Given what NumPy's ndindex is doing, I don't think we need a CuPy implementation per se. Though feel free to point out the issues you are seeing and would be happy to take a look.
> Given what NumPy's ndindex is doing, I don't think we need a CuPy implementation per se. […]
I agree, all we need is a way to iterate over the thing, so that should be doable.
> There may be other ways to work around this shorter term. Happy to think about that as well.
Maybe we should plan to catch up in two weeks or so, after your break? Synchronous or asynchronous. At the moment I don't have a clear plan for GPU ndfourier and ndmeasure.
That sounds like a good plan.
One trick that has been handy for these has been using da.ones_like where appropriate and building off of that. Here's one for arange. Though this could then be extended to handle other details like offsets, scalings, or other operations that can be constructed from a linear scaling.
import cupy as cp
import numpy as np
import dask.array as da
a = da.from_array(cp.random.random((10,)), chunks=2)
r = da.cumsum(da.ones_like(a)) - 1 # `da.arange` now with the type of `a`
Note: If we need other flexibility, like overriding the shape in da.ones_like, that is possible as long as we have NumPy 1.17.0+ (https://github.com/numpy/numpy/pull/13046) (probably a pretty reasonable requirement at this point) and Dask 2.19.0+ (https://github.com/dask/dask/pull/6064).
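For example, something along these lines (an untested sketch assuming the shape= keyword from those two PRs):

import cupy as cp
import dask.array as da

a = da.from_array(cp.random.random((10,)), chunks=2)
# Sketch: with NumPy 1.17.0+ and Dask 2.19.0+, `ones_like` should accept a
# `shape=` override, giving a differently-shaped array from the same template.
b = da.ones_like(a, shape=(4,))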
It turns out that the binary_dilation and binary_erosion cupy functions do not exist right now. We'll need those for GPU support of ndmorph (the actual implementation will be pretty straightforward once those functions exist).
There are cupy functions for grey_dilation and grey_erosion, which do exist but have slightly different function signatures.
I went ahead and ported the binary morphology operations from cupyimg over to CuPy in cupy/cupy#3907. I don't think these will appear in CuPy 8.0, though, as 8.0.0rc1 was already released.
That's super helpful, thanks @grlee77!
> One trick that has been handy for these has been using da.ones_like where appropriate and building off of that. Here's one for arange. […]
> Note: If we need other flexibility like overriding the shape in da.ones_like, that is possible as long as we have NumPy 1.17.0+ (numpy/numpy#13046) (probably a pretty reasonable requirement at this point) and Dask 2.19.0+ (dask/dask#6064).
I think ones_like is only good for specifying the datatype (int or float) but not good at specifying the array type (numpy or cupy). It's possible I misunderstand something here, but I don't get a cupy dask array result when I try that example:
In [1]: import cupy as cp
In [2]: cp.__version__
Out[2]: '8.0.0rc1'
In [3]: import numpy as np
In [4]: np.__version__
Out[4]: '1.19.1'
In [5]: import dask
In [6]: dask.__version__
Out[6]: '2.25.0+6.g0ca1607a'
In [7]: import dask.array as da
In [8]: a = da.from_array(cp.random.random((10,)), chunks=2)
In [9]: a
Out[9]: dask.array<array, shape=(10,), dtype=float64, chunksize=(2,), chunktype=cupy.ndarray>
In [10]: type(a.compute())
Out[10]: cupy.core.core.ndarray
In [11]: ones = da.ones_like(a)
In [12]: ones
Out[12]: dask.array<ones, shape=(10,), dtype=float64, chunksize=(2,), chunktype=numpy.ndarray>
In [13]: type(ones.compute())
Out[13]: numpy.ndarray
cc @pentschev (in case you have thoughts on the issue above)
I know that some support for da.ones_like was added in https://github.com/dask/dask/pull/6064, but I didn't have a chance to review or test that, so I can't really comment on whether it supports what's needed here. Maybe we will need to review that and fix it if necessary; I won't have time for that in the next 2-3 weeks at least, though. If someone wants to pick it up, I can certainly try to help by reviewing or guiding on some details if necessary.
No, that support doesn't extend quite as far as we need here (the example in my comment above uses a version of dask after https://github.com/dask/dask/pull/6064 was merged).
Peter has been doing some work recently implementing NEP 35 in PR (https://github.com/dask/dask/pull/6738), which handles some of these tricky array creation cases. It requires NumPy 1.20 to work. Do you think this is ready for people to try, Peter? Would it help with the case Genevieve highlighted above?
Absolutely, it would be great to get people testing that. I checked the case mentioned in https://github.com/dask/dask-image/issues/133#issuecomment-685404678 and it works even without https://github.com/dask/dask/pull/6738, which I guess was solved by https://github.com/dask/dask/pull/6064. Hopefully the NEP-35 support PR will solve other issues. Just to be clear, though, it's still not a complete effort; there's plenty of code left to be supported, which I hope to work on in the next few months. So if you hit any other unhandled cases, please let me know.
> Absolutely, it would be great to get people testing that. I checked the case mentioned in #133 (comment) and it works
Was there anything in particular you wanted other people to test for you, @pentschev?
> Was there anything in particular you wanted other people to test for you, @pentschev?
Nothing really specific. I would just generally like feedback on whether NEP-35 unblocked people's use cases. The more common cases that should now work are in https://github.com/dask/dask/pull/6738/files#diff-73bb3e9861272d85e4078b47f67a474426627a0e01fb9734c9b6d194c62bfd37 , so it would be great to hear from people who actually needed any of those and can now (hopefully) move further along. But no worries if none of them are necessary in ongoing work. :)
Just to double check here, which functions are not covered currently?
Pretty much any function that uses np.*array or da.*array; there's a bunch of them in https://github.com/dask/dask/blob/master/dask/array/routines.py, for example.
Ah sorry, I should have been clearer. That question was directed to Genevieve.
Edit: trying to figure out which dask-image functions are not supported on GPUs currently.
GPU support with cupy:
No GPU support:
To be clear, ndfourier and ndmeasure are still left to do because they are a bit harder. They don't have that nice, easy pattern with map_overlap like all the ndfilters functions. It's probably worth sitting down and working out what's best to tackle next.
Thanks for the update, Genevieve!
Yeah, I'd need to look at ndfourier again. It may be that Peter's recent work unblocks these (namely, we have better tools for array creation).
With ndmeasure, having label in particular would be really useful. There are other things in there that would be useful too, but they may be trickier to port.
I think we've talked about the value of regionprops. Though that's kind of its own issue, outside of this one.
Yeah, if we can get the fourier utils working with cupy, I think that's the bulk of the work there done. There are a few numpy array creation calls, so this is a good chance to try out some of Peter's new work. Labels also has a bunch of similar stuff going on.
Yeah, I think there we need to get da.arange to use the like= parameter. Don't think this is covered yet. This would allow us to use like= in fftfreq. Not too bad, but some work. Though this is just in Dask.
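Roughly what that could look like once like= support lands in da.arange (hypothetical until it does):

import cupy as cp
import dask.array as da

# Hypothetical until `like=` is supported by `da.arange`: the zero-size
# cupy array is only there to select the backend of the output chunks.
freqs = da.arange(10, chunks=5, like=cp.empty(0))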
Yeah, label is a bit more complicated, as it also uses sparse matrices. There is some support for this in CuPy and Dask. I don't think we will get __array_function__ into SciPy, though (https://github.com/scipy/scipy/issues/10362), so we might need to do some dispatching here ourselves. It's possible more recent iterations of array dispatching would be able to handle this somehow. We also need some things like scipy.sparse.csgraph.connected_components, which don't exist in CuPy yet but may in the future (https://github.com/cupy/cupy/pull/4054).
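For reference, this is roughly the pattern involved, sketched with scipy on the CPU (a GPU version would want the cupyx.scipy.sparse equivalents):

import numpy as np
import scipy.sparse
import scipy.sparse.csgraph

# Sketch of the label-merging step: an adjacency matrix records which
# provisional labels touch across chunk boundaries, then
# connected_components collapses them into a final relabeling.
adj = scipy.sparse.csr_matrix(np.array([[1, 1, 0],
                                        [1, 1, 0],
                                        [0, 0, 1]]))
n, relabeling = scipy.sparse.csgraph.connected_components(adj, directed=False)
print(n, relabeling)  # 2 [0 0 1]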
> Yeah I think there we need to get da.arange to use the like= parameter. Don't think this is covered yet. This would allow us to use like= in fftfreq. […]
Is that something that @pentschev is working on?
> Is that something that @pentschev is working on?
I'm not right now, but I hope I can do that later in March. If you have a list of functions that would immediately benefit from it, I can certainly prioritize that. Do you see anything besides da.arange at the moment?
I'm not sure how compatible or not things already are, but to give you a sense of what our fourier module is using, we have:
dask.array.meshgrid and dask.array.fft.fftfreq in the _utils
... and also used here are:
dask.array.tensordot
dask.array.exp
dask.array.sinc
dask.array.prod
Hopefully some of these will be fine without any changes (fingers crossed)
Yeah, the rest of those functions should already be fine, as they take arrays as input rather than creating new ones. So it sounds like it's just arange. This comes up in other places as well, Peter.
Idk to what extent people use other things (like linspace) with Dask, but we might want to look at those as well. Though these can usually be generated from arange.
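(e.g. linspace can be expressed as a linear rescaling of arange; a rough sketch:)

import numpy as np

# Sketch: `linspace` built from `arange`, so a backend-aware `arange`
# would give a backend-aware `linspace` essentially for free.
def linspace_via_arange(start, stop, num):
    step = (stop - start) / (num - 1)
    return start + step * np.arange(num)

print(linspace_via_arange(0.0, 1.0, 5))  # [0.   0.25 0.5  0.75 1.  ]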
Labels rely a lot on sparse matrices.
I found this table explaining what sparse matrix support cupy has: https://docs.cupy.dev/en/stable/reference/sparse.html
I don't see scipy.sparse.csgraph.connected_components in that list.
https://github.com/dask/dask-image/blob/91c4955a88055e98d1f97b01b7841dcc2c90b73e/dask_image/ndmeasure/_utils/_label.py#L23
That's the only thing that stands out to me there.
Yep I think that was covered in the latter half of this comment ( https://github.com/dask/dask-image/issues/133#issuecomment-788584782 ). Happy to follow up on anything from that which may have been missed, needs more info/discussion, etc.
> ... and also used here are:
> dask.array.tensordot
> dask.array.exp
> dask.array.sinc
> dask.array.prod
It seems that all those functions, with the exception of exp, are already tested in https://github.com/dask/dask/blob/master/dask/array/tests/test_cupy.py, so they are probably working fine. If you know you're using some functions that aren't covered there, I would appreciate tests being added to that file too: just copy the existing Dask tests for NumPy-backed arrays with a few modifications to use CuPy, and that's it. If they're not working yet today, marking them as xfail is also helpful, so that we know they've been tested but aren't working today.
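For anyone picking that up, a sketch of what such a test might look like (names illustrative, following the existing patterns in test_cupy.py):

import cupy as cp
import dask.array as da

def test_sinc_cupy():
    # Compare a CuPy-backed dask computation against plain CuPy.
    x = cp.random.random((10,))
    d = da.from_array(x, chunks=2)
    cp.testing.assert_array_almost_equal(da.sinc(d).compute(), cp.sinc(x))

# For functions known not to work with CuPy-backed arrays yet, decorate the
# test with `@pytest.mark.xfail(reason="...")` so the gap stays tracked.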
> I don't see scipy.sparse.csgraph.connected_components in that list
Besides what John already said, there's some exploratory work in plugging cuGraph into CuPy in https://github.com/cupy/cupy/issues/4219 .
> > ... and also used here are: dask.array.tensordot, dask.array.exp, dask.array.sinc, dask.array.prod
>
> It seems that all those functions, with the exception of exp, are already tested in https://github.com/dask/dask/blob/master/dask/array/tests/test_cupy.py, so they are probably working fine.
Included exp in those tests with PR (https://github.com/dask/dask/pull/7322).
FYI: connected components has been merged into the cupy master branch (it should be included in the cupy 10.0.0 release): https://github.com/cupy/cupy/pull/4940. This might be something we need for GPU support of label.
Just FYI, it's also being backported to CuPy 9.0 (planned for release April 22nd) in https://github.com/cupy/cupy/pull/5113. Assuming https://github.com/conda-forge/staged-recipes/pull/14546 gets merged by then, it may be usable just after the CuPy 9 release by installing libcugraph from conda-forge.
Yeah, on the cuGraph point, this is supported in the CuPy 9 build in conda-forge.
Would be useful to support these operations on GPUs as well. One way to do this might be to dispatch using CuPy.