dask / dask

Parallel computing with task scheduling
https://dask.org
BSD 3-Clause "New" or "Revised" License
12.47k stars 1.7k forks source link

Can only specify boundary conditions per-axis with `ghost.ghost` #477

Closed gforsyth closed 9 years ago

gforsyth commented 9 years ago

As asked on SO

The current ghost.ghost function allows specifying a few different types of B.C. but they're always the same for both y-boundaries and both x-boundaries. This makes perfect sense for periodic boundaries, but less so for Dirichlet boundaries.

As @mrocklin commented, ghost.ghost_internal can accomplish most of this but it would be good if it were exposed in ghost.ghost, e.g.

g = da.ghost.ghost(x, depth={0: 1, 1: 1}, boundary={0: 100, 1: 100, 2: 0, 3: 0})

mrocklin commented 9 years ago

Pinging @stefanv

stefanv commented 9 years ago

In the scikit-image use-case, a feature that would be useful is to only ghost "to the inside" of the array, i.e. to ghost, but to crop the ghosted array to the original image extents.

The reason for this is that scikit-image has specialized handling of boundaries, which sometimes causes artefacts. If ghosting pads out those edges, these (desired) artefacts will disappear.

cowlicks commented 9 years ago

@stefanv that is currently possible with ghost_internal.

Perhaps this would be best exposed by adding a internal_depth keyword argument to da.ghost.ghost. Which takes dict keyed with axes and valued with depths. If an axis in this dictionary is not specified then it will be taken from the depth kwarg. And if just the internal_depth is specified for an axis, the external boundary will be zero.

cowlicks commented 9 years ago

Note I mixed up boundary and depth above, but have since edited the comment to fix this.^

stefanv commented 9 years ago

There should be an easy way to say: "do what you normally do, just don't go outside the array boundaries"

mrocklin commented 9 years ago

Ideally this option ends up within the da.Array.map_overlap function which, I think, has the most user friendly API so far. I think that this function came out of @cowlicks early work with skimage.

At the moment the boundary=None option defaults to reflect. Perhaps None should actually mean "no boundary."

cowlicks commented 9 years ago

@mrocklin there are currently two arguments for specifying the ghosting. depth and boundary. Which, more verbosely, mean "depth of ghosting" and "external ghosting type". If someone specifies a ghosting depth, but no boundary type, I think they would still expect some ghosting to happen.

One reason reflect might be the default is because this is the behavior for some skimage filters.

cowlicks commented 9 years ago

@stefanv we could have an internal_ghosting_only=True/False kwarg in ghost. However I think an internal_depth kwarg would be valuable since it would allow the user to have internal ghosting which differs from external ghosting.

mrocklin commented 9 years ago

If someone specifies a ghosting depth, but no boundary type, I think they would still expect some ghosting to happen

I agree. But would the expect ghosting on the boundary?

One reason reflect might be the default is because this is the behavior for some skimage filters.

This is a good reason to keep the default as reflect. I wonder if we should set the default to boundary='reflect' in the function signature and then use boundary=None to mean internal ghosting only.

cowlicks commented 9 years ago

@gforsyth you can specify the external boundary conditions per axis.

In [1]: import numpy as np; import dask.array as da

In [2]: a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2,2))

In [3]: g = da.ghost.ghost(a, depth={0: 1, 1:2}, boundary={0:42, 1:'reflect'})

In [4]: g.compute()
Out[4]: 
array([[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
       [ 1,  0,  0,  1,  2,  3,  0,  1,  2,  3,  3,  2],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7,  6],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 10],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7,  6],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 10],
       [13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 14],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]])

But you cannot specify the internal boundary conditions, the internal boundary conditions just copy the elements of the chunks next to them. We could possibly change this, but I'm not sure this is what you want.

And currently the depth argument is the depth of the boundary for both internal and external boundaries. We could easily change this so this by adding a internal_depth argument which would default to be the same as depth if it is not provided.

mrocklin commented 9 years ago

I suspect that @gforsyth wanted to have different boundary conditions on the left vs right or top vs bottom, which is not currently supported.

gforsyth commented 9 years ago

That's what I was thinking of. I was just playing around with solving PDEs in Dask and if you're doing something that's elliptic and you want to set one side of your domain to 0 and the other 3 to 100 that doesn't work the way ghost is currently set up.

cowlicks commented 9 years ago

@gforsyth Perhaps a good api here would be to allow tuples as values in the boundary argument like:

In [1]: import numpy as np; import dask.array as da

In [2]: a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2,2))

In [3]: g = da.ghost.ghost(a, depth={0: 1, 1:2}, boundary={0: (33, 42), 1:('reflect', 66)}

In [4]: g.compute()
Out[4]: 
array([[33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 66],
       [ 1,  0,  0,  1,  2,  3,  0,  1,  2,  3,  3, 66],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7, 66],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 66],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7, 66],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 66],
       [13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 66],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 66]])

Note that doing something like this with the depth argument would make internal depth ambiguous.

If I understand correctly this is distinct from what @stefanv wants.

gforsyth commented 9 years ago

I think that would work well for specifying the external boundaries. Does it make any sense to also have a depth tuple possible? You might need to have a reflection that's 2-deep for a Neumann boundary but only need a 1-deep for Dirichlet on the opposite boundary.

gforsyth commented 9 years ago

Hrrr. Just realized that that would necessitate a separate depth kwarg for internal vs. external boundaries otherwise it's going to be very ambiguous.

gforsyth commented 9 years ago

Noting your example array

In [4]: g.compute()
Out[4]: 
array([[33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 66],
       [ 1,  0,  0,  1,  2,  3,  0,  1,  2,  3,  3, 66],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7, 66],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 66],
       [ 5,  4,  4,  5,  6,  7,  4,  5,  6,  7,  7, 66],
       [ 9,  8,  8,  9, 10, 11,  8,  9, 10, 11, 11, 66],
       [13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 66],
       [42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 66]])

the external boundary depth doesn't particularly matter -- if it were the difference between a Neumann and Dirichlet boundary the extra bit of memory required to pad out a Dirichlet boundary one column further than necessary is negligible.

That said, g[:-2] seems to be a reflect while g[:-1] is Dirichlet. I imagine it's just an editing error from mocking up the boundary tuple example but so long as both g[:-2] and g[:-1] were both 66 in the above example that would cover the use cases I have.

stefanv commented 9 years ago

One thing to keep in mind when evaluating all these new options, is that preferably all keywords should be usable in combination with each other. The API becomes terribly convoluted when you have "use this keyword unless than one is specified, in which case we change the meaning of the first or ignore it".

gforsyth commented 9 years ago

Agreed. My suggestion for a depth tuple is unnecessary (and also ignored @cowlicks pointing out the inherent ambiguity in such a setup). I think allowing tuples in the boundary argument is sufficient.

cowlicks commented 9 years ago

@gforsyth I'm giving a talk this weekend (about dask), so I'll probably get to this Monday. Unless someone beats me to it.

@stefanv how is this API for your usecase: boundary=None will mean no external boundaries, boundary={0:(1, None)} would pad the left boundary with ones, and not add a boundary on the right, and it will default to `boundary='reflect'.

stefanv commented 9 years ago

Perfect.

cowlicks commented 9 years ago

@stefanv I think #681 partially serves your needs.