Pinging @stefanv
In the scikit-image use-case, a feature that would be useful is to only ghost "to the inside" of the array, i.e. to ghost, but to crop the ghosted array to the original image extents.
The reason for this is that scikit-image has specialized handling of boundaries, which sometimes causes artefacts. If ghosting pads out those edges, these (desired) artefacts will disappear.
@stefanv that is currently possible with `ghost_internal`.
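For reference, a minimal sketch of internal-only ghosting with `ghost_internal` as it existed at the time (hedged: the exact signature and module location may differ between dask versions; the array here is just illustrative):

import numpy as np
import dask.array as da

x = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Share one cell of overlap across the internal chunk boundaries only;
# no external boundary is added, so the outer edges keep their original values.
g = da.ghost.ghost_internal(x, axes={0: 1, 1: 1})
g.compute()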
Perhaps this would be best exposed by adding an `internal_depth` keyword argument to `da.ghost.ghost`, which takes a dict keyed with axes and valued with depths. If an axis is not specified in this dictionary then it will be taken from the `depth` kwarg. And if just the `internal_depth` is specified for an axis, the external boundary will be zero.
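A hypothetical sketch of what the proposed `internal_depth` keyword could look like (this kwarg does not exist; the name and semantics are just the suggestion above):

import numpy as np
import dask.array as da

a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Hypothetical kwarg (not implemented): axis 0 gets an internal-only overlap of
# depth 2 with no external boundary; axis 1 uses the ordinary depth/boundary handling.
g = da.ghost.ghost(a, depth={1: 1}, internal_depth={0: 2}, boundary={1: 'reflect'})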
Note I mixed up `boundary` and `depth` above, but have since edited the comment to fix this.
There should be an easy way to say: "do what you normally do, just don't go outside the array boundaries"
Ideally this option ends up within the `da.Array.map_overlap` function which, I think, has the most user-friendly API so far. I think that this function came out of @cowlicks' early work with skimage.
At the moment the `boundary=None` option defaults to `reflect`. Perhaps `None` should actually mean "no boundary."
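For context, a rough sketch of how `map_overlap` is typically used (assuming scipy's `gaussian_filter` is available; argument handling may vary slightly between dask versions):

import numpy as np
import dask.array as da
from scipy.ndimage import gaussian_filter

x = da.from_array(np.random.random((100, 100)), chunks=(50, 50))
# Each chunk is ghosted by one cell, the filter runs per chunk, and the
# overlap is trimmed off afterwards; extra kwargs are passed to the function.
y = x.map_overlap(gaussian_filter, depth={0: 1, 1: 1}, boundary='reflect', sigma=1)
y.compute()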
@mrocklin there are currently two arguments for specifying the ghosting: `depth` and `boundary`, which, more verbosely, mean "depth of ghosting" and "external ghosting type". If someone specifies a ghosting depth, but no boundary type, I think they would still expect some ghosting to happen.
One reason `reflect` might be the default is because this is the behavior for some skimage filters.
@stefanv we could have an `internal_ghosting_only=True/False` kwarg in ghost. However I think an `internal_depth` kwarg would be valuable since it would allow the user to have internal ghosting which differs from external ghosting.
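The two hypothetical options side by side (neither kwarg exists; both are just the suggestions above):

import numpy as np
import dask.array as da

a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Hypothetical boolean switch: same depth everywhere, but skip the external boundary.
g1 = da.ghost.ghost(a, depth={0: 1, 1: 1}, internal_ghosting_only=True)
# Hypothetical per-axis kwarg: internal overlap that can differ from the external one.
g2 = da.ghost.ghost(a, depth={0: 1, 1: 1}, internal_depth={0: 2, 1: 2}, boundary='reflect')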
> If someone specifies a ghosting depth, but no boundary type, I think they would still expect some ghosting to happen

I agree. But would they expect ghosting on the boundary?
> One reason reflect might be the default is because this is the behavior for some skimage filters.

This is a good reason to keep the default as `reflect`. I wonder if we should set the default to `boundary='reflect'` in the function signature and then use `boundary=None` to mean internal ghosting only.
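A hypothetical illustration of the defaults being discussed (not the current behaviour):

import numpy as np
import dask.array as da

a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Hypothetical semantics: omitting boundary would imply 'reflect', while passing
# boundary=None explicitly would mean internal ghosting only.
g1 = da.ghost.ghost(a, depth={0: 1, 1: 1})                 # would reflect at the edges
g2 = da.ghost.ghost(a, depth={0: 1, 1: 1}, boundary=None)  # would ghost internally only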
@gforsyth you can specify the external boundary conditions per axis.
In [1]: import numpy as np; import dask.array as da
In [2]: a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2,2))
In [3]: g = da.ghost.ghost(a, depth={0: 1, 1:2}, boundary={0:42, 1:'reflect'})
In [4]: g.compute()
Out[4]:
array([[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42],
[ 1, 0, 0, 1, 2, 3, 0, 1, 2, 3, 3, 2],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 6],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 10],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 6],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 10],
[13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 14],
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]])
But you cannot specify the internal boundary conditions; they just copy the elements of the neighbouring chunks. We could possibly change this, but I'm not sure this is what you want.
And currently the `depth` argument sets the depth of both internal and external boundaries. We could easily change this by adding an `internal_depth` argument which would default to `depth` if it is not provided.
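A hypothetical call showing that defaulting behaviour (again, `internal_depth` is only a proposal):

import numpy as np
import dask.array as da

a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Hypothetical: axis 0 gets internal overlap 1 but external depth 2;
# axis 1 gives no internal_depth, so it would fall back to depth (2) everywhere.
g = da.ghost.ghost(a, depth={0: 2, 1: 2}, internal_depth={0: 1}, boundary='reflect')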
I suspect that @gforsyth wanted to have different boundary conditions on the left vs right or top vs bottom, which is not currently supported.
That's what I was thinking of. I was just playing around with solving PDEs in Dask, and if you're doing something that's elliptic and you want to set one side of your domain to 0 and the other three to 100, that doesn't work the way ghost is currently set up.
@gforsyth Perhaps a good API here would be to allow tuples as values in the `boundary` argument, like:
In [1]: import numpy as np; import dask.array as da
In [2]: a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2,2))
In [3]: g = da.ghost.ghost(a, depth={0: 1, 1: 2}, boundary={0: (33, 42), 1: ('reflect', 66)})
In [4]: g.compute()
Out[4]:
array([[33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 66],
[ 1, 0, 0, 1, 2, 3, 0, 1, 2, 3, 3, 66],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 66],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 66],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 66],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 66],
[13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 66],
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 66]])
Note that doing something like this with the `depth` argument would make internal depth ambiguous.
If I understand correctly this is distinct from what @stefanv wants.
I think that would work well for specifying the external boundaries. Does it make any sense to also allow a `depth` tuple? You might need a reflection that's 2-deep for a Neumann boundary but only 1-deep for Dirichlet on the opposite boundary.
Hrrr. Just realized that that would necessitate a separate depth kwarg for internal vs. external boundaries, otherwise it's going to be very ambiguous.
Noting your example array:
In [4]: g.compute()
Out[4]:
array([[33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 66],
[ 1, 0, 0, 1, 2, 3, 0, 1, 2, 3, 3, 66],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 66],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 66],
[ 5, 4, 4, 5, 6, 7, 4, 5, 6, 7, 7, 66],
[ 9, 8, 8, 9, 10, 11, 8, 9, 10, 11, 11, 66],
[13, 12, 12, 13, 14, 15, 12, 13, 14, 15, 15, 66],
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 66]])
the external boundary depth doesn't particularly matter -- if it were the difference between a Neumann and Dirichlet boundary the extra bit of memory required to pad out a Dirichlet boundary one column further than necessary is negligible.
That said, `g[:-2]` seems to be a reflect while `g[:-1]` is Dirichlet. I imagine it's just an editing error from mocking up the boundary tuple example, but so long as `g[:-2]` and `g[:-1]` were both 66 in the above example, that would cover the use cases I have.
One thing to keep in mind when evaluating all these new options is that preferably all keywords should be usable in combination with each other. The API becomes terribly convoluted when you have "use this keyword unless that one is specified, in which case we change the meaning of the first or ignore it".
Agreed. My suggestion for a depth tuple is unnecessary (and also ignored @cowlicks' point about the inherent ambiguity in such a setup). I think allowing tuples in the `boundary` argument is sufficient.
@gforsyth I'm giving a talk this weekend (about dask), so I'll probably get to this Monday. Unless someone beats me to it.
@stefanv how is this API for your use case: `boundary=None` will mean no external boundaries, `boundary={0: (1, None)}` would pad the left boundary with ones and not add a boundary on the right, and it will default to `boundary='reflect'`.
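A hypothetical sketch of that proposed API (per-side values, with `None` meaning no external boundary on that side; not the behaviour at the time of writing):

import numpy as np
import dask.array as da

a = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
# Hypothetical: pad the leading edge of axis 0 with ones, add nothing on its
# trailing edge, and fall back to the default 'reflect' on axis 1.
g = da.ghost.ghost(a, depth={0: 1, 1: 1}, boundary={0: (1, None)})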
Perfect.
@stefanv I think #681 partially serves your needs.
As asked on SO
The current `ghost.ghost` function allows specifying a few different types of B.C. but they're always the same for both y-boundaries and both x-boundaries. This makes perfect sense for periodic boundaries, but less so for Dirichlet boundaries. As @mrocklin commented, `ghost.ghost_internal` can accomplish most of this but it would be good if it were exposed in `ghost.ghost`, e.g. `g = da.ghost.ghost(x, depth={0: 1, 1: 1}, boundary={0: 100, 1: 100, 2: 0, 3: 0})`