(Closed — mlange05 closed this issue 6 years ago)
OK, let me comment on each of these three points.
1a)

> Considering that the most common way to define the size of the padding regions [...] have different padding sizes.

Yes.
1b)

> The user data in f.data should most likely be trimmed to only present the "domain part" [...] as always debatable).

Yes. Please take a look at how I did that in YASK:
https://github.com/opesci/devito/blob/master/devito/yask/wrappers.py#L173
We could use the new type `Data` that I'm introducing as part of #448. This is a subclass of a numpy array. When the user calls `data.with_halo`, I think what we would have to do is return a view of the same object with "enlarged boundaries". We probably have to generalise the indexing logic that we're gonna use for buffered dimensions. That is:

- when I write `u.data[0]`, I'm getting `u.data[0 + left_padding]`;
- when I write `u.data.with_halo[0]`, I'm getting `u.data[0]`.

In fact, let me rephrase what I wrote above. When the user writes `u.data.with_halo`, he gets the "real" (full) array; when the user writes `u.data`, he gets a "shrunk" view. I really think what I did for YASK is analogous.
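For what it's worth, a toy sketch of the view mechanics (class name, the fixed symmetric halo, and the property names are all assumptions, not the actual #448 design). Here the object holds the full allocation and `.domain` returns the shrunk view; in the proposal the default would be the other way around, but the index translation is identical:

```python
import numpy as np

class ToyData(np.ndarray):
    """Full allocated array; .domain exposes the shrunk view.
    Purely illustrative, not Devito's actual Data class."""

    def __new__(cls, domain_shape, halo=1, dtype=np.float32):
        shape = tuple(s + 2*halo for s in domain_shape)
        obj = super().__new__(cls, shape, dtype=dtype)
        obj.fill(0)
        obj._halo = halo
        return obj

    @property
    def with_halo(self):
        # The "real" (full) array, halo included
        return np.asarray(self)

    @property
    def domain(self):
        # A view shifted inwards by the left padding on each side
        h = self._halo
        return np.asarray(self)[tuple(slice(h, s - h) for s in self.shape)]

u = ToyData((4, 4), halo=1)
u.domain[0, 0] = 7.0               # lands at u.with_halo[1, 1]
assert u.with_halo.shape == (6, 6)
assert u.domain.shape == (4, 4)
assert u.with_halo[1, 1] == 7.0
```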
2)

> Since the domain-based approach indexes relative to the "write" part of the data [...] conventions should be.

This should not be too difficult either, but it's very easy to get it wrong.
3)

> This is the trickiest part, as we will need to be able to handle data with different padding/boundary sizes in a single loop [...]

I think the loop bounds won't change; they will still be dictated by the "largest offsets". What has to change, probably, are the array indices.
We know the padding size of each `Function`, right? Then, we might need a further `Operator` pass that basically applies an `xreplace` to the various `Indexed` objects to shift the indices down/up. This pass should occur prior to loop construction; in fact, it should be executed somewhere at the very beginning, right after the stencil calculation. We can exploit this utility function to shift indices:
https://github.com/opesci/devito/blob/master/devito/symbolics/manipulation.py#L134
How do we do this? I think it's a matter of when, rather than how; the how should probably be a one-day sprint.
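The shifting pass itself could be quite small. Below is a minimal sketch in plain SymPy (the `offsets` values and the mapping-building code are illustrative assumptions; the real pass would run over Devito's `Indexed` objects and could reuse the shift utility linked above):

```python
# Minimal sketch of an index-shifting pass using SymPy's xreplace.
# NOTE: the offsets dict and the expression below are made-up examples,
# not Devito's actual data structures.
import sympy

x = sympy.Symbol('x')
f = sympy.IndexedBase('f')
g = sympy.IndexedBase('g')

eq = sympy.Eq(f[x], g[x])

# Assumed per-function left-padding sizes
offsets = {f: 1, g: 3}

# Map each Indexed to a version with all indices shifted by the
# left padding of its base function
mapping = {i: i.base[tuple(idx + offsets[i.base] for idx in i.indices)]
           for i in eq.atoms(sympy.Indexed)}
shifted = eq.xreplace(mapping)

print(shifted)  # Eq(f[x + 1], g[x + 3])
```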
@mlange05 I have a question about point 1. (How) Is this gonna impact the current examples, or even all Devito codes?
Also: do you think we should do anything special for `SparseFunction`?
Yes, it's quite likely that a lot of things will break once we switch. The seismic stuff all hides behind the `Model`, so there is a single choke-point to fix, but in general this change is quite intrusive.
`SparseFunction` needs a general re-think, although in this context it should not require any deep changes, as the indirection entirely depends on the coordinates of the points. As long as the coordinates are treated right, the sparse interpolation should just fall out.
A possible naming scheme:

- `data` aliases to `data_domain`, and it's a view of the actual data (restrained to the domain, obviously)
- `data.with_halo` is the true local data
- `data.allocated` OR `data.with_padding` might be domain+halo+padding
- `data.global` OR `data.allocated` might be the whole dataset across multiple ranks (if we need it...)
- we might also need: `data.global.with_halo` for the whole dataset across multiple ranks, including the top-{left,right,...} halo region

@navjotk A quick question, just to be sure: will you still need the operator offsets (that #453 is now moving to the arguments engine) after the domain-addressing switch? I guess the answer is No.
So, we should also discuss this:

> However, the associated x_size needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain", and we should discuss what the conventions should be.

@mlange05 this is because `x_size` will refer to the domain size, rather than the allocation size, correct? In theory, from now on, each function will need its own allocation size, such as `m_x_size, m_y_size, ... u_x_size, u_y_size, ...` and so on. Unless I'm missing something obvious, which could be the case, this leads to an explosion in the number of kernel parameters. We might want to start dropping in structs, rather than ints. This issue had already been noted in #387. I'm worried I'm actually talking BS. What am I missing?
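A toy illustration of the "structs rather than ints" idea (in the spirit of #387; the struct layout and the size values below are made up for the example):

```python
# Group each function's allocation sizes into one aggregate kernel
# parameter, instead of passing m_x_size, m_y_size, u_x_size, ...
# as separate scalars.
import ctypes

class Sizes(ctypes.Structure):
    # allocation sizes of one function, per dimension
    _fields_ = [('x_size', ctypes.c_int),
                ('y_size', ctypes.c_int)]

m_sizes = Sizes(x_size=104, y_size=104)  # e.g. domain 100 + 2*2 padding
u_sizes = Sizes(x_size=108, y_size=108)  # e.g. domain 100 + 2*4 padding

# The kernel would then take one struct per function, rather than
# one int per function per dimension
assert (m_sizes.x_size, u_sizes.x_size) == (104, 108)
```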
@mlange05 you said you might have had time to help on this stuff -- perhaps this thing above is the self-contained task you needed? Just a random guess -- and no worries if something higher priority has popped up in the meantime! (@ggorman feel free to share your thoughts here)
Well, since we need to add in the offsets for each individual function symbol inside the loop anyway, to account for the different order/padding offsets in each field (the loop will be over the domain), we might as well hard-code the offsets into the data casts. This would prevent the per-symbol-per-dimension state-space explosion, I think (I might be missing something).
To illustrate, consider two `Function` objects `f=Function(name='f', grid=grid, space_order=1)` and `g=Function(name='g', grid=grid, space_order=3)`, and dimensions `(x, y)`. Then `Operator(Eq(f, g))` would give something like (untested and only conceptual):

```
f[2*1+x_size, 2*1+y_size] = f_vec (...<rest of cast>)  # Allocation-based casts
g[2*3+x_size, 2*3+y_size] = g_vec (...<rest of cast>)
for (x=x_s; x<x_e; x++)        # Domain-based iterator
  for (y=y_s; y<y_e; y++)      # Domain-based iterator
    f[1+x, 1+y] = g[3+x, 3+y]  # Allocation-aligned data access
```
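That indexing convention can be sanity-checked with a small numpy simulation (the domain size and the values of g are arbitrary; the paddings mirror `space_order=1` and `space_order=3`):

```python
import numpy as np

nx, ny = 6, 6                        # domain size (arbitrary)
f = np.zeros((nx + 2*1, ny + 2*1))   # space_order=1 -> 1-point padding
g = np.zeros((nx + 2*3, ny + 2*3))   # space_order=3 -> 3-point padding
g[:] = np.arange(g.size, dtype=float).reshape(g.shape)

# Domain-based iterators with allocation-aligned accesses,
# mirroring f[1+x, 1+y] = g[3+x, 3+y] above
for x in range(nx):
    for y in range(ny):
        f[1 + x, 1 + y] = g[3 + x, 3 + y]

# The two domain regions now coincide
assert np.array_equal(f[1:-1, 1:-1], g[3:-3, 3:-3])
```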
Does that make sense?
Yes, I believe we would still need the offsets that I've put in place.
> Does that make sense?

I think it does.

> Yes, I believe we would still need the offsets that I've put in place.

Why/when?

@navjotk reminder
@mlange05 you originally wrote:

> However, the associated x_size needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain", and we should discuss what the conventions should be.

but then, in the code snippet above, you did something different: `x_size` looks like the domain size. This latter approach seems easier to me: there would be only one `x_size`.
Yes, I hadn't thought this through properly. Considering we might have functions with different allocation sizes in shared dimensions, I think using `x_size` as the domain size and hard-coding the individual difference (`offset`) in each cast would be the most sensible solution if we want to avoid a variable-name explosion.
@mlange05 how about non-affine index accesses, such as those via source and receiver positions? Should the halo+padding extent be added to them too? I guess so...?
As discussed in various other contexts before, Devito currently enforces an index addressing scheme based on absolute position within the allocated data of a `Function`. Such a (for lack of a better name) allocation-based addressing scheme entails that the user needs to know exactly the radius of the stencil they are going to use and allocate enough data for an `Operator` to work on, as well as needing to manually calculate offsets when supplying explicit dimension indices on `Operator` invocation.

To improve on this, we should switch to a domain-based scheme, where instead grid indices are addressed relative to the "write-to" part of the data, called the "domain". To illustrate, consider a single dimension with a "domain size" of 100 elements/data points, and a 1-point boundary or stencil radius:
Now, I believe there are three sections of the code that need changing to implement this, as well as several API changes required to make this work neatly:
1. Data allocation and Python/numpy access

Considering that the most common way to define the size of the padding regions (the read-only part of the data at the boundary) is through the `Function`-specific `space/time_order` parameter, I would argue that the `shape` taken by the grid should be interpreted as the "domain shape". That is to say that all functions sharing a `Grid` object agree on the same domain size, but may have different padding sizes.

The user data in `f.data` should most likely be trimmed to only present the "domain part" of the data, although the halo/padding/boundary part of the data also needs to be accessible to users. For this reason, I'm proposing to introduce `f.data_domain` and `f.data_allocated`, where `f.data` defaults to `f.data_domain` (naming, as always, debatable).

2. Runtime argument derivation
Since the domain-based approach indexes relative to the "write" part of the data and loop, `x_start`/`x_end` variables can be expressed/expected with respect to the domain and should thus not include any offsets. However, the associated `x_size` needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain", and we should discuss what the conventions should be.

3. Loop bounds in the C kernel
This is the trickiest part, as we will need to be able to handle data with different padding/boundary sizes in a single loop. As an example, consider the following boundary sizes:

```
a: 0, b: 1, c: 4
```

and a stencil `a[x] = a[x] + (b[x-1]+b[x+1])/2 + (c[x-4]+c[x+4])/2`. This looks like this in memory:

My intuition on this is that the main loop counter for `x` in this case should be in terms of the domain ("write region"), so that arguments and loop boundaries all keep their natural meaning. How we "translate" or "normalize" the respective boundaries in the generated code is to be discussed.
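For what it's worth, the a/b/c example can be sanity-checked with a small numpy sketch (domain size and data are arbitrary): the loop runs over the domain, while each access is translated by its own array's left padding, so no access falls out of bounds despite the three different boundary sizes:

```python
import numpy as np

n = 20                    # domain size (arbitrary)
pa, pb, pc = 0, 1, 4      # boundary sizes: a: 0, b: 1, c: 4

a = np.zeros(n + 2*pa)
b = np.ones(n + 2*pb)
c = np.ones(n + 2*pc)

# Domain-based loop; each access is shifted by its array's left padding
for x in range(n):
    a[pa + x] = a[pa + x] + (b[pb + x - 1] + b[pb + x + 1])/2 \
                          + (c[pc + x - 4] + c[pc + x + 4])/2

# With b and c constant 1, every domain point of a becomes 0 + 1 + 1 = 2
assert np.allclose(a, 2.0)
```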