devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License

Domain-based index addressing #449

Closed mlange05 closed 6 years ago

mlange05 commented 6 years ago

As discussed in various other contexts before, Devito currently enforces an index addressing scheme based on absolute position within the allocated data of a Function. Such an (for lack of a better name) allocation-based addressing scheme means that users need to know exactly the radius of the stencil they are going to use, so that they can allocate enough data for an Operator to work on, and that they must manually calculate offsets when supplying explicit dimension indices on Operator invocation.

To improve on this we should switch to a domain-based scheme, where instead grid indices are addressed relative to the "write-to" part of the data, called the "domain". To illustrate, consider a single dimension with a "domain size" of 100 elements/data points, and a 1-point boundary or stencil radius:

 boundary             domain                     boundary
  | -- | ---------------------------------------- | -- |
   read                write                       read

Data indices:
  0    1               51     71                 101  102
User indices:
       0               50     70                 100
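The translation the diagram describes is just an offset by the boundary width. A minimal sketch (the function name is made up for illustration):

```python
boundary = 1                      # stencil radius / padding per side
domain_size = 100

def to_data_index(user_index):
    """Translate a domain-relative index to an allocation-relative one."""
    return user_index + boundary

assert to_data_index(0) == 1      # first domain point in the diagram
assert to_data_index(100) == 101  # last domain point in the diagram
```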

Now, I believe there are three sections of the code that need changing to implement this, as well as several API changes required to make this work neatly:

1. Data allocation and Python/numpy access

Considering that the most common way to define the size of the padding regions (the read-only part of the data at the boundary) is through the Function-specific space/time_order parameter, I would argue that the shape taken by the grid should be interpreted as the "domain shape". That is to say that all functions sharing a Grid object agree on the same domain size, but may have different padding sizes.

The user data in f.data should most likely be trimmed to only present the "domain part" of the data, although the halo/padding/boundary part of the data also needs to be accessible to users. For this reason, I'm proposing to introduce f.data_domain and f.data_allocated, where f.data defaults to f.data_domain (naming, as always debatable).
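The proposed split could be prototyped with a plain numpy view — f.data_domain and f.data_allocated are hypothetical names from this proposal, not existing API:

```python
import numpy as np

pad = 1                                   # assumed 1-point padding per side
data_allocated = np.zeros(100 + 2 * pad)  # domain of 100 plus the padding
data_domain = data_allocated[pad:-pad]    # trimmed view, no copy

data_domain[0] = 3.0                      # write through the domain view...
assert data_allocated[pad] == 3.0         # ...lands in the shared buffer
assert data_domain.shape == (100,)
```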

2. Runtime argument derivation

Since the domain-based approach indexes relative to the "write" part of the data and loop, the x_start/x_end variables can be expressed with respect to the domain and should thus not include any offsets. However, the associated x_size needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain", and we should discuss what the conventions should be.
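A hedged sketch of the convention under discussion (names taken from the thread, the padding value is an assumed example):

```python
domain_size = 100
padding = 1                          # assumed per-side padding

x_start, x_end = 0, domain_size      # domain-relative loop bounds, no offsets
x_size = domain_size + 2 * padding   # allocation size, as used in the C cast

assert x_end - x_start == domain_size
assert x_size == 102
```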

3. Loop bounds in the C kernel

This is the trickiest part, as we will need to be able to handle data with different padding/boundary sizes in a single loop. As an example, consider the following boundary sizes: a: 0, b: 1, c: 4, and a stencil a[x] = a[x] + (b[x-1]+b[x+1])/2 + (c[x-4]+c[x+4])/2

This looks like this in memory:

a:  | ------------ |
b:  | - | ------------ | - |
c:  | ---- | ------------ | ---- |

My intuition on this is that the main loop counter for x in this case should be in terms of the domain ("write region"), so that arguments and loop boundaries all keep their natural meaning. How we "translate" or "normalize" the respective boundaries in the generated code is to be discussed.
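One way to see this "normalisation" concretely is a plain-Python version of the stencil above: the loop counter x is domain-relative, and each access adds that array's own padding (the domain size and array contents here are arbitrary illustration data):

```python
import numpy as np

n = 12                               # domain size (arbitrary)
pad_a, pad_b, pad_c = 0, 1, 4        # boundary sizes from the example

a = np.zeros(n + 2 * pad_a)
b = np.arange(n + 2 * pad_b, dtype=float)
c = np.arange(n + 2 * pad_c, dtype=float)

# x runs over the domain; each access shifts by that array's own padding,
# so the widest stencil point still lands inside allocated data
for x in range(n):
    a[pad_a + x] = a[pad_a + x] \
        + (b[pad_b + x - 1] + b[pad_b + x + 1]) / 2 \
        + (c[pad_c + x - 4] + c[pad_c + x + 4]) / 2
```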

FabioLuporini commented 6 years ago

OK, let me comment each of these three points.

1a)

Considering that the most common way to define the size of the padding regions [...] have different padding sizes.

Yes.

1b)

The user data in f.data should most likely be trimmed to only present the "domain part" [...] as always debatable).

Yes. Please take a look at how I did that in YASK: https://github.com/opesci/devito/blob/master/devito/yask/wrappers.py#L173 We could use the new type Data that I'm introducing as part of #448. This is a subclass of a numpy array. When the user calls data.with_halo, I think what we would have to do is return a view of the same object with "enlarged boundaries". We probably have to generalise the indexing logic that we're gonna use for buffered dimensions.

In fact, let me rephrase what I wrote above. When the user writes u.data.with_halo, he gets the "real" (full) array; when the user writes u.data, he gets a "shrunk" view. I really think what I did for YASK is analogous.

2)

Since the domain-based approach indexes relative to the "write" part of the data [...] conventions should be.

This should not be too difficult either, but it's very easy to get it wrong.

3)

This is the trickiest part, as we will need to be able to handle data with different padding/boundary sizes in a single loop [...]

I think the loop bounds won't change, they will still be dictated by the "largest offsets". What has to change, probably, are the array indices. We know the padding size of each Function, right? Then, we might need a further Operator pass that basically applies an xreplace to the various Indexeds to shift the various indices down/up. This pass should occur prior to loop construction; in fact, this should be executed somewhere at the very beginning, right after the stencils calculation. We can exploit this utility function to shift indices:

https://github.com/opesci/devito/blob/master/devito/symbolics/manipulation.py#L134
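As a rough illustration of such a pass (plain sympy here, not the linked Devito utility), an xreplace can shift every Indexed access of a function up by its padding:

```python
import sympy as sp

x = sp.Symbol('x')
b = sp.IndexedBase('b')
pad = 1                                   # b's assumed padding

expr = (b[x - 1] + b[x + 1]) / 2
# build a substitution map shifting every access to b by its padding
mapping = {i: b[i.indices[0] + pad] for i in expr.atoms(sp.Indexed)}
shifted = expr.xreplace(mapping)

assert shifted == (b[x] + b[x + 2]) / 2
```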


How do we do this? I think it's a matter of when, rather than how; the how should probably be a one-day sprint.

FabioLuporini commented 6 years ago

@mlange05 I have a question about point 1. (How) Is this gonna impact the current examples, or even all Devito codes?

Also: do you think we should do anything special for SparseFunction ?

mlange05 commented 6 years ago

Yes, it's quite likely that a lot of things will break once we switch. The seismic stuff all hides behind the Model, so there is a single choke-point to fix, but in general this change is quite intrusive.

SparseFunction needs a general re-think, although in this context it should not require any deep changes, as the indirection entirely depends on the coordinates of the points. As long as the coordinates are treated right, the sparse interpolation should just fall out.

FabioLuporini commented 6 years ago

A possible naming scheme:

FabioLuporini commented 6 years ago

we might also need:

FabioLuporini commented 6 years ago

@navjotk A quick question, just to be sure: will you still need the operator offsets (that #453 is now moving to the arguments engine) after the domain-addressing switch? I guess the answer is No.

So, we should also discuss this:

However, the associated x_size needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain" and we should discuss what the conventions should be.

@mlange05 this is because x_size will refer to the domain size, rather than the allocation size, correct? In theory, from now on, each function will need its own allocation size, such as m_x_size, m_y_size, ... u_x_size, u_y_size, ... and so on. Unless I'm missing something obvious, which could be the case, this leads to an explosion in the number of kernel parameters. We might want to start dropping in structs, rather than ints. This issue has already been noted in #387. I'm worried I'm actually talking BS. What am I missing?

@mlange05 you said you might have time to help on this stuff -- perhaps this thing above is the self-contained task you needed? Just a random guess -- and forgive me if something higher priority has popped up meanwhile! (@ggorman feel free to share your thoughts here)

mlange05 commented 6 years ago

Well, since we need to add in the offsets for each individual function symbol inside the loop anyway, to account for the different order/padding offsets in each field (the loop will be over the domain), we might as well hard-code the offsets into the data casts. This would prevent the per-symbol-per-dimension state-space explosion, I think (I might be missing something).

To illustrate, consider two Function objects f=Function(name='f', grid=grid, space_order=1) and g=Function(name='g', grid=grid, space_order=3) and dimensions (x, y). Then Operator(Eq(f, g)) would give something like (untested and only conceptually):

f[2*1+x_size, 2*1+y_size] = f_vec (...<rest of cast>)  # Allocation-based casts
g[2*3+x_size, 2*3+y_size] = g_vec (...<rest of cast>)

for(x=x_s; x<x_e; x++)  # Domain-based iterator
  for (y=y_s; y<y_e; y++)  # Domain-based iterator
    f[1+x, 1+y] = g[3+x, 3+y]  # Allocation-aligned data access

Does that make sense?
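A hedged numpy rendering of the snippet above (f with space_order=1 padding, g with space_order=3 padding, offsets hard-coded per field; the domain shape is an arbitrary example):

```python
import numpy as np

nx, ny = 4, 4                                  # domain shape
f = np.zeros((nx + 2 * 1, ny + 2 * 1))         # pad 1 per side
g = np.arange((nx + 2 * 3) * (ny + 2 * 3), dtype=float)
g = g.reshape(nx + 2 * 3, ny + 2 * 3)          # pad 3 per side

for x in range(nx):                            # domain-based iterators
    for y in range(ny):
        f[1 + x, 1 + y] = g[3 + x, 3 + y]      # allocation-aligned accesses

assert f[1, 1] == g[3, 3]                      # domain point (0, 0) copied
assert f[0, 0] == 0.0                          # f's padding untouched
```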

navjotk commented 6 years ago

Yes I believe we would still need the offsets that I've put in place.

FabioLuporini commented 6 years ago

Does that make sense?

I think it does

Yes I believe we would still need the offsets that I've put in place.

why/when?

FabioLuporini commented 6 years ago

why/when?

@navjotk reminder


FabioLuporini commented 6 years ago

@mlange05 you originally wrote:

However, the associated x_size needs to be treated carefully, since it is used for the data casts in the C kernels. It thus needs to be in terms of "allocation", rather than "domain" and we should discuss what the conventions should be.

but then, in the code snippet above, you did something different. x_size looks like the domain size. This latter approach seems easier to me: there would be only one x_size.

mlange05 commented 6 years ago

Yes, I hadn't thought this through properly. Considering we might have functions with different allocation sizes in shared dimensions, I think using x_size as domain and hard-coding the individual difference (offset) in each cast would be the most sensible solution if we want to avoid a variable-name explosion.

FabioLuporini commented 6 years ago

@mlange05 how about non-affine index accesses, such as those via source and receiver positions? Should the halo+padding extent be added to them too? I guess so...?