devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License
561 stars 228 forks source link

notebook describing the clustering procedure in compiler. #705

Closed georgebisbas closed 4 years ago

CavalcanteLucas commented 5 years ago

Please, let me take this. Could someone please provide description?

FabioLuporini commented 5 years ago

This one is about explaining, with one or more running examples, how clustering (one of the early compilation passes) works. My paper also has a section about it, so that might be a starting point. We might have to touch upon data dependence analysis, as required by the groupby function (one of the most important clustering routines).

I suggest to start with one simple yet non-trivial example. It cannot be as trivial as Eq(f.forward, f+1). It should rather eventually create multiple loop nests, just like happens in one of the pde-based seismic examples (isotropic acoustic/tti/elastic/...). However, these are too complicated. So you might want to reverse-engineering how clustering works by trying to write an example which places equations in different loop nests.

Remember, the compiler operates with, conceptually, one simple goal in mind "how should the equations be scheduled into loops so that the information 'naturally' flows from one iteration to another?". Or, in other words, "how should the equations be scheduled into loops so that when we execute a certain loop iteration, all data that need to be accessed is up-to-date?" Please do keep this in mind while studying this stuff

CavalcanteLucas commented 5 years ago

Hey, I have some questions to share:

1) I couldn't make it to debug tests using IPython.embed(). Running tests with pytest will simple ignore it. Would you have any hints? How do you guys usually do that?

2) Following the code flow within operator.py one can find (currently at the line 89) the call for the function clusterize(expressions). This lead me to the call for detect_flow_directions(exprs) within (currently at line 237 of) /ir/clusters/algorithms.py. Then, to the call for as_tuple(exprs) within (currently at line 125 of) /ir/support/utils.py. In /tools/utils.py one finds the defintion for as_tuple at lines 19-41:

def as_tuple(item, type=None, length=None):
    """
    Force item to a tuple.

    Partly extracted from: https://github.com/OP2/PyOP2/.
    """
    # Empty list if we get passed None
    if item is None:
        t = ()
    elif isinstance(item, str):
        t = (item,)
    else:
        # Convert iterable to list...
        try:
            t = tuple(item)
        # ... or create a list of a single item
        except (TypeError, NotImplementedError):
            t = (item,) * (length or 1)
    if length and not len(t) == length:
        raise ValueError("Tuple needs to be of length %d" % length)
    if type and not all(isinstance(i, type) for i in t):
        raise TypeError("Items need to be of type %s" % type)
    return t

My question is: Could you please talk more about lines 36-40?

t = (item,) * (length or 1)
    if length and not len(t) == length:
        raise ValueError("Tuple needs to be of length %d" % length)
    if type and not all(isinstance(i, type) for i in t):
        raise TypeError("Items need to be of type %s" % type)

3) Within function definition EVAL in conftest.py; what is globals() (currently at line 265)?

FabioLuporini commented 5 years ago
1. I couldn't make it to **debug tests** using `IPython.embed()`. Running tests with `pytest` will simple ignore it. Would you have any hints? How do you guys usually do that?

add -s -- py.test ... -s ...

Could you please talk more about lines 36-40?

I don't quite remember what these lines do, but what's important here is what as_tuple does. Some examples:

as_tuple([]) -> ()
as_tuple(None) -> ()
as_tuple('aaa') -> ('aaa',)
as_tuple([1, 2, 3]) -> (1, 2, 3)

so it turns the argument into a reasonable tuple

Within function definition EVAL in conftest.py; what is globals() (currently at line 265)?

this is hopefully useful

CavalcanteLucas commented 5 years ago

Registering for future readers:

Some reading on Data Dependence Analysis is worth at this point.

I recomend the slides at: http://www.cs.arizona.edu/~collberg/Teaching/553/2011/Handouts/Handout-33.pdf and pages 1-10 of Loop and Data Transformations: A Tutorial, found at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.10

My next checkpoint is to compose* loop nests with:

It should also include loops with carried dependency among iterations.

(* see: https://github.com/opesci/devito/blob/master/tests/test_ir.py https://github.com/opesci/devito/blob/master/examples/compiler/02_iet.ipynb)

CavalcanteLucas commented 5 years ago

I've tried to directly build something like:

for (int i = 0; i <= 10; i += 1) { a = b + c; d = a + e; }

in order to watch some flow dependency. Although I realized it doesn't make sense trying to clusterize something like that, because that is already at the Iteration/Expression domain.

I think LoweredEq objects should be used instead. In that case, the question is: how might one figure which expressions would result in different loop nest configurations?

Edit: Examples from https://github.com/opesci/devito/blob/master/tests/test_ir.py#L382 can serve as reference.

FabioLuporini commented 5 years ago

yes, probably you gotta use LoweredEq. No trace of loops at this stage.

CavalcanteLucas commented 5 years ago

Question:

in the method __repr__ of Access class found in devito/ir/support/basic.py, what are those codes to define the mode variable?

\033[1;37;31mW\033[0m in case of write, and \033[1;37;32mR\033[0m in case of reading. Is it correct or is it any kind of typo?

FabioLuporini commented 5 years ago

colors :)

red for writes, green for reads

used for display in a terminal

CavalcanteLucas commented 5 years ago

Still in Access, at definition of __new__: could it be obj.function = indexed.function instead of obj.function = indexed.base.function?

CavalcanteLucas commented 5 years ago

In devito/ir/support/utils.py, at detect_flow_directions: mapper.get(k.parent, {Any}) means that the value related to the key k.parent should receive {Any} if there is not such a key.

CavalcanteLucas commented 5 years ago

In devito/operator.py:

One may notice that _specialize_exprs goes through detect_flow_directions, just as clusterize. In the first case, it is a "local pass", getting each of the expressions isolated. In the second case, it is a "global pass", considering all the expressions together.

CavalcanteLucas commented 5 years ago

is_Tensor is defined in devito/ir/equations/equation.py in terms of is_Indexed. Although, I can't find the definition of is_Indexed with git grep "def is_Indexed". What am I missing ?

FabioLuporini commented 5 years ago

Still in Access, at definition of __new__: could it be obj.function = indexed.function instead of obj.function = indexed.base.function?

yes

In devito/ir/support/utils.py, at detect_flow_directions: mapper.get(k.parent, {Any}) means that the value related to the key k.parent should receive {Any} if > there is not such a key.

yes, dict().get is widely used in python

In devito/operator.py: One may notice that _specialize_exprs goes through detect_flow_directions, just as clusterize. In the first case, it is a "local pass", getting each of the expressions isolated. In the second case, it is a "global pass", considering all the expressions together.

oh yes -- I realise only now that yesterday in the call I've repeated stuff that you already knew :) sorry

is_Tensor is defin d in devito/ir/equations/equation.py in terms of is_Indexed. Although, I can't find the definition of is_Indexed with git grep "def is_Indexed". What am I missing ?

it's inherited from SymPy directly

CavalcanteLucas commented 5 years ago

oh yes -- I realise only now that yesterday in the call I've repeated stuff that you already knew :) sorry

No, I didn't know that. I'm just registering what you told me!

it's inherited from SymPy directly

I see! Thanks

CavalcanteLucas commented 5 years ago

The following code has been built based on the function test_loop_wrapping from test_ir.py.

from devito import Eq, Grid, TimeFunction, Operator
from devito import configuration
configuration['openmp'] = 0

grid = Grid(shape=(3, 3, 3))

v = TimeFunction(name='v', grid=grid, time_order=4)
w = TimeFunction(name='w', grid=grid, time_order=4)

exprs = ['Eq(w.forward, w + 1)',
         'Eq(v.forward, w)']

c = enumerate(list(exprs))
for i, e in c:
    exprs[i] = eval(e)

op = Operator(exprs, dle='noop')
from devito.tools import pprint
pprint(op)

My goal is to generate expressions that live in different iteration nests.

I expected it to be the case as it this particular combination is tested with wrappable being asserted as False. Although, this is what I see:

<Callable Kernel>
  <List (0, 3, 0)>

    <ArrayCast>
    <ArrayCast>
    <List (0, 1, 0)>

      <[affine,sequential] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
        <TimedList (2, 1, 2)>
          <C.Statement struct timeval start_section0, end_section0;>
          <C.Statement gettimeofday(&start_section0, NULL);>
          <Section (1)>

            <[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
              <[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
                <[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
                  <ExpressionBundle (2)>

                    <Expression w[t3, x + 1, y + 1, z + 1] = w[t2, x + 1, y + 1, z + 1] + 1>
                    <Expression v[t3, x + 1, y + 1, z + 1] = w[t2, x + 1, y + 1, z + 1]>

          <C.Statement gettimeofday(&end_section0, NULL);>
          <C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;>

Did I miss something?

FabioLuporini commented 5 years ago

why do you think they should live in different loop nests?

CavalcanteLucas commented 5 years ago

because of this test:

    @pytest.mark.parametrize('exprs,wrappable', [
        # Easy: wrappable
        (['Eq(u.forward, u + 1)'], True),
        # Easy: wrappable
        (['Eq(w.forward, w + 1)'], True),
        # Not wrappable, as we're accessing w's back in a subsequent equation
        (['Eq(w.forward, w + 1)', 'Eq(v.forward, w)'], False),
        # Wrappable, but need to touch multiple indices with different modulos
        (['Eq(w.forward, u + w + 1)'], True),
        # Wrappable as the back timeslot is accessed only once, even though
        # later equations are writing again to w.forward
        (['Eq(w.forward, w + 1)', 'Eq(w.forward, w.forward + 2)'], True),
        # Not wrappable as the front is written before the back timeslot could be read
        (['Eq(w.forward, w + 1)', 'Eq(u.forward, u + w + 2)'], False),
    ])
    def test_loop_wrapping(self, exprs, wrappable):
        """Tests detection of WRAPPABLE property."""
        grid = Grid(shape=(3, 3, 3))

        u = TimeFunction(name='u', grid=grid)  # noqa
        v = TimeFunction(name='v', grid=grid, time_order=4)  # noqa
        w = TimeFunction(name='w', grid=grid, time_order=4)  # noqa

        # List comprehension would need explicit locals/globals mappings to eval
        for i, e in enumerate(list(exprs)):
            exprs[i] = eval(e)

        op = Operator(exprs, dle='speculative')

        iters = FindNodes(Iteration).visit(op)

        # Dependence analysis checks
        time_iter = [i for i in iters if i.dim.is_Time]
        assert len(time_iter) == 1
        time_iter = time_iter[0]
        if wrappable:
            assert time_iter.is_Wrappable
        assert all(not i.is_Wrappable for i in iters if i is not time_iter)

in this part:

        # Not wrappable, as we're accessing w's back in a subsequent equation
        (['Eq(w.forward, w + 1)', 'Eq(v.forward, w)'], False),
CavalcanteLucas commented 5 years ago

I found that this is not wrappable (yet by trial-and-error):

exprs = ['Eq(w.forward, w)',
         'Eq(w, w.forward)']

Generating this:

<Callable Kernel>
  <List (0, 2, 0)>

    <ArrayCast>
    <List (0, 2, 0)>

      <[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
        <TimedList (2, 1, 2)>
          <C.Statement struct timeval start_section0, end_section0;>
          <C.Statement gettimeofday(&start_section0, NULL);>
          <Section (1)>

            <[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
              <[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
                <[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
                  <ExpressionBundle (1)>

                    <Expression w[t1, x + 1, y + 1, z + 1] = w[t0, x + 1, y + 1, z + 1]>

          <C.Statement gettimeofday(&end_section0, NULL);>
          <C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;>
      <[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
        <TimedList (2, 1, 2)>
          <C.Statement struct timeval start_section1, end_section1;>
          <C.Statement gettimeofday(&start_section1, NULL);>
          <Section (1)>

            <[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
              <[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
                <[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
                  <ExpressionBundle (1)>

                    <Expression w[t0, x + 1, y + 1, z + 1] = w[t1, x + 1, y + 1, z + 1]>

          <C.Statement gettimeofday(&end_section1, NULL);>
          <C.Statement timers->section1 += (double)(end_section1.tv_sec-start_section1.tv_sec)+(double)(end_section1.tv_usec-start_section1.tv_usec)/1000000;>
FabioLuporini commented 5 years ago

the last one you pasted is different because you you w all over the place, the others had v (v.forward). This makes a huge difference (why?)

CavalcanteLucas commented 5 years ago

well, it must be because of data dependency. although i thought that there should be some dependency in a case like the following as well (which is not true):

exprs = ['Eq(w.forward, v)',
         'Eq(v, w.forward)']

i think my next step will be to debug the compilation to figure that out

CavalcanteLucas commented 5 years ago

for future readers: check the function test_consistency_anti_dependences at test_operator.py for some references on how to check for anti dependences that end up generating multi loop nests, rather than a single loop nest enclosing all of the equations.

georgebisbas commented 4 years ago

@CavalcanteLucas was there any branch you were working on this?

FabioLuporini commented 4 years ago

it'd be decrepit anyway because of the changes in the last few months

FabioLuporini commented 4 years ago

closing as superseded by #1182