Closed georgebisbas closed 4 years ago
This one is about explaining, with one or more running examples, how clustering (one of the early compilation passes) works. My paper also has a section about it, so that might be a starting point. We might have to touch upon data dependence analysis, as required by the groupby
function (one of the most important clustering
routines).
I suggest to start with one simple yet non-trivial example. It cannot be as trivial as Eq(f.forward, f+1)
. It should rather eventually create multiple loop nests, just like happens in one of the pde-based seismic examples (isotropic acoustic/tti/elastic/...). However, these are too complicated. So you might want to reverse-engineering how clustering works by trying to write an example which places equations in different loop nests.
Remember, the compiler operates with, conceptually, one simple goal in mind "how should the equations be scheduled into loops so that the information 'naturally' flows from one iteration to another?". Or, in other words, "how should the equations be scheduled into loops so that when we execute a certain loop iteration, all data that need to be accessed is up-to-date?" Please do keep this in mind while studying this stuff
Hey, I have some questions to share:
1) I couldn't make it to debug tests using IPython.embed()
. Running tests with pytest
will simple ignore it. Would you have any hints? How do you guys usually do that?
2) Following the code flow within operator.py
one can find (currently at the line 89) the call for the function clusterize(expressions)
.
This lead me to the call for detect_flow_directions(exprs)
within (currently at line 237 of) /ir/clusters/algorithms.py
.
Then, to the call for as_tuple(exprs)
within (currently at line 125 of) /ir/support/utils.py
.
In /tools/utils.py
one finds the defintion for as_tuple
at lines 19-41:
def as_tuple(item, type=None, length=None):
"""
Force item to a tuple.
Partly extracted from: https://github.com/OP2/PyOP2/.
"""
# Empty list if we get passed None
if item is None:
t = ()
elif isinstance(item, str):
t = (item,)
else:
# Convert iterable to list...
try:
t = tuple(item)
# ... or create a list of a single item
except (TypeError, NotImplementedError):
t = (item,) * (length or 1)
if length and not len(t) == length:
raise ValueError("Tuple needs to be of length %d" % length)
if type and not all(isinstance(i, type) for i in t):
raise TypeError("Items need to be of type %s" % type)
return t
My question is: Could you please talk more about lines 36-40?
t = (item,) * (length or 1)
if length and not len(t) == length:
raise ValueError("Tuple needs to be of length %d" % length)
if type and not all(isinstance(i, type) for i in t):
raise TypeError("Items need to be of type %s" % type)
3) Within function definition EVAL
in conftest.py
; what is globals()
(currently at line 265)?
1. I couldn't make it to **debug tests** using `IPython.embed()`. Running tests with `pytest` will simple ignore it. Would you have any hints? How do you guys usually do that?
add -s
-- py.test ... -s ...
Could you please talk more about lines 36-40?
I don't quite remember what these lines do, but what's important here is what as_tuple
does. Some examples:
as_tuple([]) -> ()
as_tuple(None) -> ()
as_tuple('aaa') -> ('aaa',)
as_tuple([1, 2, 3]) -> (1, 2, 3)
so it turns the argument into a reasonable tuple
Within function definition EVAL in conftest.py; what is globals() (currently at line 265)?
this is hopefully useful
Registering for future readers:
Some reading on Data Dependence Analysis is worth at this point.
I recomend the slides at:
http://www.cs.arizona.edu/~collberg/Teaching/553/2011/Handouts/Handout-33.pdf
and pages 1-10 of Loop and Data Transformations: A Tutorial
, found at:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.10
My next checkpoint is to compose* loop nests with:
It should also include loops with carried dependency among iterations.
(* see: https://github.com/opesci/devito/blob/master/tests/test_ir.py https://github.com/opesci/devito/blob/master/examples/compiler/02_iet.ipynb)
I've tried to directly build something like:
for (int i = 0; i <= 10; i += 1) { a = b + c; d = a + e; }
in order to watch some flow dependency.
Although I realized it doesn't make sense trying to clusterize
something like that, because that is already at the Iteration/Expression domain.
I think LoweredEq
objects should be used instead.
In that case, the question is: how might one figure which expressions would result in different loop nest configurations?
Edit: Examples from https://github.com/opesci/devito/blob/master/tests/test_ir.py#L382 can serve as reference.
yes, probably you gotta use LoweredEq. No trace of loops at this stage.
Question:
in the method __repr__
of Access
class found in devito/ir/support/basic.py
, what are those codes to define the mode
variable?
\033[1;37;31mW\033[0m
in case of write, and \033[1;37;32mR\033[0m
in case of reading.
Is it correct or is it any kind of typo?
colors :)
red for writes, green for reads
used for display in a terminal
Still in Access
, at definition of __new__
:
could it be obj.function = indexed.function
instead of obj.function = indexed.base.function
?
In devito/ir/support/utils.py
, at detect_flow_directions
:
mapper.get(k.parent, {Any})
means that the value related to the key k.parent
should receive {Any}
if there is not such a key.
In devito/operator.py
:
One may notice that _specialize_exprs
goes through detect_flow_directions
, just as clusterize
.
In the first case, it is a "local pass", getting each of the expressions isolated.
In the second case, it is a "global pass", considering all the expressions together.
is_Tensor
is defined in devito/ir/equations/equation.py
in terms of is_Indexed
.
Although, I can't find the definition of is_Indexed
with git grep "def is_Indexed"
.
What am I missing ?
Still in
Access
, at definition of__new__
: could it beobj.function = indexed.function
instead ofobj.function = indexed.base.function
?
yes
In devito/ir/support/utils.py, at detect_flow_directions: mapper.get(k.parent, {Any}) means that the value related to the key k.parent should receive {Any} if > there is not such a key.
yes, dict().get is widely used in python
In devito/operator.py: One may notice that _specialize_exprs goes through detect_flow_directions, just as clusterize. In the first case, it is a "local pass", getting each of the expressions isolated. In the second case, it is a "global pass", considering all the expressions together.
oh yes -- I realise only now that yesterday in the call I've repeated stuff that you already knew :) sorry
is_Tensor is defin d in devito/ir/equations/equation.py in terms of is_Indexed. Although, I can't find the definition of is_Indexed with git grep "def is_Indexed". What am I missing ?
it's inherited from SymPy directly
oh yes -- I realise only now that yesterday in the call I've repeated stuff that you already knew :) sorry
No, I didn't know that. I'm just registering what you told me!
it's inherited from SymPy directly
I see! Thanks
The following code has been built based on the function test_loop_wrapping
from test_ir.py
.
from devito import Eq, Grid, TimeFunction, Operator
from devito import configuration
configuration['openmp'] = 0
grid = Grid(shape=(3, 3, 3))
v = TimeFunction(name='v', grid=grid, time_order=4)
w = TimeFunction(name='w', grid=grid, time_order=4)
exprs = ['Eq(w.forward, w + 1)',
'Eq(v.forward, w)']
c = enumerate(list(exprs))
for i, e in c:
exprs[i] = eval(e)
op = Operator(exprs, dle='noop')
from devito.tools import pprint
pprint(op)
My goal is to generate expressions that live in different iteration nests.
I expected it to be the case as it this particular combination is tested with wrappable
being asserted as False
. Although, this is what I see:
<Callable Kernel>
<List (0, 3, 0)>
<ArrayCast>
<ArrayCast>
<List (0, 1, 0)>
<[affine,sequential] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
<TimedList (2, 1, 2)>
<C.Statement struct timeval start_section0, end_section0;>
<C.Statement gettimeofday(&start_section0, NULL);>
<Section (1)>
<[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
<[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
<[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
<ExpressionBundle (2)>
<Expression w[t3, x + 1, y + 1, z + 1] = w[t2, x + 1, y + 1, z + 1] + 1>
<Expression v[t3, x + 1, y + 1, z + 1] = w[t2, x + 1, y + 1, z + 1]>
<C.Statement gettimeofday(&end_section0, NULL);>
<C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;>
Did I miss something?
why do you think they should live in different loop nests?
because of this test:
@pytest.mark.parametrize('exprs,wrappable', [
# Easy: wrappable
(['Eq(u.forward, u + 1)'], True),
# Easy: wrappable
(['Eq(w.forward, w + 1)'], True),
# Not wrappable, as we're accessing w's back in a subsequent equation
(['Eq(w.forward, w + 1)', 'Eq(v.forward, w)'], False),
# Wrappable, but need to touch multiple indices with different modulos
(['Eq(w.forward, u + w + 1)'], True),
# Wrappable as the back timeslot is accessed only once, even though
# later equations are writing again to w.forward
(['Eq(w.forward, w + 1)', 'Eq(w.forward, w.forward + 2)'], True),
# Not wrappable as the front is written before the back timeslot could be read
(['Eq(w.forward, w + 1)', 'Eq(u.forward, u + w + 2)'], False),
])
def test_loop_wrapping(self, exprs, wrappable):
"""Tests detection of WRAPPABLE property."""
grid = Grid(shape=(3, 3, 3))
u = TimeFunction(name='u', grid=grid) # noqa
v = TimeFunction(name='v', grid=grid, time_order=4) # noqa
w = TimeFunction(name='w', grid=grid, time_order=4) # noqa
# List comprehension would need explicit locals/globals mappings to eval
for i, e in enumerate(list(exprs)):
exprs[i] = eval(e)
op = Operator(exprs, dle='speculative')
iters = FindNodes(Iteration).visit(op)
# Dependence analysis checks
time_iter = [i for i in iters if i.dim.is_Time]
assert len(time_iter) == 1
time_iter = time_iter[0]
if wrappable:
assert time_iter.is_Wrappable
assert all(not i.is_Wrappable for i in iters if i is not time_iter)
in this part:
# Not wrappable, as we're accessing w's back in a subsequent equation
(['Eq(w.forward, w + 1)', 'Eq(v.forward, w)'], False),
I found that this is not wrappable (yet by trial-and-error):
exprs = ['Eq(w.forward, w)',
'Eq(w, w.forward)']
Generating this:
<Callable Kernel>
<List (0, 2, 0)>
<ArrayCast>
<List (0, 2, 0)>
<[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
<TimedList (2, 1, 2)>
<C.Statement struct timeval start_section0, end_section0;>
<C.Statement gettimeofday(&start_section0, NULL);>
<Section (1)>
<[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
<[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
<[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
<ExpressionBundle (1)>
<Expression w[t1, x + 1, y + 1, z + 1] = w[t0, x + 1, y + 1, z + 1]>
<C.Statement gettimeofday(&end_section0, NULL);>
<C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;>
<[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
<TimedList (2, 1, 2)>
<C.Statement struct timeval start_section1, end_section1;>
<C.Statement gettimeofday(&start_section1, NULL);>
<Section (1)>
<[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
<[affine,parallel] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
<[affine,parallel,vector-dim] Iteration z::z::(z_m, z_M, 1)::(0, 0)>
<ExpressionBundle (1)>
<Expression w[t0, x + 1, y + 1, z + 1] = w[t1, x + 1, y + 1, z + 1]>
<C.Statement gettimeofday(&end_section1, NULL);>
<C.Statement timers->section1 += (double)(end_section1.tv_sec-start_section1.tv_sec)+(double)(end_section1.tv_usec-start_section1.tv_usec)/1000000;>
the last one you pasted is different because you you w
all over the place, the others had v
(v.forward
). This makes a huge difference (why?)
well, it must be because of data dependency. although i thought that there should be some dependency in a case like the following as well (which is not true):
exprs = ['Eq(w.forward, v)',
'Eq(v, w.forward)']
i think my next step will be to debug the compilation to figure that out
for future readers: check the function test_consistency_anti_dependences
at test_operator.py
for some references on how to check for anti dependences that end up generating multi loop nests, rather than a single loop nest enclosing all of the equations.
@CavalcanteLucas was there any branch you were working on this?
it'd be decrepit anyway because of the changes in the last few months
closing as superseded by #1182
Please, let me take this. Could someone please provide description?