chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

Resolution rules for parallel loops #24796

Open vasslitvinov opened 5 months ago

vasslitvinov commented 5 months ago

Here are some clarifications on the CURRENT resolution rules to go with our parallel iterators primer. However, let us start with design questions relating to the current rules:

Is the serial iterator required when its parallel counterpart is invoked in a parallel loop?

For example, given:

forall i in myIter() {...}

is it sufficient to declare:

iter myIter(param tag) where tag == iterKind.standalone { ... }

or is it an error when this serial iterator is absent:

iter myIter() { ... }

Currently the serial iterator is required in some cases and not required in others.

Other design questions include:

When both the serial and the parallel iterators are required or otherwise looked at, how closely should their formal arguments match? For example, can one of the family have an in-intent int(8) formal and another a ref-intent int(64) formal? Can the positionally-same formals have different names?

When invoking a leader and a follower, should the actuals to the leader invocation be cached and passed to all the follower invocations? Currently they are cached.

When the first iterable of a zippered [ ] loop resolves to a leader+follower, are the remaining iterables required to be followers? Or when one of the followers is not available, should all iterables fall back to the serial iterators? If they do, should there be a warning?

A potential simple style for defining the rules

We could define the semantics using the rewrite rules to make some aspects of the current implementation explicit. For example:

forall idx in myIter() { body; }
// is equivalent to
for idx in myIter(tag=iterKind.standalone) { body; }
// if the standalone iterator exists
forall (i1, i2) in zip(iter1(), iter2()) { body; }
// is equivalent to
for followMe in iter1(tag=iterKind.leader) {
  for (i1, i2) in zip(iter1(tag=iterKind.follower, followThis=followMe),
                      iter2(tag=iterKind.follower, followThis=followMe))
    { body; }
// plus additional rules for the actuals, if any

Some current rules

Given

forall tup in zip(foo(), foo()) do body;

the compiler resolves foo(tag == leader) and two times foo(tag == follower, followThis). These obviously must resolve to iterators, not procs. It might be OK for any of these to resolve to a proc marked with pragma "fn returns iterator".

The implementation expects the parallel iterators to be DECLARED with tag and (for followers) followThis formals. The users are not supposed to pass tag and followThis actual arguments explicitly.

When resolving the parallel iterators for a given parallel loop, the compiler implicitly generates the call by adding tag and (for followers) followThis as NAMED actuals, then the normal resolution rules apply. This means that the parallel iterators must name their formals exactly that.

Given

forall idx in myIter(a,b) do body;

the compiler tries first to resolve myIter(a, b, tag=iterKind.standalone). If this does not resolve, it tries myIter(a, b, tag=iterKind.leader); assuming success it then attempts to resolve myIter(a, b, tag=iterKind.follower, followThis=a value of the yielded type of the leader).

A follower iterator may have a generic followThis formal, a concrete followThis formal, or even multiple overloads expecting different types of followThis. The follower declaration(s) determine what leader yielded types this iterator can "follow."

Cf. for a serial for or foreach loop, no additional actuals are added to the iterator calls, whether it is a zippered loop or not.

Back to parallel iterators, generally the user is not expected to pass tag and/or followThis named actuals explicitly. If they do, it may lead to resolution errors, for example due to duplicate named tag actuals.

EXCEPTION: calling an iterator with a tag=iterKind.standalone or tag=iterKind.leader actual in a FOR loop turns this for-loop into a parallel loop over the corresponding iterator. For example, the current implementation uses such for-loops to forward standalone and leader iterators over an array to the iterators defined by its distribution.

How to support intentionally-order-dependent forall loops?

This comes up, for example, in “Run Stencil Run” / diamond tiling computations. While it probably still works, the spec says that it’s not a legal program, because the forall loop syntax indicates that the loop body is order-independent, and it is not in these cases.

We decided we needed a separate flavor of forall loops to support this scenario.

The part that remains a wrinkle is that a forall loop is lowered by combining the loop body and the iterator implementations. The follower iterator implementation might not be order-independent, and in that case the loop is not vectorizable. Follower iterators indicate their order-independence by using foreach instead of for. We have some heuristic optimization that makes simple iterator forwarding cases not need foreach.

vasslitvinov commented 4 months ago

A somewhat-related FAQ: why do we have _iteratorClass and _iteratorRecord and how are they different?

My mental model is that during execution an IR (iterator record) is created initially, containing the arguments that the iterator function was invoked with. Then it is passed to getIterator() to create an IC (iterator class), which holds the state that the iterator needs to retain across its yield statements. The IR is not used while the iterator is running. The IC is deinitialized once the iterator is done, when the loop body executes break, throws, etc.