chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

When to initialize a task-private variable #9008

Open vasslitvinov opened 6 years ago

vasslitvinov commented 6 years ago

Background: task-private variables #6325 .

This issue: when should a task-private variable be initialized? We discuss these options:

(A) "inside-task": when its task STARTS RUNNING, or

(B) "before-task": when its task IS CREATED.

Background: consider:

record R { var x: int; }
var myR = new R(0);

while myR.x < 10 {
  begin with (in myR) do writeln(myR.x);
  myR.x += 1;
}

Here, each task has its own shadow 'myR', initialized as a copy of the outer 'myR'. The point is to highlight the behavior difference between two approaches.

(a) The copy occurs when the shadow variable's task STARTS RUNNING. At that time, depending on the timing of task startup, the outer myR can have any digit in the x field. Correspondingly, the shadow myR's x field is non-deterministic.

(b) The copy occurs when the task IS CREATED. In this case, the outer myR.x is fixed for each iteration of the 'while', so the shadow myR.x is deterministic for each task.

The desired semantics is (b). This is what we implement as well, see #8416, $8417.

Now, turn to task-private variables. In this example:

record R { var x: int; }
var myR = new R(0);

while myR.x < 10 {
  begin with (var TPV = myR) do writeln(TPV.x);
  myR.x += 1;
}

(A) leads to non-deterministic TPV.x in each task. (B) gives deterministic TPV.x.

We can choose either scenario, with the following arguments:

(A) "inside-task" - initialize TPV when its task STARTS RUNNING

Pro: TPV is a task-private variable. Its life is entirely within a task. So it is initialized and destructed while that task is running.

Pro: using a task-private variable is another way of writing:

begin {
  var TPV = myR;
  writeln(TPV.x);
}

In this case, 'myR' is a shadow variable with the default intent. Since it is a record, it is 'const ref'. Therefore it tracks the changes to the outer variable, so non-determinism is natural / inherent in the default intent for 'myR'.

The same non-determinism occurs here:

// assume 'myR' is modified concurrently with the 'forall'
forall ... {
  writeln(myR.x); // prints the value of myR.x at the time writeln() is executed
}

To make the 'begin' and 'forall' examples "deterministic", the user needs to use an 'in' intent explicitly:

begin with (in myR, var TPV = myR) do writeln(TPV.x);

// or, equivalently:

begin with (in myR) {
  var TPV = myR;
  writeln(TPV.x);
}

// ditto for forall
forall ... with (in myR, var TPV = myR) {
  writeln(TPV.x); // prints the value of myR.x at entry to forall
}

(B) "before-task" - initialize TPV when its task IS CREATED

Pro: avoids the non-determinism in the above example.

Pro: matches the behavior of 'in' intent.

Access to no-longer-existing variables

[Added on 3/28]

This issue is distinct from non-determinism and is specific to task intents for 'begin' constructs. In the original example:

record R { var x: int; }
var myR = new R(0);

while myR.x < 10 {
  begin with (var TPV = myR) do writeln(TPV.x);
  myR.x += 1;
}

(A) may result in referencing the outer 'myR' when it has gone out of scope already, making it undefined behavior. (B) guarantees against that.

In this regard, (A) is consistent with referencing 'myR' inside the 'begin' body:

begin writeln(myR);

where 'myR' is a const ref to the outer 'myR', with the same danger of being undefined by the time writeln() is executing.

vasslitvinov commented 6 years ago

@bradcray would you weigh in?

@mppf fyi

bradcray commented 6 years ago

Option (b) seems more obviously intuitive to me and seems more useful / less surprising. What are its downsides?

vasslitvinov commented 6 years ago

@bradcray - in my view the downsides are:

vasslitvinov commented 6 years ago

@bradcray - one more. To pull in BenH's example in #6325:

forall ... with (ref mySlice = globalArray.localSlice()) {
  .... use mySlice - more efficient than using globalArray directly ...
}

Do we want 'mySlice' for a given task to be computed within that task or by a parent task? What if the parent task runs on a different locale than the current task?

@benharsh, @mppf - what do you think should happen here?

benharsh commented 6 years ago

When I wrote that issue I was thinking that globalArray.localSlice() would be implemented such that it would return a thing that aliases the data from globalArray on here. With that implementation in mind, I was thinking that globalArray.localSlice() would be called in an Option-A style.

Perhaps a simpler example:

use BlockDist;
proc foo() {
  return here.id;
}
var D = {1..100} dmapped Block({1..100});
var A : [D] int;
forall a in A with (var id = foo()) {
  writlen(a.locale.id == id);
}

If we run this program with multiple locales, what should the output be? Option-A: Some true, some false (id == 0) Option-B: All true Before-Task: Some true, some false (id is zero) Inside-Task: All true

benharsh commented 6 years ago

Do we have any text in the spec that describes the semantics of a begin-on or coforall-on? Specifically, does the task start on one locale and get moved, or does it begin on the destination locale?

bradcray commented 6 years ago

Here, I'm only commenting on comments up to the one bringing in Ben's local slice example—I'll take up the question of the slicing / here.id examples in a follow-up comment:

The semantics should not favor the behavior of the 'in' intent over the 'ref' intent.

Can you clarify what you mean by this? I'm not understanding it.

Generally speaking, I'd suggest focusing in this issue more on forall cases over begin cases when wrestling with semantics since (a) they're more common and (b) I think that the user has a greater number of fallbacks for getting the behavior they want with begins, whereas manually working around intent semantics for forall loops typically requires rewriting the forall as the component coforalls +for loops which makes the abstraction less valuable.

bradcray commented 6 years ago

With respect to Ben's slicing and here.id examples, one observation is that, for the Block distribution, I think these idioms would typically work as desired even under option (b), owing to the fact that Block's parallel iterators are implemented using a nested coforall. However, I don't think that that's particularly satisfying as I'll explain below. Namely, if a Block parallel iterator generally has the structure:

coforall loc in Locales do
  on loc do
    coforall tid in 0..#numTasks do

then a loop like:

forall a in A with (var id = foo()) {

would arguably naively get converted into:

coforall loc in Locales with (var id = foo()) do
  on loc do
    coforall tid in 0..#numTasks (var id = foo()) do

such that while the outer loop's id is 0 for all tasks, the inner loop's id would be as expected (and the compiler could even eliminate the outer ids, detecting that they won't be used/visible).

I call this potentially unsatisfying because presumably we'd want this behavior for any distributed forall loop regardless of whether it used additional tasks on each locale or not. As an example, if we eventually implement the "localerator" iterator such that we could write:

forall loc in Locales

we'd presumably want a similar task-private id variable to be correct on all locales, yet its unlikely that such an iterator would have any inner tasking calls as Block does. So I don't think we should rely on Blocks nested tasking as our solution to this issue.

At the same time, I'm not ready to adopt the potential nondeterminism/confusion of option (a) over this, which makes me think about Ben's comment:

Do we have any text in the spec that describes the semantics of a begin-on or coforall-on? Specifically, does the task start on one locale and get moved, or does it begin on the destination locale?

I think we haven't ever formally defined this, considering the current "just create the remote task" behavior as being an optimization. But maybe a way to get the determinism and the hoped-for behavior in these examples would be to more formally define these "task+on" combinations such that we could also define that the task-private variables are set up on the right locale? This seems related to explorations @ronawho's been doing w.r.t. vectorizing remote coforall launches, so I'm tagging him on this thread as well.

vasslitvinov commented 6 years ago

Zooming in on the "nondeterminism" under choice (A) ...

It shows up as a potentially-non-deterministic initial value of a task-private variable:

var myArray ...;
var myRecord ...;

forall ... with (var localArray = myArray, var localRecord = myRecord) {
  ...
}

It also shows up on current master as a potentially-non-deterministic value of the shadow variable accessed within the loop body:

var myArray ...;
var myRecord ...;

forall ... {    // using the default forall intent
  writeln(myArray);
  writeln(myRecord);
}

Are these two cases really different?

If the user needs determinism, they would use a '[const] in' intent equally in both cases:

var myArray ...;
var myRecord ...;

// with task-private variables
forall ... with (in myArray,  var localArray = myArray,
                 in myRecord, var localRecord = myRecord)
{
  ...
}

// with accesses in loop body
forall ... with (in myArray,
                 in myRecord)
{
  writeln(myArray);
  writeln(myRecord);
}

My assumption is that the initializer of a task-private variable, ex. 'myArray' in "var localArray = myArray", is evaluated using the shadow variables rather than the outer variables directly.

Determinism will happen automatically in both scenarios when the default intent for the variable is '[const] in'.

Do we have any text in the spec that describes the semantics of a begin-on or coforall-on?

Nice catch, Ben! I had thought about our current implementation as just an optimization, like Brad.

I support defining a special case for task+on combinations so that the task-private variables are set up on the right locale under (A). Thumbs up?

mppf commented 6 years ago

Regarding @benharsh's example:

Perhaps a simpler example:

use BlockDist;
proc foo() {
  return here.id;
}
var D = {1..100} dmapped Block({1..100});
var A : [D] int;
forall a in A with (var id = foo()) {
  writlen(a.locale.id == id);
}

If we run this program with multiple locales, what should the output be? Option-A: Some true, some false (id == 0) Option-B: All true

I'm probably missing something but it seems to me that the options are swapped here - Option A gives all true but Option B would have id=0 everywhere, right? Maybe we need to give the options easy-to-remember-names, like A="on task start" vs B="on task create".

Regarding Vass' comment:

I support defining a special case for task+on combinations so that the task-private variables are set up on the right locale under (A). Thumbs up?

Did you mean that if we choose Option A, we should define it to allocate on "the right locale" under coforall+on? Or, are you also trying to express a preference between Option A and Option B at this point? Does such a rule resolve the issue under Option B for Ben's example (above), where we'd presumably like it to output all true?

benharsh commented 6 years ago

I think Michael is right and that I swapped the Options. I updated my example to say "Before-Task" and "Inside-Task".

vasslitvinov commented 6 years ago

I updated the main text with "inside-task" and "before-task". These terms are not as precise as Michael's suggestions IMO, however they are more catchy.

vasslitvinov commented 6 years ago

I support defining a special case for task+on combinations so that the task-private variables are set up on the right locale under (A).

If we choose (A) "inside-task", the special-case rule would say ex. "For a task+on combination, the task is created - and the on-clause is evaluated - on the current locale. The task starts running - and its task-private variables, if any, are initialized - on the locale of the on-clause."

If we choose (B) "before-task", what do we want to happen at run time?

We have to go to each remote locale to initialize our task-private variables before any of the task bodies start executing. Because once a task starts executing, it can modify the outer variable - myArray or myRecord in my example. If we initialize another task's task-private variable after that, we are back to the non-determinism of (A) "inside-task", losing the "selling point" of (B).

Therefore we would have to delay executing any of the tasks until after we have gone to all locales and initialized all their task-private variables. This can really hurt performance.

How else can we design what happens upon (B) "before-task" without introducing non-determinism?

vasslitvinov commented 6 years ago

For (B) "before-task" we have to introduce additional overhead even on a single locale, if we want to guarantee against non-determinism. Consider again:

var myArray ...; // perhaps a global; updated concurrently with the loop below

forall ... with (var localArray = myArray) {
  ...
}

Consider a parallel iterator with a coforall. As the coforall is creating tasks, it is copying myArray into each task's localArray. This is still non-deterministic because myArray may change between copying for the first and the second task. So we HAVE to initialize task-private variables from a copy of myArray instead, which we need to set aside in advance.

This is what we do for in intents, for the same reason. Which - answering @bradcray's question above - motivated me to say:

The semantics should not favor the behavior of the 'in' intent over the 'ref' intent.

Is this really how we want our task-private variables to work?

vasslitvinov commented 6 years ago

Another concern with the above is that the compiler adds a copy of myArray when the user does not ask for it. I think we should avoid this.

A similar scenario occurs when the user picks just a couple of elements from myArray into each task. For example:

forall ... with (var localData = pickSomeLocalData(myArray)) {
  ...
}

Again, the compiler needs to introduce an implicit copy of myArray to avoid non-determinism, without the user asking for it.

mppf commented 6 years ago

This is still non-deterministic because myArray may change between copying for the first and the second task.

That's true, but I wouldn't be worried about that one, because it would mean the program has a race condition. I thought that even the in intent for coforalls would copy once per task, in a loop, while creating the tasks (rather than making 1 copy at start of coforall and then copying from that into each task-specific copy).

cassella commented 6 years ago

If we choose (A) "inside-task", the special-case rule would say ex. "For a task+on combination, the task is created - and the on-clause is evaluated - on the current locale. The task starts running - and its task-private variables, if any, are initialized - on the locale of the on-clause."

That would mean that the on-clause couldn't use a task-private variable to choose the target locale?

coforall i in 1..N with (var myLoc = Locales[i%numLocales]) {
  on myLoc { ... }
}
vasslitvinov commented 6 years ago

That would mean that the on-clause couldn't use a task-private variable to choose the target locale?

Good point!

I do not see that this affects our choice, however. The user does not have to use a task-private variable to compute the target locale...

coforall i in 1..N with (var myLoc = here) {
  on Locales[i%numLocales] { ... }
}
mppf commented 4 years ago

@vasslitvinov - does this issue still need to be open? What's the current status? What are the next steps?

vasslitvinov commented 4 years ago

The next step is to reach consensus on the desired semantics.

bradcray commented 4 years ago

@vasslitvinov: Can you be more specific? Since this issue has opened, the "before task" approach has seemed more and more obviously correct to me, I think it's what's implemented, and I'm not aware of any serious issues with it. With a very quick re-scan of this issue, my sense is that that reflects the general consensus as well. What do you view as being required to resolve it?

vasslitvinov commented 4 years ago

We have not consciously reconciled these interdependent issues:

Also, what is currently implemented is "inside task", which we may want to change.