chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Allow user to define intent of variables in on-clauses #9936

Open LouisJenkinsCS opened 6 years ago

LouisJenkinsCS commented 6 years ago

I believe that the user should be able to specify the intent of outer variables used inside of on statements, as suggested in issue #9907. There are plenty of performance reasons why a user might want an intent other than ref or const ref: for example, in when a by-value copy is desired, or inout when the user wants a sequence of load/store operations not to become a sequence of PUT/GET communications. There is also some applicability for the reduce intent with more complex data structures, such as the example I posted in issue #9879, seen here.

Some examples of where this would be useful:

Read-Only access to a local array from another locale

Since A would otherwise be accessed by (const) reference, each access would perform a GET; with in A, the array is copied to Locale 1 once up front.

var A : [1..10] int;
on Locales[1] with (in A) {
   doSomethingWith(A);
}

Related issue #9757

Multiple updates to local array from another locale

Since A would normally be accessed by reference, each a += 1 would require remote communication; with inout we instead get one bulk copy in and one bulk copy back out (1 GET and 1 PUT) rather than per-element GETs and PUTs.

var A : [1..10] int;
on Locales[1] with (inout A) {
   [a in A] a += 1;
}

Updating a data structure from another locale

In this case, the reduction handles combining remote/distributed updates made to a local copy, providing an alternative to privatization. It isn't applicable in all cases, but it could be a very cool addition.

var data : list(int);
on Locales[1] with (reduce data) {
   data.push_back(someLocaleSpecificData);
}
bradcray commented 6 years ago

I think one interesting / challenging question with this proposal is what the "default" intent should be for an outer-scope variable if it's not mentioned in the with-clause. Today the answer is ref for all types. This seems natural from a lexical scoping perspective, but differs from the default task intents used when variables cross parallelism constructs. There, the argument is that the default intents reduce the chances of races; here the on-clause doesn't introduce parallelism, so it doesn't introduce new races by itself. My intuition is to preserve today's ref-by-default semantics, but is the inconsistency too weird?
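To make the asymmetry concrete, here's a rough sketch of today's behavior (illustrative only):

var x = 0;

// on-clause: x is referenced directly (ref by default), so the original
// x is incremented even though the block executes remotely.
on Locales[numLocales-1] do
  x += 1;

// forall: x crosses the task boundary with a const default intent, so
// this does not compile unless a `with (ref x)` task intent is added
// (at which point the updates would race).
forall i in 1..100 do
  x += 1;   // error today: x is const by default within the loop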

I've mentioned it elsewhere, but one reason we historically haven't undertaken this feature is that it's possible for the user to write such patterns themselves using local variables within the on-clause. Contrast this with the forall case where the user doesn't have any other means to introduce a variable at the task granularity. That's not a reason to dismiss this proposal, just one of the reasons we haven't considered it a requirement before now.

For example, I think the examples from Louis's write-up could be written as follows:

var A : [1..10] int;
on Locales[1] {
  var locA = A;
  doSomethingWith(locA);
}

on Locales[1] {
  var locA = A;
  [a in locA] a += 1;
  A = locA;
}

var data : list(int);
on Locales[1] {
  var locData: list(int);
  locData.push_back(someLocaleSpecificData);
  data.append(locData);  // I think this is what's desired?  Note that the `reduce` was missing its reduction operator
}
LouisJenkinsCS commented 6 years ago

Without an optimized reduction, the last example using the list would still involve implicit communication: data.append will iterate over its iterable argument (which in this case would be a wide const ref), and as mentioned before that results in many implicit PUT/GET communications. It can be done that way, but it is much less efficient than the ideal.

Edit: It's actually concat that can append another list, but the same argument applies.

Edit2: It's the other way around: iterating through locData is local, but what's worse is that there are at least 3 PUT/GET communications for each element, so performance would suffer a lot in append.
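To make that concrete, an element-wise append looks roughly like this (a sketch only, not the actual List module implementation):

proc list.append(other : list(int)) {
   // In the rewrite above, `this` is the remote `data` and `other` is the
   // local `locData`: the iteration itself is local, but each push_back
   // updates `this`'s tail/next/size fields, and each of those updates is
   // a remote PUT/GET when `this` lives on the original locale.
   for x in other do
      this.push_back(x);
}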

bradcray commented 6 years ago

It's not obvious to me what aspect of the user-defined reduction interface design would optimize the transmission of linked lists across locale boundaries... Can you state what you imagine happening under the covers and what the author of a user-defined reduction would need to write to enable that?

LouisJenkinsCS commented 6 years ago
proc list.concat(other : list) {
   // O(1) splice of `other` onto the end of `this`; leaves `other` empty.
   if other.head == nil then return;
   if this.head == nil {
      this.head = other.head;
   } else {
      this.tail.next = other.head;
   }
   this.tail = other.tail;
   other.head = nil;
   other.tail = nil;
}

Note: We can't just move all data to a single node or else we end up with a load-imbalance. Yes, it means navigating the list is slow, but it is a distributed linked list after all. If the reduction is performed over a single node, then it's fine and efficient. All in all, concatenation is performed in O(1) time given the current naïve design of the List module. Note that we don't need to copy data over in a reduction, as it is assumed that the child list gets destroyed after everything is concatenated anyway.

In essence, reducing a list or any data structure will perform the most efficient concatenation algorithm possible with the advantage that the other data structure will be soon destroyed.
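Roughly, what the author of the reduction would need to write is something like the following. This is only a sketch: the ReduceScanOp details and the list internals here are assumptions on my part, not code from the actual modules.

class ListConcat : ReduceScanOp {
   type eltType;
   var value : list(eltType);

   proc identity        return new list(eltType);
   proc accumulate(x)   { value.push_back(x); }          // per-element, local
   proc combine(other)  { value.concat(other.value); }   // O(1) splice, as above
   proc generate()      return value;
   proc clone()         return new unmanaged ListConcat(eltType=eltType);
}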

Edit: Let's move this to my forall example in issue #9879...

// Objective: Go from distributed array -> distributed list (no synchronization needed)
var data : list(int);
var A : [someDistributedDomain] int;
forall a in A with (reduce data) {
   data.push_back(a);
}

Imagine that A is a block distributed array, equally distributed across numLocales. We can take advantage of the fact that the reduction is performed locally first and then globally, meaning that, for example, you have all nodes allocated on locale 0 linked together, all nodes allocated on locale 1 linked together, and so on. This locality of the list formed by the reduction can be exploited by advanced users.

bradcray commented 6 years ago

We can't just move all data to a single node or else we end up with a load-imbalance.

Again, it wasn't at all clear to me what you were expecting this reduction to do. It's clearer now. (Omitting the definition of the reduction you want to perform leaves a lot of guesswork as to what should be expected to happen). That said, I'm still not seeing why using a reduction operator would result in doing a singly-linked list concatenation any faster than simply providing an "absorb list; leave empty" method on the list type would.

We can take advantage of the fact that the reduction is performed locally first and then globally, meaning that, for example, you have all nodes allocated on locale 0 linked together, all nodes allocated on locale 1 linked together, and so on.

OK, the forall case makes sense to me (though note that you're still missing the name of your reduction in the with-clause). But it also seems similar to the original motivation for providing task-/forall-intents: it permits you to do things at task boundaries without having to manually decompose the forall statement. Whereas I'm still not seeing why it supports the argument for providing a reduce intent on an on-clause. (i.e., If I'm writing out the on-clause, I could just do it manually; if I'm not and the on-clause is within a forall loop, the compiler ought to already be doing the right thing w.r.t. locality and the user-facing feature isn't required).

LouisJenkinsCS commented 6 years ago

You can do all of this manually; it just looks a lot nicer and is less error-prone than having the user write it out themselves. Doing it this way maximizes code reuse by leveraging the forall reduction logic, and it saves the user a lot of time. I can see why this proposal shouldn't be given much priority compared to other things, but the language would be nicer, IMO, with this small quality-of-life change.

LouisJenkinsCS commented 6 years ago

though note that you're still missing the name of your reduction in the with-clause

I intentionally left it out, as I also think it'd be cool to perform the reduction on the object itself. list could implement the User-Defined Reduction Interface and have it applied to itself, hence not requiring an operator.

bradcray commented 6 years ago

list could implement the User-Defined Reduction Interface and have it applied to itself, hence not requiring an operator.

It would still require an operator, both to parse and so that the compiler would know whether to sum the elements of the list, find the min, the max, concatenate the sublists, etc. I.e., there is no "use the default reduction for this type" capability today (nor has one been proposed that I'm aware of).
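For example, with an operator named, say one like the ListConcat sketch above, the earlier forall would read something like this (hypothetical, since no such operator exists in the modules today; the body is as in the earlier example):

var data : list(int);
var A : [someDistributedDomain] int;
forall a in A with (ListConcat reduce data) {   // reduction operator named explicitly
   data.push_back(a);
}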

bradcray commented 6 years ago

You can do all of this manually; it just looks a lot nicer and is less error-prone than having the user write it out themselves.

I was thinking about this issue a bit more this morning and remembered another reason we haven't supported this feature, beyond "you can already do this manually" and "it's additional work." One impact of attaching with-clauses to on-clauses is that we'd either (a) need their default intents to differ from those of the parallel constructs (forall, coforall, cobegin, begin) or (b) dramatically redefine our notion of being a global address space language. For example, today I can write:

var x = 1;
on Locales[numLocales-1] do
  x += 1;

and the code does what a naive user might expect, which is to increment x. If we want to add with-clauses for ons and still have this behavior work as it currently does, we'd have to say that the default intents for variables crossing on-clauses differed from the default intents for parallel constructs or function arguments (i.e., say that they were always ref), which seems like an unfortunate asymmetry. (In many respects, it felt similarly unfortunate to add default intents to parallel constructs and make them different from on-clauses, but the arguments for doing so in that case were stronger since it reduced the likelihood of race conditions and supported writing patterns that we couldn't otherwise, particularly for forall loops).

The other option would be to unify the default intents for on-clauses to match the other cases and say that this case would no longer work unless you added a with (ref x) clause. But this would be a pretty huge change to the language, where traditionally we've said that the introduction of on-clauses doesn't change the semantics of a program, simply its implementation and performance characteristics. This seems like too big and too impactful of a change to me at this point in the language's evolution.
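That is, under this second option, the example above would have to be written as follows to preserve its current behavior (hypothetical syntax, since with-clauses on on-statements don't exist today):

var x = 1;
// The explicit ref intent restores today's "increment the original x" behavior.
on Locales[numLocales-1] with (ref x) do
  x += 1;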

So if you forced me to choose between these two options, I'd choose the first (different default intents for on-clauses). But my main preference would still be what's on master today.

LouisJenkinsCS commented 6 years ago

I don't see a problem with the default intent of on statements being different from the default intents of the parallel constructs. I would agree that on should have ref as its default intent and that it should be documented as such. I believe that adding a with-clause to on shouldn't change existing behavior; instead, it should enable new, more well-defined behavior.

I am no longer a newbie to Chapel, but I can imagine that if I were, I wouldn't have too much trouble picking up on the fact that on acts differently from the parallel constructs; and if I did, it would be a matter of educating the user. A more advanced user could still take advantage of this to squeeze out more performance, for example by eliminating extra communication: by-value data would be copied into the initial active message rather than requiring a round trip for each variable the user wishes to retrieve by value anyway.
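For instance, something like the following (hypothetical syntax; doSomethingWith is just a placeholder for arbitrary remote work):

var alpha = 3.14;
var n = 1000;
// Hypothetical: both values ride along in the initial active message,
// avoiding a separate GET back to the originating locale for each one.
on Locales[1] with (in alpha, in n) {
   doSomethingWith(alpha, n);
}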