chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 420 forks source link

Support stylized local array slices: `A[local]` #12086

Open bradcray opened 5 years ago

bradcray commented 5 years ago

As a Chapel programmer, I'd like to get a logical "view" to all of the local elements of an array such that subsequent accesses to that expression are known/asserted to be local to the current locale. The idea is that this should work whether or not those local elements are contiguous / easily described by a rectangular domain/array. The proposed syntax is to use A[local] for an array A.

This feature should be particularly useful when captured in a reference such as:

ref locA = A[local];

or made part of a forall loop's intents like:

forall i in D do with (ref locA = A[local]) {
  ...
}

such that all subsequent local computations on locA are cheaper than the equivalents to A. This can also result in cleaner code than using method based approaches (e.g., A.localAccess(i.j)).

The proposed implementation approach is to use an array view similar to the ones we use for slicing in which operations like dsiAccess() are translated to cheaper equivalents like dsiLocalAccess().

mppf commented 5 years ago

I'm curious, do you think it'd be possible to create a variant of A[local] that gave the local array elements for a particular core, so that it could assume (for example) that locking is not necessary within a forall / coforall? Perhaps that would require that the parallel loop containing the A[local] would need to have a consistent mapping to cores (or something?).

bradcray commented 5 years ago

@mppf: What kind of patterns are you imagining that would require locking? Specifically, I don't think locking would be required for the following:

coforall loc in Locales do
  on loc {
    ref locA = A[local];
    forall a in locA do ...
  }

But I'm guessing you're thinking of something more complex?

mppf commented 5 years ago

Suppose you have an algorithm that is written in an SPMD way (for one reason or another) and it operates in phases using a distributed array A and some distributed updates data structure:

I'm wondering about a bunch of things that might not end up being really connected to this slice syntax.

I think these questions might be connected to the way we currently make it easy to get at the part of the array local to a node but it's not easy to get at the part of the array local to a core (but I don't have anything more concrete to offer as an example).

e-kayrakli commented 5 years ago

I always imagined slicing a distributed array with a Locale object or an array of Locale objects can enable nice idioms for implementing pipelined/systolic algorithms like SUMMA etc.. For example, for a 1D array, A[Locales[here.id-1]] would give me my left-neighbor's data. I can use that to copy my neighbor's data into a local array.

I must agree that A[local] is the best way to say "give me my local data", but I am wondering if using Locale objects can be more versatile in the long run. In that case "give me my local data" can be said by A[here].

It should (I think) avoid changing the compiler, as well.

bradcray commented 5 years ago

@mppf: Thanks for clarifying. To me, this feels farther afield than what I'm looking for in this issue (e.g., dipping into questions of barriers within foralls) and maybe better suited for a new issue. Intuitively to me, the notion of "what does/should my core handle" feels more like the role of an iterator than the language itself, particularly since (in the default flat locale model) cores don't actually own array elements, making any such mapping more of an offshoot of how loops over the elements happen to be partitioned and scheduled. I think it could be useful to support a standard iterator to be used in such contexts to guarantee similar mappings of iterations to cores and using it consistently, but I don't view it as relating deeply to the type of array view being proposed here (though maybe I'm just not thinking expansively enough).

In contrast, I view the A[local] proposal as being more about reducing the overhead of array operations known to be local by focusing in on just the local portions of the array and avoiding generating code to potentially handle communication in cases where it's known not to be needed (e.g., all accesses are effectively localAccess()es, all slices are effectively localSlice()s. For example, I think a common idiom may become:

forall i in D with (ref locA = A[local]) do
  ...locA[i]...

which should eliminate any overheads related to A[i] being potentially non-local (where an even better solution would be to improve the compiler's reasoning about array-domain relationships so that it could do this transformation automatically).

Engin: I think your proposal has merit as a shorthand for the current way of expressing this which would be A[getLocalSubdomain[s](Locales[here.id-1]) but I have concerns about it serving as a useful replacement for this proposal. Specifically, when the compiler sees A[local] it can trivially know that the data in question is completely local to the current locale since it relies on a keyword for its expression, letting it build a view of the array's local data that avoids any communication. By contrast, if the way of expressing this pattern was through a proc this(loc: locale) accessor, then the remote and local cases end up calling into the same routine and requiring static reasoning about the argument value ("Is loc == here?") in order to capture the benefits of the local case. Moreover, since the elements owned by a given remote locale may not be contiguous nor expressible in a closed domain form, what you'd likely end up getting back from such an access is an array view descriptor that refers to potentially remote data rather than the rectangular subarray that it sounds like you are envisioning (meaning users who want to "copy remote data into a local array" would likely need to do more work than var B = A[remoteLocale];. But mostly, while such a concept could be convenient, I don't see it as being as deeply useful from an optimization point of view as this proposal since it's effectively a sugar over the longer-form way of writing such slices today.

e-kayrakli commented 5 years ago

@bradcray -- Thanks for the explanation. I see that A[local] has better prospect when it comes to performance.

I'd still advocate for being able to use Locale objects for slicing arrays. But only as a syntactic sugar as you suggested. I don't mind creating an issue for that... unless you have an objection?

bradcray commented 5 years ago

I think a new issue requesting the sugar for indexing into arrays using locales sounds fine, thanks.

ronawho commented 4 years ago

See https://github.com/chapel-lang/chapel/pull/15842 for a draft implementation

sdbachman commented 1 year ago

I had some code yesterday where I could have really used this feature...