chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

When should `.indices` be a procedure vs. iterator? #18353

Open bradcray opened 3 years ago

bradcray commented 3 years ago

This issue is a follow-on to #17883 which re-defined .indices on arrays to make it more distinct from .domain. Specifically, the goal of this issue is to ensure that we're in agreement about when .indices should be a procedure vs. an iterator, where the three main options are:

As background, .indices on types like tuples, strings, lists, etc. returns ranges today, which has certain advantages:

In my PR #18274 which began the process of deprecating .indices being a near-alias for .domain, I essentially took the third approach, based on the following philosophy:

My rationale for the irregular case was a minor fear that in the case of a distributed sparse or associative array, there may not be enough memory to store the indices as a local sparse or associative domain. I also imagined that, given an iterator, it would be simple enough to capture the indices back into a domain (or array or list or ...) if that's what the user wanted.

That said, here are some alternative perspectives:

aconsroe-hpe commented 2 years ago

Does sparse by itself have the potential for OOM? If you are sparse over a rectangular domain, then you still have an O(1) representation, right? (This might be an aside; details below.)

To answer my own question, I tried:

```chapel
var D = {1..3};
var S: sparse subdomain(D);
writeln("D");
for i in D do writeln(i);
writeln("S");
for i in S do writeln(i);
```

and got

```
D
1
2
3
S
```

so perhaps this is consistent with what Chapel already does, but it is a bit surprising. I would have expected the same output for both, since I would expect something like "`A.indices` returns ALL `k` s.t. `A[k] = ...` is valid".

One potential issue that comes to mind with the iter approach is concurrent modification and iterator invalidation. Would we need to define the semantics for iterating over an associative domain (or any other type that used an iter)? (Is this already defined or discussed somewhere? I couldn't find anything in a quick search.) Do we have to lock? Is there even a reasonable implementation for us to do something like Java's ConcurrentModificationException if we wanted to treat this like a bounds check (on during development and off with --fast)?
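To make the question concrete, here is a minimal (hypothetical, deliberately racy) sketch of the kind of code whose semantics would need defining:

```chapel
// One task iterates an associative domain's index set while another task
// inserts into it. Whether/how this is defined is the question above.
var D: domain(string);
D += "a";
D += "b";

cobegin {
  for s in D do writeln(s);   // iteration over the index set ...
  D += "c";                   // ... concurrent with a modification of it
}
```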

bradcray commented 2 years ago

Does sparse by itself have the potential for OOM? If you are sparse over a rectangular domain, then you still have an O(1) representation, right?

In general, sparse domains require O(size) storage, where size is the number of non-"zero"/IRV entries in the sparse subset. So the sparse domain in your example is O(1) because it stores no indices, but if you added indices 1, 2, 3 to it, it would be O(n) (where n = 3).
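As a small follow-on sketch to the example above:

```chapel
// Once indices are added, the sparse domain stores each one, so its
// representation grows with the number of members.
var D = {1..3};
var S: sparse subdomain(D);
S += 1;
S += 2;
S += 3;
for i in S do writeln(i);   // now prints 1, 2, and 3
```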

One potential issue that comes to mind with the iter approach is concurrent modification and iterator invalidation.

Generally speaking, Chapel considers concurrent iteration over, and modification of, a data structure like an array or domain to be a user error that we don't check or keep you safe from. I believe this is documented in a few places (?), but it's one of those things where it's hard to know exactly where to say it. I'm hoping we can tighten this up a bit as we focus on what the parallel-safe versions of the collections look like.

Given that fact, I'm not sure that it suggests that taking one approach or the other is better here. If it's an iterator, it means you can't call .indices while you're simultaneously modifying the domain. But even if it's a proc, the copy of the indices that would have to be created before returning to you could still have a race with a simultaneous modification of the domain in some other part of the code (?).

For me, the principle that pushes me toward an iterator here is that Chapel tries to avoid requiring copies of arrays (or, more generally, O(array size) data structures) unless the user explicitly requests that copy. While we could document .indices() as a routine that does in fact cause an O(n) copy to be made for sparse/associative data structures, it feels fundamental enough (and general enough that we may want to use it across data types) that it seems more natural to keep its implementation O(1) and to have the user introduce the O(n) space explicitly by capturing the iterator into a variable if that's what they want.
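As a rough sketch of that principle (assuming .indices acts as an iterator for sparse arrays, per the direction taken in #18274):

```chapel
// Iterating the indices adds no O(n) storage; the user opts into the copy
// explicitly by capturing the iterator's yields into a variable.
var Dom = {1..100};
var Sps: sparse subdomain(Dom);
Sps += 10; Sps += 42; Sps += 77;
var A: [Sps] real;

for i in A.indices do     // O(1) extra space: just walk the index set
  A[i] = 1.0;

var idxCopy = A.indices;  // explicit O(n) capture (here, into an array of the yields)
writeln(idxCopy);
```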

vasslitvinov commented 2 years ago

This looks like an attempt to fulfill conflicting goals with a single function. So what are the things that the user may be asking for? "I want the set of indices of this array such that:"

(a) it must be a single domain
(b) it must be a single local domain
(c) it must be a copy of myArray.domain
(d) it must have a compact representation
(e) I need to iterate over the indices
(f) some mix of the above, like "I'd rather it be compact" or "I'd rather it be a local domain"

Some of the things we could do:

For the method named .indices specifically, #15024 and #15039 intend it to be used to iterate over the index set. So it makes perfect sense for it to be an iterator. As a purist, I would go one step further and make it always be an iterator, even for other types. However, I do not see a practical benefit of that.

vasslitvinov commented 2 years ago

A big question that I do not see answered is whether forall A.indices do should distribute the iterations like forall A.domain do would, and why. Ditto for other indexable datatypes that may be distributed.

If yes, then I strongly suggest that A.indices should be an alias for A.domain and that domain.indices should return a const ref to this, because to me the benefit of simplicity outweighs the benefit of avoiding two ways to do the same thing.

If no, then .indices becomes limited in a way: we can't apply it to a distributed array and get reasonable performance. This may be OK; however, I don't like it, so it pushes me toward the other option.

bradcray commented 2 years ago

  • provide different-named methods to cater to specific purposes
  • generate a compilerError when the array in question cannot be handled gracefully

At this point, I'd need to see a concrete proposal rather than a list of concepts to give much weight to a counterproposal.

A big question that I do not see answered is whether forall A.indices do should distribute the iterations like forall A.domain do would, and why.

It's not intended to; the rationale is that .indices is a way to get the indices locally, on the current locale, and if you want a query that respects the array's distribution, you should be using .domain.

If no, then .indices becomes limited in a way: we can't apply it to a distributed array and get reasonable performance.

That depends on the distributed array and what you're doing in the loop, of course. If the distributed array is small in size and the loop is not accessing the domain, then it could be completely reasonable. If you're trying to do a local computation that computes the overlap of two distributed arrays, it could be reasonable as well (e.g., this would be the user-facing way of saying myBlockArray._value.whole, which we don't have another way to express today). If your goal is to get something that runs where the array is allocated, that's what .domain is for.
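For illustration, a sketch using the Block distribution (how .indices ultimately behaves for distributed arrays is exactly what this issue is deciding, so treat the second loop as indicative rather than definitive):

```chapel
use BlockDist;

const Space = {1..8};
const Dom = Space dmapped Block(boundingBox=Space);
var A: [Dom] int;

forall i in A.domain do   // follows A's distribution: runs where elements live
  A[i] = here.id;

writeln(A);               // e.g. 0 0 0 0 1 1 1 1 on two locales

for i in A.indices do     // a locale-local view of the index set; element
  A[i] += 1;              //   accesses may communicate to remote locales
```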

vasslitvinov commented 2 years ago

I am hearing this definition of .indices in general:

This makes sense. The point of staying on the current locale should definitely be prominent in the documentation. Also -- if applicable -- instability disclaimers like "we may switch from iterator to procedure or vice versa" and "we may change the procedure's return type".

Here are my concrete proposals.

Avoid the overhead of creating a new rectangular domain in order to ensure that .indices is lightweight. In those cases where .indices currently returns a new domain, change it to return a range (for 1-D) or a "multi-dim range". "Multi-dim range" is a new type that I propose to introduce. It is a record that wraps a rank*range tuple and provides domain-like serial and parallel iterators (see the sketch after these proposals).

Enable .indices on domains, to return the same range/multi-dim range/iterator as an array would.

Provide .indicesAsProc procedure and .indicesAsIter iterator. They produce compiler errors when the requested kind is not available. These of course can be added later and under different names.
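For concreteness, a minimal sketch of what the proposed "multi-dim range" record might look like (all names here are hypothetical, and for brevity the iterators build a throwaway domain internally; a real implementation would iterate the ranges directly to avoid the very overhead the proposal targets):

```chapel
record multiDimRange {
  param rank: int;
  var dims: rank*range;

  // serial iteration over the Cartesian product of the ranges
  iter these() {
    for idx in {(...dims)} do yield idx;
  }

  // standalone parallel iteration, so non-zippered forall loops work
  // much as they do for domains
  iter these(param tag: iterKind) where tag == iterKind.standalone {
    forall idx in {(...dims)} do yield idx;
  }
}

var mdr = new multiDimRange(2, (1..3, 1..2));
for (i, j) in mdr do writeln((i, j));
forall idx in mdr do writeln(idx);
```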

bradcray commented 2 years ago

instability disclaimers

I'm not imagining this being unstable once we resolve this issue. Instead, I think of it as being well-defined and documented for each given type (i.e., the docs will say proc or iter and indicate the return/yield type). That said, I don't expect the difference to be particularly noticeable to users in most use cases since either can be iterated over or captured in a variable. I.e., there's not a lot you could do with x.indices() that would distinguish the implementation (and it would be great if we could go further to minimize things that do distinguish them... like supporting intersection of iterators potentially).
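As a small sketch of why the difference is hard to observe from the call site (the routine names are made up for illustration):

```chapel
proc indicesAsRange() {
  return 1..4;                        // a proc returning a range
}

iter indicesAsYields() {
  for i in 1..4 do yield i;           // an iter yielding the same values
}

for i in indicesAsRange()  do write(i, " ");   // 1 2 3 4
writeln();
for i in indicesAsYields() do write(i, " ");   // 1 2 3 4
writeln();

var a = indicesAsRange();    // captured as a range
var b = indicesAsYields();   // captured as an array of the yielded values
```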

Enable .indices on domains, to return the same range/multi-dim range/iterator as an array would.

Provide .indicesAsProc procedure and .indicesAsIter iterator. They produce compiler errors when the requested kind is not available. These of course can be added later and under different names.

Consider the impact of this proposal for a list. Today, .indices on a list will return a range indicating that list's indices. The user can store it, or they can iterate over it serially or in parallel (because it's a range). To also require list — and every other indexable collection type — to provide .indicesAsProc as a redundant alias for .indices and the four (maybe someday six) overloads of .indicesAsIter to support all loop forms as a means of avoiding the "overhead" of creating and returning a range feels like busywork for the collection author and superficial noise in the interface. As a user, I'd much rather write forall i in x.indices() than forall i in x.indicesAsIter().
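For reference, a sketch of that status quo for list:

```chapel
use List;

var l = new list([10, 20, 30, 40]);

for i in l.indices do write(i, " ");   // serial iteration over the returned range
writeln();

forall i in l.indices do               // parallel iteration works too, since a
  l[i] += 1;                           //   range supports forall directly

writeln(l);
```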

I'd rather have us spend our effort optimizing away any non-negligible overheads for creating ranges (and domains in the array case) than to bulk up the interface in this way for the sake of consistency/orthogonality.

The "types can implement this lazily over time" approach doesn't really feel compelling to me since, if the point is really to unify the interfaces for orthogonality, it would only really be valuable if each type implemented all three routines (plus iterator overloads). If I'm willing to check which of the three routines a given type supports in my code, it seems no worse than reflecting on whether x.indices() is an iter or a proc and/or what type it returns.

Finally, I'm not a fan of the names, which seem clunky rather than Chapeltastic.

"Multi-dim range" is a new type that I propose to introduce. It is a record that wraps a rank*range tuple and provides domain-like serial and parallel iterators.

That sounds an awful lot like a domain. Granted, your multi-dim range doesn't use a class as its implementation, but we've already discussed the desire to write domain maps that provide their implementation using records rather than classes. And we've reserved the right to keep changing the dsi interface and domain map implementation to support additional optimizations like these. As with the previous comment, I'd prefer to spend our effort on reducing the overhead of creating domains, if it really is that bad, rather than adding a new domain-like thing whose differences relative to a domain we'd need to explain ("Wait, so Chapel has two novel types to represent index sets as first-class concepts? Why exactly?").