Introduce a constant value for an empty range or domain

LouisJenkinsCS commented 6 years ago

I believe that there is a need for a constant for empty ranges; currently a way to declare an empty range is something along the lines of 0..-1, but this looks rather non-intuitive to newcomers. It may not be immediately apparent as you would not generally loop over an empty range (at least not intentionally), but imagine you want to declare a field for an array that you do not know the size of yet...

// Imagine each of these records are huge and they 
// hold a lot of nested records and large # of fields
record LargeRecord {
   // ...
}
var arr : [0..0] LargeRecord;

Now the above arr allocates one of these large records, which increases the time spent on allocation (if it's large enough). If you later want to resize arr, you have to deal with it copying the older, unused LargeRecord by value into the larger array unnecessarily. This is inefficient, especially if it is a field of some class instance.

Now the current way to declare an array as empty is like such...

var arr : [0..-1] LargeRecord;

Which will eliminate the extra copy and increased allocation time, but it looks non-intuitive... what does this even mean? We have an array which starts at 0 and ends at -1? Is this undefined? Of course not, but it could be a common train of thought for a newcomer.
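For readers skimming the thread, here is a quick check that 0..-1 is well defined and simply contains no indices. This is only a minimal sketch, assuming a reasonably recent Chapel compiler; the single-field LargeRecord is a stand-in for the elided record above.

```chapel
// Stand-in for the large record described above (its real fields are elided).
record LargeRecord {
  var payload: int;
}

// low > high, so this range contains no indices.
const emptyRange = 0..-1;

// An array declared over an empty range has no elements to allocate.
var arr: [emptyRange] LargeRecord;

writeln("range size: ", emptyRange.size);
writeln("array size: ", arr.size);
```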

Perhaps we should add a new constant as part of the domain or range namespace...

var arr : [domain.empty] int; // or range.empty

I believe this makes it much clearer.

LouisJenkinsCS commented 6 years ago

To show an example in a core Chapel module that uses this empty domain syntax, see the DefaultAssociative module that complains about it and later directly uses it.

LouisJenkinsCS commented 6 years ago

Another potential use case is when you have operations like push_back on an array with the domain {M..N}, which would change the domain to {M..N+1}. If we make domain.empty a constant {0..-1}, what happens when the user wants to start at some other index? Perhaps there should be an inline function that takes the start index idx and produces an empty domain {idx..idx-1}? Like this...

// {1..0}
var arr : [domain.empty(1)] int;

where...

inline proc domain.empty(idx : integral) {
   return {idx..idx-1};
}
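The same idea can be tried today without extending the domain type. The sketch below uses a free-standing helper; emptyDomainAt is a hypothetical name chosen for illustration, not an existing Chapel routine, and the growth step only imitates what push_back is described as doing above.

```chapel
// Hypothetical helper (not part of Chapel): an empty 1-D domain positioned
// so that the first index added to it would be `low`.
proc emptyDomainAt(low: int) {
  return {low..low-1};
}

var D = emptyDomainAt(1);   // {1..0}: no indices yet
var arr: [D] int;           // declared over D, so it resizes when D changes

// Grow by one element, push_back style, by reassigning the domain.
D = {D.low..D.high+1};
arr[D.high] = 42;

writeln(D, " ", arr);
```

Because the array is declared over the domain variable D rather than over a literal range, reassigning D is what resizes it.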

ronawho commented 6 years ago

I like this proposal, but note that an interesting case to keep in mind is for uints, where 0:uint - 1 is max(uint)
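To make the unsigned case concrete, a small sketch (variable names are illustrative only): computing idx-1 for idx = 0:uint wraps around, so {idx..idx-1} would cover the whole uint range rather than being empty, while a range written as 1..0 with unsigned bounds stays empty.

```chapel
var lowU: uint = 0;

// Unsigned subtraction wraps around instead of going negative.
var highU = lowU - 1:uint;
writeln(highU == max(uint));

// lowU..highU would therefore span every uint value, not zero of them.
// Writing the empty range as 1..0 avoids the subtraction entirely.
var emptyU = (1:uint)..(0:uint);
writeln(emptyU.size);
```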

LouisJenkinsCS commented 6 years ago

That's a good point... I think perhaps we could modify the _aligned field in ChapelRange.chpl to be something similar to a bitmask. This way it can hold whether or not the range is 'empty' and whether it is 'aligned' in the same field and it won't change the size of the range construct.

If a domain is created by domain.empty(0:uint), then it will create a range with the flag empty set, _low = 0 and _high = max(uint). This way any arrays or distributions that are being created with this domain can check whether the domain is 'empty' and hold off on allocation.

bradcray commented 6 years ago

Addressing this discussion in backwards order:

Personally, I don't think we should reimplement ranges to store an empty bit, nor reinterpret ranges whose low bound is greater than their high bound. That is, I think that low > high is sufficient to define an empty range and to interpret whether an existing range is empty or not, and that it'd be very problematic to start interpreting ranges where low > high as anything other than empty at this point in the language's evolution (we didn't enter into it lightly and I think it's working well for us).
Note that the canonical empty range is 1..0 rather than 0..-1 due to the fact that it works with signed or unsigned integers of any width, and that this is what you get if you declare a range or rectangular domain and don't initialize it. Also note that any range in which low > high is empty, so if you wanted an initially empty array to start at a specific index, say 100, you could declare it to be A: [100..99] real; or A: [100..0] real; or ... This, combined with Louis's comment about push_back() above, suggests to me that maybe we just have an education problem rather than the lack of a named empty range problem. So maybe we should just post and answer a "How do I create a 0-element array?" question on Stack Overflow?
In the following: var arr : [domain.empty] int; note that since there are many types of domains, this would actually need to be something like var arr: [domain(1).empty] int; to work. That said, I feel a little nervous about this approach because it seems to rely on a 0-argument factory method on a type which seems non-intuitive to me. Given Louis's "maybe I'd want a different low bound" point, it might need to be var arr: [domain(1).empty(low=100)] int; I'd be more comfortable supporting something like new domain(1) or new domain(rank=1, low=100); because it reads more like the creation of a new domain value to me (but it does still rely on an understanding that the default value of a domain is an empty domain).
I don't really understand the [0..0] array case. It seems obvious that this would/should create a 1-element array regardless of what we do with this proposal. Is the point that a user may not be smart / knowledgeable enough to know how to get an empty array, and was too lazy to learn how to do so, so might needlessly create non-empty arrays instead?
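The points above about default values and arbitrary low bounds can be checked with a short sketch. It assumes a recent Chapel compiler and relies only on default initialization and the low > high rule described in this comment; the variable names are illustrative.

```chapel
// Default-initialized range and rectangular domain: both start out empty.
var r: range;
var D: domain(1);

// Any low > high range is empty, so an empty array can "start" anywhere.
var A: [100..99] real;
var B: [100..0] real;

writeln(r, " has ", r.size, " indices");
writeln(D, " has ", D.size, " indices");
writeln("A.size = ", A.size, ", B.size = ", B.size);
```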

mppf commented 6 years ago

FWIW I think 1..0 is a reasonable canonical empty range.

chapel-lang / chapel

Introduce a constant value for an empty range or domain #8907