range slicing: how to report an error?

vasslitvinov commented 2 years ago

The result of Range slicing may produce an error in the cases where the result is not representable within the static type of the result. For example:

r1 type	r2 type	r1(r2) type	can't represent
(int, boundedLow, stridable)	same	same	(1.. by 2 align 0) (1.. by 2 align 1)
(int(8), boundedLow)	(int, boundedLow)	(int(8), boundedLow)	(1:int(8)..) (256..)
(int, boundedNone, stridable)	same	same	(.. by 2) (.. by 4)

This issue asks: what should the program do if one of these situations occurs?

halt
halt if boundsChecks are on, undefined behavior otherwise
throw an exception regardless of boundsChecks
ditto, with this refinement: provide a non-throwing overload of the slicing implementation for those cases where we know statically that none of these errors can occur

Notes

In some cases the intersection of the two ranges is semantically an empty set of indices. In some of these cases the slicing implementation produces the empty range 1..0.

With today's implementation, the first and the third examples in the above table halt the program; the second example produces 0... The answer 0.. is incorrect for the second example. The third example should probably be representable and produce .. by 4.

bradcray commented 2 years ago

In some cases the intersection of the two ranges is semantically an empty set of indices. In some of these cases the slicing implementation produces the empty range 1..0.

I was going to suggest taking this approach for cases that were empty / didn't make sense / weren't representable (where the first two cases had this flavor to me)

For the third case, I could either imagine a compile-time error (e.g., "Can't intersect two unaligned, unbounded ranges") or having the result be .. by 4 as you suggest.

vasslitvinov commented 2 years ago

I was going to suggest taking this approach for cases that were empty / didn't make sense / weren't representable (where the first two cases had this flavor to me)

The result of a slicing operation needs a single static type. What should it be when slicing two ranges((int, boundedLow, stridable) ? I expect it to be also range(int, boundedLow, stridable). If the result turns out - at run time - to be non-representable, what value should we return?

For the third case, I could either imagine a compile-time error (e.g., "Can't intersect two unaligned, unbounded ranges")

Unfortunately we do not have the information at compile time whether a range is unaligned, so we can't issue this error without being overly conservative.

Also I do not see why (.. by 2) (.. by 4) needs to be an error. Or, for that matter, (.. by 2) (.. by 2)

bradcray commented 2 years ago

The result of a slicing operation needs a single static type. What should it be when slicing two ranges((int, boundedLow, stridable) ? I expect it to be also range(int, boundedLow, stridable). If the result turns out - at run time - to be non-representable, what value should we return?

For this case (the intersection for two boundedLow ranges of the same idxType), the result should always be representable, shouldn't it? But now that you've said that, I'm better understanding that for the case in the OP where the original range is int(8) and the other is int, we don't have a high bound to store, so can't store an empty range. Some options (in my rough order of preference) might be:

require the slicing range's idxType to be "<=" the slicee range's? (e.g., an int(8) range could only be sliced by other int(8) ranges; an int(16) range by int(8), uint(8), and int(16) ranges) — I'm not sure how intrusive this would be in practice, but would guess "not much"? That said, it's also thematically different than what we've been doing for # and by recently, but potentially with a reasonable rationale. It's also the easiest approach to relax the behavior of later since we wouldn't be breaking existing codes. The big question would be whether it would impact codes today (I'm guessing 'not').
halt on bounds-check, permit integer wraparound to occur otherwise

Unfortunately we do not have the information at compile time whether a range is unaligned, so we can't issue this error without being overly conservative.

Isn't any unbounded, strided range unaligned? I guess maybe your point is that .. by s always is unaligned, but that .. by s align a is not? Those seem like cases we could distinguish between statically, though (with refinements to the range's type) if we wanted to? That is, distinguish between (normal, unaligned) strided boundedNone ranges and aligned strided boundedNone ranges?

Oh, or maybe your point is that if s is 1 or -1 (and not a param, so unknown), then it would be aligned? (or 'aligned-ish'... as soon as it was strided by anything else, it would become unaligned again due to no bounds to count from).

Also I do not see why (.. by 2) (.. by 4) needs to be an error. Or, for that matter, (.. by 2) (.. by 2)

Did I say it needed to be? I didn't think so:

or having the result be .. by 4 as you suggest.

That said, being overly conservative doesn't seem awful to me either. It's not clear that this pattern is useful or well-motivated to me. By making it an error, we'd learn if users did have reasonable use cases of it and could deal with it then.

(In ending on that note, I'm not trying to suggest that it's my preference. Last time, I ended on the other option and would be fine with either at present. I could imagine either going with whichever was less work or going with the error because it's more conservative / able to be improved in the future if/when needed).

vasslitvinov commented 2 years ago

@bradcray - digesting your thoughts, here is what I propose:

Enumerate the cases that may produce a non-representable resulting range.
Prototype a change that makes them a compile-time error.
If this is acceptable over our code base, keep it this way for now.
Otherwise, turn such cases into halts().
In any case, we do not want slicing to throw an error.

How does it sound?

For this case (the intersection for two boundedLow ranges of the same idxType), the result should always be representable, shouldn't it?

The intersection is an empty range when the two strides are the same and the two alignments are not. We cannot represent an empty range whose static type is "boundedLow".

Isn't any unbounded, strided range unaligned?

Today, no. One case is .. by s vs. .. by s align a. Also, as you point out, ..n by s where s is statically unknown may or may not be aligned.

Those seem like cases we could distinguish between statically, though (with refinements to the range's type) if we wanted to?

Correct. However, we have decided not to take this route: https://github.com/chapel-lang/chapel/issues/19717#issuecomment-1217304493

vasslitvinov commented 2 years ago

In https://github.com/chapel-lang/chapel/issues/19717#issuecomment-1211502497, the leading proposal is for the iteration to count A[hi], A[hi-2], A[hi-4], ... :

config const lo = 1, hi = 1000, bound = 2, stride = -2;
var A: [lo..hi] real;
for a in A[bound.. by -2] do ...

In other words, our intuition is that the slice 1..1000[2.. by -2] should count down.

Therefore I propose that slicing with a negative-stride range reverses the direction of the original range. This will change the current rule that slicing preserves the stride of the original range.

Comments?

bradcray commented 2 years ago

digesting your thoughts, here is what I propose:

That sounds like a fine plan to me.

In other words, our intuition is that the slice 1..1000[2.. by -2] should count down.

Therefore I propose that slicing with a negative-stride range reverses the direction of the original range. This will change the current rule that slicing preserves the stride of the original range.

Comments?

One minor comment here is that I'm/we're not entirely sure what the implication of negative strides on arrays should be, and we've marked them unstable as a result of this general uncertainty. E.g., should it mean:

layout in memory is reversed?
serial iteration is reversed?
serial parts of parallel iteration are reversed?
printing out the array prints it out in reverse order?
other things are reversed?
some combination of the above?
none of these?

One argument for "none of these" is that once we create a domain, we've created a set of indices, and while the negative stride can be useful to identify the members of the set, the natural ordering of the set's indices may govern these other things instead of the stride. This is what I'd do if I had to make a snap decision today without further reflection or debate. But I'm also not confident that it's correct. And, testing, I see that it's not what we do today.

Anyway, all that is to say that I agree that 1..1000[2.. by -2] should be equivalent to 2..1000 by -2 but that I'm not sure that serial iteration should count down, as with other negatively strided arrays.

For the question you're really focused on, though:

Therefore I propose that slicing with a negative-stride range reverses the direction of the original range. This will change the current rule that slicing preserves the stride of the original range.

I think I'm agreeing with you. What's less clear to me is how well this holds up when both ranges have negative strides or just the range being sliced has a negative stride.

vasslitvinov commented 2 years ago

I moved the two comments above to the issue where they belong: https://github.com/chapel-lang/chapel/issues/20462#issuecomment-1254526680

The only part of them that is relevant here is from @bradcray :

digesting your thoughts, here is what I propose:

That sounds like a fine plan to me.

chapel-lang / chapel

range slicing: how to report an error? #20461

Notes