Enhanced range and index expressions

Here are some common expressions for a sequence s where 0 <= n <= n + k <= |s|:

s[|s| - n..]  // last n elements of s
s[..|s| - n]  // all but the last n elements of s
s[n..n + k]   // k elements of s, starting at n

(The examples work equally well for an array s, if you replace |s| by s.Length.)

The expressions are kind of ugly, with the appearances of |s| in the first two examples and the repetition of n in the last example. The last example can be written as s[n..][..k], but that may not be as easy to understand.

I propose the following new syntax to write these examples in a simpler way. In particular, inspired by the syntax of the new C# index expressions, I propose:

ElementSelection ::= Expression "[" StartIndexExpr ".." EndIndexExpr "]"
StartIndexExpr ::= [ [ "^" ] Expression ]
EndIndexExpr ::= [ [ "^" | "+" ] Expression ]

Placing a ^ before one of these expressions e has the meaning |s| - e where s is the sequence (or array), and placing a + in front of an end expression e has the meaning d + e where d is the value of the start expression. Using this syntax, the three common expressions above can be written

s[^n..]   // last n elements of s
s[..^n]   // all but the last n elements of s
s[n..+k]  // k elements of s, starting at n

Furthermore, since the index expression s[i] is s[i..][0], it also makes sense to allow the expression s[^i] with the meaning s[^i..][0]. For arrays, this would apply to any dimension, using the appropriate s.Lengthm instead of |s|. For example, the expression s[^1, ^1] would mean s[s.Length0 - 1, s.Length1 - 1].

IndexExpr ::= Expression "[" Indices "]"
Indices ::= [ "^" ] Expression [ "," Indices ]

Note: The proposed new syntax does not clash with any existing syntax.

Note: Some languages or libraries allow negative values for indices, with the meaning that |s| is added to any negative index. Such a solution is inferior to what I've proposed here, for three reasons. One reason is that it requires a verifier-time check and run-time check to determine which kind of index operation is intended, which isn't as efficient either for the verifier or for the running program. A second reason is that it misses the opportunity for the verifier to catch the error of accidentally using a negative value as an index. A third reason is that this "negative trick" does not work with range expressions, since 0 equals -0 and therefore you don't know if 0 or |s| is intended.

I like this proposal, though I'd like to add a few things:

I fear that s[n..+k] will be confused with s[n..k], especially when used with literals: s[1..+3] is it not valid syntax today, but +3 is a synonym in sufficiently many languages for 3 to think that some users will get tripped up by this.
I'm not sure that this syntax needs to work with general expressions — literals might be enough. Additionally, if the syntax ^n only worked with n being a (positive) literal, then the case for using -n would be less weak: argument 1 would not apply (no verifier or runtime check); argument 2 would not apply (no unexpected negative indices); and argument 3 would not apply ([n..-0] should be written as [n..]). In fact, even with the ^ syntax, it could be reasonable to start with literals only?

dafny-lang / dafny

Enhanced range and index expressions #1589