chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Detecting errors in a size'd range with a stride #24423

Open damianmoz opened 7 months ago

damianmoz commented 7 months ago

A range can be defined by its bounds

const R_with_bounds = 1 .. N; // lowerBound .. upperBound

or by a length and a starting index.

const R_with_a_length = i .. #L; // lowerBound .. #Length

Very occasionally, mostly for reasons of working with data whose structure is dictated by some other processing need such as an API, one works with a stride S. A Chapel range of the first form is then written as

lowerBound .. upperBound by stride

or, with the example above

const R_with_bounds_and_stride = 1 .. N by S; // a subscript triplet ??

Every once in a while, one wants to have a range with a lowerBound, a Length and a Stride. Simply appending the by operator to the previous range declaration (as in):

const R_with_a_length_and_stride = i .. #L by S; // bad news beastie

introduces a gross error. The length of the second range is no longer L, even though a naive reading of the syntax suggests it is. The by S clause is not a qualifier but a left-associative operator with an argument.

As a strided range (with a length) is likely to be used far less often than an unstrided range, the error above is not going to be immediately obvious to the programmer, especially as it is syntactically correct. Such an error will take some time to find. This is especially true if one's working week also involves reading or writing code in a language other than Chapel which describes a strided range by a qualifier rather than a left-associative operator.

The correctly strided range should in fact be written as

const R_with_a_size_and_stride = i .. by S #L;
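
The difference between the two spellings can be checked directly. The following is a minimal sketch, assuming small illustrative values for i, L and S (they are not taken from any real code); it only compares the sizes of the two ranges.

```chapel
// Hypothetical values, chosen only for illustration.
const i = 1, L = 10, S = 3;

// Parsed as ((i..) # L) by S: take L consecutive indices, then stride over them.
const countThenStride = i .. #L by S;

// Parsed as ((i..) by S) # L: apply the stride first, then take L strided indices.
const strideThenCount = i .. by S #L;

writeln(countThenStride.size);  // 4  (indices 1, 4, 7, 10)
writeln(strideThenCount.size);  // 10 (indices 1, 4, ..., 28)
```

Only the second form has L elements, which is what the "length and stride" reading expects.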

One can have little scripts to scan one's code and look for such user errors. But it would be more rigorous if the compiler or a static analyzer or linter could look for such mistakes, especially as Chapel already issues warnings for other esoteric things.

It is arguably a low-priority task, as a quick and dirty script can be cobbled together to identify such things. But to those who have inflicted such an error on themselves and taken hours to identify their mistake, it might have a much higher priority, especially in the period where they are castigating themselves for their blunder! Just a thought.

bradcray commented 7 months ago

Hi Damian —

To make sure I'm understanding properly, I think you are saying that reading i .. #L by S, your intuition is that it will contain L elements? And since it doesn't, you're proposing that the compiler / linter rule would look for expressions following the x..#y by z pattern and suggest the user might want to write it as x.. by z # y?

The by S clause is not a qualifier but a left associative operator with an argument.

You may already know this, and it's nearly moot for this discussion, but maybe not quite: note that #L is also an operator+argument in Chapel, like by S, such that 1..#L is not a range expression but the # operator applied to the unbounded range 1.. [see footnote]. You can also apply # to a complete range, domain, or array, like (1..n)#L, myRange#L, or myArray#L. (lo..hi)#-4 or myRange#-4 can be a particularly useful expression, saying "give me the last 4 elements of the range" (so, equivalent to hi-3..hi for the first case). Due to these negative operand cases, I tend not to think of # L as being "give me a thing of length L", but rather as a "take" or "restrict" operator, like "take the first four elements" or "take the last four elements".
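
As a small sketch of the "take" reading of # described above (the range 1..10 and the variable names are assumptions made for illustration):

```chapel
const r = 1..10;               // illustrative range, not from the thread

const firstFour = r # 4;       // take the first four indices: 1..4
const lastFour  = r # -4;      // take the last four indices: 7..10
const fromOpen  = (1..) # 4;   // # applied to the unbounded range 1.. : 1..4

writeln(firstFour, " ", lastFour, " ", fromOpen);
```

Note that # binds more loosely than ==, so a comparison such as (r # 4) == 1..4 needs the parentheses.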

Anyway, assuming I'm understanding correctly, my hesitation for adding an on-by-default linter rule or compiler warning for x..#y by z is that there are cases where it's a very useful expression, for example to measure out a bounding box from a given point and then apply the stride to partition the resulting space. Maybe just because the Super Bowl has happened, I'm imagining that the ball is on the 'x' yard line, we want to measure the y=10 yards to the next first down and we want our video postprocessing to draw lines on the screen every z=5 yards, in which case x..#10 by 5 would be a very useful expression. Of course, as you say, in other cases, x.. by z #y can be very useful as well. But I guess neither seems inherently better/worse than the other to me without knowing the programmer's intention. That's what would make me disinclined to warn about such expressions by default (footnote2).
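
The yard-line scenario can be sketched as follows, assuming the ball is on a hypothetical yard line x = 20; the second range is shown only to contrast the stride-then-count reading.

```chapel
const x = 20;   // hypothetical starting yard line

// Measure out 10 yards (20..29), then mark every 5th yard inside that span.
for yard in x..#10 by 5 do
  writeln(yard);                 // visits 20, then 25

// Stride-then-count instead asks for 10 marks spaced 5 yards apart.
const tenMarks = x.. by 5 #10;
writeln(tenMarks.size, " marks, last at ", tenMarks.last);  // 10 marks, last at 65
```

Both expressions are legitimate; which one is wanted depends on whether the count describes the span being measured or the number of marks.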

Do you buy that, or are you seeing some sort of pitfall I'm not?


footnote = in contrast, 1..<L is a range expression, but that's a different kettle of fish

footnote2 = though if this were a common mistake that a programmer or community of programmers made, I wouldn't be opposed to providing them with an opt-in option to be warned about it.

damianmoz commented 7 months ago

Your understanding is correct.

Sorry - my explanation was unclear. I should have spelt out that #L is a left-associative operator and its argument, just like by S. Silly me.

While I appreciate the power of the # operator, some of us come from languages (or are avid readers of books) where a subscript triplet is the only way to define/declare/use a range or a slice as in:

i : j : k // const R = i .. j by k
i # j : k // const R = i .. by k # j
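
One way to carry the two triplet notations over without tripping on operator order is to wrap them in small helpers. This is only a sketch: the procedure names boundsTriplet and countTriplet are hypothetical (they are not part of Chapel), and a positive stride k is assumed.

```chapel
// Hypothetical helper: the triplet  i : j : k  (lower bound, upper bound, stride).
proc boundsTriplet(i: int, j: int, k: int) {
  return i .. j by k;
}

// Hypothetical helper: the triplet  i # n : k  (lower bound, count, stride).
// The stride is applied before the count so that the result has n elements.
proc countTriplet(i: int, n: int, k: int) {
  return i .. by k # n;
}

writeln(boundsTriplet(1, 10, 3).size);  // 4  (indices 1, 4, 7, 10)
writeln(countTriplet(1, 10, 3).size);   // 10 (indices 1, 4, ..., 28)
```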

The former has been in books on parallel programming since the late 1970s. Vector C (1984) users will come across to Chapel with both in their head. Users of Fortran on HPC since the 1980s will come across to Chapel with the former deeply etched in their psyche. Intel C/C++ users will unfortunately come across with a butchered version of the above that looks like the former but means the latter. So, for some of us, it is hard to unlearn that after 20+ or 30+ years of seeing such patterns, especially for cases which do not crop up every day (or even every year) that one is programming in Chapel.

Such users will always make the mistake of thinking that a range declaration

const R = N .. by S #M align A;

is the same as a range definition.

Maybe after I have used Chapel for 30 years I will not, but by 2010+30, it probably won't matter as I won't be using Chapel by then.

Given that Chapel warns me about perfectly legal module declarations, I figured it could warn people about other suspect constructs, of which I think #L can be one. Such a task could also go into a separate static analyzer, but that does not yet exist. I get lots of dumb messages from C/C++ lint-like tools, but I am very happy for them to tell me about anything where they think I am trying to do something which is a bit dodgy.

It is just a thought. It does not come up often, but the effect of # if used wrongly can be catastrophic and take an awfully long while to identify.