chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

Make built-in Chapel features 0-based by default #12988

Closed bradcray closed 4 years ago

bradcray commented 5 years ago

As a Chapel user, I would like the language to use 0-based indexing by default for all built-in language constructs because (a) that's what I'm accustomed to from C, Java, Python, and other modern languages, (b) it makes modulo arithmetic easier, and (c) it's more consistent with how things tend to be counted in HPC contexts. I believe this would involve changing:

Of course, nothing about this proposal would preclude users from declaring their arrays using whatever indexing scheme they wanted (e.g., var A: [1..n, 1..n] real; var ratings: [-3..3] int;)

dlongnecke-cray commented 5 years ago

Most of the languages I've used in the past are zero based (C, Java, Python). Anecdotally speaking, one of the things that was hardest to get used to when learning Chapel was the switch to one-based indexing.

Having become accustomed to both, I don't think there's any particular reason to advocate using a particular indexing scheme over the other. Both have their upsides and downsides.

Which indexing scheme to favor seems closely tied to the question of target audience. Since Chapel is heavily invested in attracting Python programmers, zero-based indexing seems like a strong step towards achieving this goal. Consistency (insofar as possible) between Python and Chapel will make the transition from former to latter feel natural.

There's also the interop perspective to consider. Users that write libraries in C/C++ and call into them from Chapel will benefit from logical consistency and a reduced chance of out-of-bounds errors.

While I'd personally feel more comfortable with zero-based indexing, I'd be interested to see what others have to say for/against it.

lydia-duncan commented 5 years ago

I know we intentionally chose 1-based indexing (and were aware that many languages were 0-based at the time). What was the reasoning behind that? I strongly suspect at least some of them will still hold water, even if the overall conclusion does not, so think it would be good to have them listed here.

I don't necessarily have a horse in this race, but would point out that 1-based is more natural to people that may be learning about programming for the first time (and so aren't biased by coming from 0-based languages). One of our arguments is that Chapel should be used in teaching programming - on the one hand, because 0-based is so prevalent that might be an argument for introducing students to it through us so that they can then easily transition to other languages; on the other hand, if we conquer the world and use 1-based indexing, students will never have to change their thinking at all.

bradcray commented 5 years ago

The historical reasoning behind using 1-based indexing was that (a) it's a more natural numbering system for humans who are non-programmers / non-mathematical; (b) it's what Fortran and Matlab use (and we were trying to make choices that would attract and offend users of every language :) ).

mppf commented 5 years ago

When I first started using Chapel, I found the 1-based indexing surprising. And then I got used to it. Anyway, I agree that people are more likely to start with 1-based counting than 0-based, before programming.

If we were to move to 0-based, I'd imagine that we'd want to change a..b to not include b.

bradcray commented 5 years ago

If we were to move to 0-based, I'd imagine that we'd want to change a..b to not include b.

Why do you imagine that? I'm (reluctantly) open to making the change proposed in this issue, but I'm not at all interested in making this one.

mppf commented 5 years ago

Because ranges in 0-based languages (e.g. Python) work that way. When working with 0-based numbers, it usually makes more sense to have the upper bound not included. E.g. 0..tuple.size.

bradcray commented 5 years ago

I can't even begin to say how distasteful I find that interpretation relative to my intuitive real-life definition of what .. means. To suggest that the sequence of integer values -3 through 3 inclusive should be written -3..4 just makes my brain want to turn off. I'm not even sure what it would mean for a range of floating point values like -3.0..3.0.

Trying to be more constructive, though:

Does Python have syntactic sugar for ranges apart from range(lo,hi)? (Googling I'm not finding anything, but I'm not looking very hard either).

I don't personally correlate whether a language is 0- or 1-based with how situations are interpreted when both lower and upper bounds are specified. For example, Java's Apache library Range type uses inclusive bounds even though Java is 0-based. Swift seems to use 0-based indices for arrays by default yet interprets lo..hi as inclusive. I think the correlation is stronger in scenarios where only one bound (really a size) is specified like int x[5] in C meaning 0..4.

For the common case of ranges with a low bound of 0, we already have the # operator. For arbitrary low bounds we could introduce a lo..<hi syntax like Swift to get an upper-exclusive interval. We could extend it to lo>..<hi to be able to create open intervals on either side (historical trivia: for awhile Chapel proposed using [lo..hi) or (lo..hi] to express open intervals before we decided that it was just too confusing).

If it's both of these proposals or neither, I'm definitely strongly in the neither camp. I'm also a bit frustrated by the simultaneous demands from users to freeze the language ASAP while simultaneously re-opening every design decision that's been made since day 1.

BryantLam commented 5 years ago

To suggest that the sequence of integer values -3 through 3 inclusive should be written -3..4 just makes my brain want to turn off.

I really like inclusive ranges too! But I also sympathize with Michael's comment and why Python ranges are upper exclusive. That said...

For arbitrary low bounds we could introduce a lo..<hi syntax like Swift to get an upper-exclusive interval. We could extend it to lo>..<hi to be able to create open intervals on either side (historical trivia: for awhile Chapel proposed using [lo..hi) or (lo..hi] to express open intervals before we decided that it was just too confusing).

This is a hot idea! I love it. I'm not sure why it was deemed confusing in the historical context, but I hope it was due to choice of syntax instead of this more readily apparent alternative. Lower exclusive is likely infrequently used, but the symmetry of syntax is pleasing.

bradcray commented 5 years ago

I'm not sure why it was deemed confusing in the historical context, but I hope it was due to choice of syntax

Yeah, only due to the syntax (one more overload of square brackets and parentheses; confusing for syntax highlighters and editors to deal with; too "cute").

I like the Swift syntax too... It seems syntactically unambiguous and even almost intuitive even if you didn't know what it meant. I hadn't seen it before going to see what other languages do here today.

mppf commented 5 years ago

I don't think we should change Chapel tuples and strings to be 0-based. I think that starting by 1 combines nicely with the current range syntax. Brad's argument that these two don't necessarily need to be tied together is reasonable, but I also don't personally think it's worth it to migrate to 0-based. I think it will just change where Python programmers are confused (to, say, range declarations).

Either way, I think it'd be reasonable to have a way to construct an exclusive range.

BryantLam commented 5 years ago

For consistency, the alternative is to leave these items as 1-based, and additionally change the Locales array to be 1-based.

bradcray commented 5 years ago

I'm going to be contrary here just for the sake of keeping this discussion going (because while personally, I'm not deeply interested in changing anything here, I'm more open to changing some things, given good reason, than others):

I think that starting by 1 combines nicely with the current range syntax.

I don't really buy this as a reason to keep the status quo given the # operator and potential for adding an open interval range. Moreover, I think the preferable way to write loops over most of these cases for elegance / convenience / comprehension / efficiency is by iterating over them directly rather than by using a range + indexing, so I don't think we should link the decision to the interpretation of ranges.

But let me try a different tack: Which notable languages that we typically use as reference points in making recent Chapel decisions use 1-based indexing for these three cases (strings / for tuples / for default array lower bounds)? Michael ties this question to Python programmers specifically, but if we're really talking about "all modern programmers / programming languages" then I think focusing too much on Python is a mistake.

Based on some quick research, I think the answers are:

Strings Tuples Array Literals
Swift N/A / 0-based 0-based 0-based
Rust N/A / 0-based 0-based 0-based
Python 0-based 0-based 0-based
Go 0-based N/A 0-based
C++ 0-based 0-based 0-based
Java 0-based 0-based 0-based
C# 0-based 1-based 0-based
Matlab 1-based N/A 1-based
Fortran 1-based N/A 1-based

I'm including the last 2 entries not because I think they're deeply motivational for us today, but because I think they are a large part of why Chapel ended up where it did. At the time of its inception "I want a scalable Matlab" was the far more common cry from users than "I want a scalable Python", and three of the first four entries here weren't really on our radar (and Python wasn't as obviously a sure thing as it is today), so Fortran was a major touchpoint outside of C-based languages, so on almost equal footing, particularly within HPC.

Given this table, I think it's reasonable to conclude that we chose incorrectly w.r.t. modern sensibilities and should change now, if ever.

I think it will just change where Python programmers are confused (to, say, range declarations).

This is what made me ask (non-rhetorically) whether Python has a range syntax beyond the range(-3,3). Assuming not (I'm not finding an indication of one), while there's a risk that a Python programmer would blindly change range(-3,4) to -3..4, the different syntax is also a pretty strong cue that you shouldn't simply assume it'll behave the same, and the similarity to what I think people mean by .. in real life is also on our side as an indication that maybe they should think twice about it.

For consistency, the alternative is to leave these items as 1-based, and additionally change the Locales array to be 1-based.

I think that this proposal belongs in a new issue rather than this one, but since it's here and I'm arguing... The reason I disagree with it is that I don't believe Chapel says "your arrays should always be 1-based"; rather, it says "your arrays can use whatever bounds you want and think are most natural for the context" (which is why they always take lower and upper bounds).

For the Locales array, I think 0-based is the most natural choice because (a) the Locales array is systems-oriented by nature and reflects how systems and HPC programmers number compute nodes, and (b) in many cases that use the indices (e.g., in writing distributions), they're typically heavily used for modular arithmetic for which 0-based indexing is most natural.

BryantLam commented 5 years ago

Cross-response from https://github.com/chapel-lang/chapel/issues/13735#issuecomment-530011853:

@bradcray: My attitude about this continues to be that arrays should feel free use whatever indexing scheme is most natural for them.

"What comes natural" is subjective when baked into a language or its standard libraries and will cause confusion in trying to remember which arrays have which domain starts. I'd rather it be objectively all 0- or 1-based. My preference is for 0-based.

ben-albrecht commented 5 years ago

I personally have gotten used to 1-based indexing in Chapel, and quite enjoy using it when possible.

That said, I have to agree with @bradcray's comment after seeing the table of other modern languages above:

Given this table, I think it's reasonable to conclude that we chose incorrectly w.r.t. modern sensibilities and should change now, if ever.


This is what made me ask (non-rhetorically) whether Python has a range syntax beyond the range(-3,3). Assuming not (I'm not finding an indication of one), while there's a risk that a Python programmer would blindly change range(-3,4) to -3..4, the different syntax is also a pretty strong cue that you shouldn't simply assume it'll behave the same, and the similarity to what I think people mean by .. in real life is also on our side as an indication that maybe they should think twice about it.

Python does have a slicing syntax similar to Chapel's ranges: A[1:3] in python would correspond to a slice of A[1..2] in Chapel (assuming 0-based indices for Chapel), and A[2:10:2] in Python would be the same as A[2..9 by 2] in Chapel (again, assuming 0-based indexing).

e-kayrakli commented 5 years ago

I am leaning towards switching to 0-based indexing overall. Only reason I am not strongly leaning is I can't really tell whether it is too late to do that change.

FWIW, things that I find bothersome to me the most:

About Chapel ranges vs Python ranges: I find the syntactic cue to be clear enough that b is inclusive in a..b. I am used to reading python ranges range(a, b) (and slices arr[a:b]) as "a is the first index included, b is the first index not included". It helps (me, at least) with funkier slices etc.

bradcray commented 5 years ago

Are Python tuples 1-based?

Are you saying your confusion here is because of using Chapel so much and needing to go back and forth? Or is there another reason to be confused about Python's tuple indexing? (I'm not sure we can change Python in any case... :) ).

I find the syntactic cue to be clear enough that b is inclusive in a..b

A few other people have mentioned concerns today about not wanting to reinterpret lo..hi as lo, lo+1, ... hi-1 so just to reassure: My read of the situation is that we're at the point where there's general agreement that, if we were to switch to 0-based indexing, we would rely on 0..#hi and likely also introduce new ..< >.. >..< open range intervals to handle these cases (e.g., 0..<hi).

e-kayrakli commented 5 years ago

Are you saying your confusion here is because of using Chapel so much and needing to go back and forth? Or is there another reason to be confused about Python's tuple indexing? (I'm not sure we can change Python in any case... :) ).

:) The first one. I am happy with what Python is doing. I am unhappy that I am getting used to Chapel's 1-based tuple indexing to the extent that I make mistakes when I write a Python script.

A few other people have mentioned concerns today about not wanting to reinterpret lo..hi as lo, lo+1, ... hi-1 so just to reassure: My read of the situation is that we're at the point where there's general agreement that, if we were to switch to 0-based indexing, we would rely on 0..#hi and likely also introduce new ..< >.. >..< open range intervals to handle these cases (e.g., 0..<hi).

I am neutral about open range intervals. Maybe they are more useful for someone with more mathematical background? i.e. I have no problem with adding/subtracting one as needed.

ty1027 commented 5 years ago

Please allow me to add some comments... (from a user side).


0- vs 1-based indexing (my impression about other languages)

First, although Brad mentioned above that

The historical reasoning behind using 1-based indexing was that (a) it's a more natural numbering system for humans who are non-programmers / non-mathematical; (b) ...

I think 1-based indexing is used rather often by math/(natural)science-oriented languages, like Mathematica (Wolfram language), Fortran, R, Matlab, Julia, etc. (where R is a popular language for statistics). The following page contains a more complete list:

https://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28array%29#Array_system_cross-reference_list

0-indexing is used by C-family languages. Here I think choosing "0- vs 1-based" is rather related to "whether it is more oriented to general-purpose (or CS) languages or more toward 'math/(natural)science'.

Some related pages about 0 vs 1-based (the 1st page is more oriented to 1-based). https://craftofcoding.wordpress.com/2017/03/12/why-1-based-indexing-is-ok/ https://discourse.julialang.org/t/whats-the-big-deal-0-vs-1-based-indexing/1102

-- The point seems that: 1-based indexing is natural for counting, and 0-based indexing is more convenient for offset (or interval) calculations. -- Julia employs 1-based indexing because (I think) the target is mainly linear algebra. For example, "a * b" (where a and b are 2-dim arrays) means matrix multiplication by default, so I think it is strongly geared to linear algebra.

This page compares Python and Julia, and says that the potential weak point is 1-based indexing of Julia. https://www.infoworld.com/article/3241107/julia-vs-python-which-is-best-for-data-science.html Nevertheless, Julia is rather high in some recent ranking (cf. the link below), so I guess 0- vs 1-indexing is not the most critical factor to determine its use (although some people are very skeptical about the acceptance of 1-indexing, as mentioned in the above link). At the same time, I often see a lot of people complaining about such 1-indexing (on the net, sometimes even harshly), so the familiarity with a given indexing is probably not negligible either (I think...). https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2019


And my opinions... (about indexing of Chapel)

I often find that other people are using 0..#n, which seems natural if they use C/C++ or Python. So, to decrease the overhead of translation from (or interfacing with) the latter, I guess it would be more beneficial to make 0-based indexing as the default (*). I also feel that Chapel fits more naturally in the category of general-purpose languages rather than 'math/(natural)science-oriented (or even domain-specific(?)) languages'.

(*) But this opinion is completely ignoring (1) any cost of compiler/software modifications, which may be substantial, and (2) whether the cost meets the possible merit. Though I believe the ratio of people who likes 0-indexing will be generally much greater than the ratio who likes 1-indexing, I have no idea to what extent people (who interested in Chapel) care about such difference.

Another factor is that it may be very frustrating to use both 0- and 1-based indexing among different codes (e.g., when interfacing with C-family languages). This seems unnecessary overhead/burden if (for example) C++ and Python are to be used as primary "companion" languages.


Range syntax

Personally I like the current definition ("a..b" means "b inclusive"). This also seems to be the choice by Nim (which also uses "a..<b" for "b exclusive").

https://nim-lang.org/docs/tut1.html#advanced-types-arrays https://nim-lang.org/docs/tut1.html#advanced-types-slices https://nim-lang.org/docs/tut1.html#control-flow-statements-case-statement


Multi-dimensional arrays with arbitrary indexing

I think one of the most attractive features in Chapel is n-dim arrays with arbitrary indexing (e.g., a: [2..3, 4..5, 6..7] etc). Thanks to this, if someone prefers 1-based arrays, they can just define it so. Also, it allows an array like [-1.. n+1] (with a buffer region). Given this feature, I think even people more familiar with 1-based would have no problem with 0-indexing (for other parts) by default.

ty1027 commented 5 years ago

(Sorry, I rushed too much to write my first post and so I am afraid it is lengthy and the points are not clear... I rewrote the contents so that they become more clear (hopefully). If necessary, please refer to the updated version. Thanks much!)

bradcray commented 4 years ago

I think we've effectively done this now with Chapel 1.22. There are arguably still library interfaces that may want to be updated to use 0-based indexing (e.g., #15419), but I think for the sake of what this issue is meant to accomplish, we're done, so I'm closing it.