golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.14k stars 17.69k forks source link

proposal: spec: improve for-loop ergonomics #24282

Open bcmills opened 6 years ago

bcmills commented 6 years ago

Motivation

Go's for-loops encourage difficult-to-read code.

  1. The Go 1 loop syntax sets the wrong defaults. The syntax is optimized for three-part ForClause loops, but range loops are far more common (by a ratio of nearly 4:1 in the code I sampled) and arguably ought to be viewed as the “default”.

    • The three-part ForClause form is nearly always used for iterating one variable over sequential integers. That puts the interesting part — the condition — in the middle, where it is the hardest to find.

      (For the rare other cases, it is always possible to express a three-part ForClause as an equivalent one-part ForClause with an extra scope block. Loops that use continue require care, but continue in a three-part non-integer loop is especially rare.)

    • Nothing else in the language has a three-part form, and the existence of the three-part for loop precludes a more useful two-part alternative (for APIs such as io.Reader), because it would be too easy to confuse a two-part loop with a three-part one.

  2. The range keyword is confusing to newcomers.

    • In set theory, range means "image" or "codomain", but the single-value version of a Go 1 range loop instead iterates over the domain of the slice, map, or array. That makes the single-value form confusing, especially when the index and element types are mutually assignable (https://play.golang.org/p/c-lWoTI_Z-Y) or when the value is used as an interface{} (https://play.golang.org/p/cqZPSHZtuwH).

    • In some other programming languages (such as Python), range refers to a sequence of points in a numerical interval, evoking line segment range or statistical range. In contrast, the Go range keyword doesn't have anything to do with numerical intervals, except to the extent that slice indices happen to be intervals.

    • The fact that range modifies the semantics of := and = is surprising. The only other Go operator that modifies the semantics of another operator is = itself, which (beyond the , ok idiom) modifies the semantics of the index operator ([]) for map assignments. (I think we should fix that too; see https://github.com/golang/go/issues/20660#issuecomment-367781440.)

      It is rarely useful to have a range loop assign to existing variables, and we could address that use-case more cleanly with a finally or else keyword anyway.

  3. Eliminating the range keyword would allow us to fix variable capture (https://github.com/golang/go/issues/20733) in a way that does not unexpectedly change the semantics of for-loops written in the Go 1 style. (That is, old-style loops would no longer compile, instead of successfully compiling to something different from before.)


Proposal

  1. Remove the range keyword and the three-part loop form.

  2. Make the range form of the for loop more concise, and add a two-part form and optional else block.

    • For the one-part form:

      • If the first part is of the form x : z or x, y : z, it introduces new variables x and y (as applicable), which take the value of each successive element of z. The one-variable form can be used only for channels and numeric intervals (see interval below). The two-variable form can be used only for maps, slices, strings, and arrays.

      • Otherwise, the first part must be a boolean expression and specifies the Condition of the loop.

    • The new two-part form parallels the two-part form of switch. The first part is an arbitrary statement (usually a short variable declaration) to be evaluated before every iteration, and the second part is the Condition:

      for x, err := f(); err == nil {

    • An else block may follow a loop that has with a Condition. Control transfers to the else block when the condition is false (like else in Python loops). The variables declared in the first part of the two-part form remain in scope for the else block.

      • (If we don't like the way else reads, we could drop that part entirely, or use some other keyword — such as finally — and/or tweak the semantics, for example by also transferring control to the block in case of a break.)
  3. Add a built-in pseudofunction interval to replace the vast majority of existing 3-part loops.

    • interval(m, n) returns a container that iterates over [m, n) by increments of 1.

    • interval(m, n, step) returns a container that iterates from m (inclusive) to n (exclusive) by step.


Examples

Simple conditions

Loops with just a Condition remain the same as in Go 1.

for a < b {
    a *= 2
}

for len(h) > 0 {
    x := heap.Pop(h)
    f(x.(someType))
}

Ranges

Range loops lose a little bit of boilerplate, and gain a closer resemblance to for-each loops in other languages with C-like syntax (such as C++ and Java).

for i, s := range a {
    g(i, s)
}

becomes

for i, s : a {
    g(i, s)
}

(from https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating)

b := a[:0]
for _, x := range a {
    if f(x) {
        b = append(b, x)
    }
}

becomes

b := a[:0]
for _, x : a {
    if f(x) {
        b = append(b, x)
    }
}

Intervals

Simple numeric intervals move the limit closer to the end of the line (where it is easier to find), and in some cases drop the need for redundant variables.

for i, n := 0, f(x); i < n; i++ {
    g(i)
}

becomes

for i : interval(0, f(x)) {
    g(i)
}

for n := runtime.GOMAXPROCS(0); n > 0; n-- {
    go …
}

becomes

for n : interval(runtime.GOMAXPROCS(0), 0, -1) {
    go …
}

(from https://github.com/golang/go/wiki/SliceTricks#reversing, noting that the original goes out of its way — and loses some clarity in the process — to avoid re-evaluating len(a)/2 at each iteration)

for i := len(a)/2-1; i >= 0; i-- {
    opp := len(a)-1-i
    a[i], a[opp] = a[opp], a[i]
}

becomes

for i : interval(0, len(a)/2) {
    opp := len(a)-1-i
    a[i], a[opp] = a[opp], a[i] 
}

or

for i, _ : a[:len(a)/2] {
    opp := len(a)-1-i
    a[i], a[opp] = a[opp], a[i] 
}

Iterators

Iterator patterns shed boilerplate and/or levels of indentation.

for {
    n, err := r.Read(p)
    if err != nil {
        if err != io.EOF {
            return err
        }
        return nil
    }
    f(p[:n])
}

becomes

for n, err := r.Read(p); err == nil {
    f(p[:n])
} else if err != io.EOF {
    return err
}
return nil

iter := Begin()
for x, ok := iter.Next(); ok; x, ok = iter.Next() {
    f(x)
}

becomes

iter := Begin()
for x, ok := iter.Next(); ok {
    f(x)
}

Lists

Loops iterating over certain custom containers (such as linked lists) become a bit more awkward. (On the other hand, I would argue that they were awkward to begin with — and they could be fixed by a further change to allow types to implement the range-like behavior directly.)

for e := l.Front(); e != nil; e = e.Next() {
    f(e)
}

becomes

e := l.Front()
for e != nil {
    f(e)
    e = e.Next()
}
bcmills commented 6 years ago

Come to think of it, I think this proposal is actually Go 1 compatible (except for actually removing range). The proposed new forms are all invalid syntax today.

slrz commented 6 years ago

I think the time to fundamentally change how for-loops work has passed many years ago.

davecb commented 6 years ago

As a first qualifying question, are there correctness-preserving transforms between the old and new forms, such that if I write any old form the compiler cannot possibly get it wrong, and source rewriting (fix) can always create the new one? Secondly, can I do exactly the same thing in my head when I inadvertantly think FOR 1=1,10,1, that will give me the proper form (;-))

bcmills commented 6 years ago

are there correctness-preserving transforms between the old and new forms, such that if I write any old form the compiler cannot possibly get it wrong, and source rewriting (fix) can always create the new one?

Yes. The general form of the rewrite is:

for InitStmt; Condition; PostStmt {
    StatementList1
    if Expression {
        continue
    }
    StatementList2
}

to

{
    InitStmt
    for Condition {
        StatementList1
        if Expression {
            PostStmt
            continue
        }
        StatementList2
        PostStmt
    }
}

That general form can often be simplified.


can I do exactly the same thing in my head

That I can't answer. :)

TocarIP commented 6 years ago

How will something like this look?

// From crypto/tls/tls_test.go
 for delay := time.Millisecond; delay <= 64*time.Millisecond; delay *= 2 {
                if err = testConnReadNonzeroAndEOF(t, delay); err == nil {
                        return
                }
        }
bcmills commented 6 years ago

@TocarIP I would consider the larger block

https://github.com/golang/go/blob/010579c2377654e575d46857ec7dc77bab586438/src/crypto/tls/tls_test.go#L222-L228

and write it in the two-part form:

delay := time.Millisecond
for err := testConnReadNonzeroAndEOF(t, delay); err != nil && delay < 64 * time.Millisecond {
    delay *= 2
} else if err != nil {
    t.Error(err)
}

That has a number of benefits:

I would argue that it also reads more like native language. “For each call to testConnReadNonzeroAndEOF that returns an error for a delay less than 64ms, multiply the delay by 2. If the final attempt returns an error, fail the test.”

You could also write the loop as:

delay := time.Millisecond
for err := testConnReadNonzeroAndEOF(t, delay); err != nil {
    if delay >= 64 * time.Millisecond {
        t.Fatal(err)
    }
    delay *= 2
}

Which would read as, “For each unsuccessful attempt at testConnReadNonzeroAndEOF, fail the test if the delay was at least 64ms, otherwise double the delay.”

bcmills commented 6 years ago

See also https://lwn.net/Articles/557073/:

This issue of expressions excluding assignments has clearly not gone unnoticed by the Go designers. If we look at the if and switch statements, we see that, while they can be given a simple expression, they can also be given a simple statement as well, such as:

  if i := strings.Index(s, ":"); i >= 0 {

which includes both an assignment and a test. This would work quite nicely for the readLine loop:

  while line, ok := file.readLine(); ok {
hooluupog commented 6 years ago

IMHO,if this proposal were accepted, #21263 should also be reconsidered. FYI,Swift has already implemented these features in 3.0(SE-0004, SE-0007).

wesalvaro commented 6 years ago

Why can't range (or range's replacement) return one or two things like checking for an item in a map? I rarely, if ever, use the index when using range.

for item := range items {}
for item, i := range items {}
for item : items {}
for item, i : items {}

Python and JavaScript both do this more "ergonomically":

for item in items: pass
for i, item in enumerate(items): pass
for i in range(len(items)): pass
items.forEach(item => {})
items.forEach((item, i) => {});
bcmills commented 6 years ago

@wesalvaro

Why can't range (or range's replacement) return one or two things like checking for an item in a map?

The first value in a map lookup is always the element, because the caller already has the key. The second value in a map lookup is the present boolean, which in a range loop would always be true. While it's tempting to make an argument from consistency, it seems to me that range and map lookups are inherently different operations.

(That's part of why I find the := range syntax awkward: x := m[k] and x := range m are very different operations, despite the similar syntax.)

I rarely, if ever, use the index when using range.

Yeah, I pretty much never use it either.

I can see two (mutually contradictory) arguments:

  1. Loops over maps should be consistent with loops over slices and arrays.
  2. Loops over single items should iterate over the most commonly-useful part.

Argument (1) favors using indices for both (or elements for both), whereas argument (2) favors indices for maps and elements for slices and arrays. Personally, I am partial to argument (2), as are you. But there are enough folks who favor argument (1) that there isn't a consensus that is obvious to me. So for this proposal I chose argument (3):

  1. There is no choice that is “obvious” to everyone, so always be explicit.

I think ~everybody understands that when there are two variables they correspond to the index and element, respectively, and the leading _, is only three extra characters. That seems a small price to pay to avoid ambiguity.

creker commented 6 years ago

@bcmills

I think ~everybody understands that when there are two variables they correspond to the index and element, respectively

I constantly mix them up. And it's easy to see why - until you memorise it there's no way of telling which is which.

lukesneeringer commented 6 years ago

An else block may follow a loop that has with a Condition. Control transfers to the else block when the condition is false (like else in Python loops). The variables declared in the first part of the two-part form remain in scope for the else block.

I almost never see for/else used in Python, for what it is worth.

I can not speak for everyone, but I intuitively expected (when I learned it was valid syntax at all) for the else clause in a for (or while in Python) to run if and only if the condition was never true. In other words, run only if there are no iterations of the loop. That is actually a pattern I want all the time. (This is actually how jinja does it despite its following Python reasonably carefully in most of its keywords.)

evanj commented 6 years ago

I love the idea of making a loop that supports custom iterators better. It drives me crazy that there are many variants of iterators in Go. Even the standard library has at least two different variants. I think of them as io.Reader style, where you need to check the return value for a sentinel like io.EOF (see the example in the description), or sql.Rows/bufio.Scanner style, where you call Next() in the for condition, then call something else in the loop body to get the value (this is slightly different than the example in the original description, but effectively works the same way):

rows, err := db.Query("SELECT ...")
for rows.Next() {
    err = rows.Scan(...)
    ...
}
if rows.Err() {
    ...
}

I would love to see a change (like this one) that would make using these iterators look similar. Even better, with language support, I suspect it would make people gravitate towards the one that works the most "naturally," which would encourage some standardization in the Go ecosystem.

davecb commented 6 years ago

I'm not sure I like the specific syntax proposed, but I very much agree with @evanj that making Go regular and easy to remember is a good thing.

ianlancetaylor commented 6 years ago

While I think it's true that this could be backward compatible by keeping the existing formats, I don't think we want both the two-part for loop form and the three-part form. That seems potentially quite confusing.

In https://blog.golang.org/toward-go2 Russ says "To minimize disruption, each change will require careful thought, planning, and tooling, which in turn limits the number of changes we can make. Maybe we can do two or three, certainly not more than five."

These loop forms may well be better, but are they enough better? Do they solve any serious problems we have today? This would be a major change. Should it be one of the few major changes we do for Go 2?

bcmills commented 6 years ago

I don't think this change would meet the Go 2 bar in isolation. I mainly intend it as a companion to #20733, which I believe does meet that bar.

So I would expect the decision for this proposal to depend upon that one. If we do not intend to address #20733 in Go 2, or if we intend to address it without a breaking change (that is, by changing the semantics of existing code in-place), we should probably reject this proposal.

However, if we are already making an incompatible change to for-loops for #20733, I think we should consider this change in combination. That is: if users will need to go fix their loops anyway, then perhaps the value of this change will outweigh its incremental cost relative to that.

neild commented 6 years ago

I assert (without having done the code analysis to prove it) that the only code that will be affected by addressing #20733 is code that is buggy today and will be fixed by the change. I have never seen someone intentionally depend on the current behavior. I've lost track of the number of bugs I've seen caused by it.

bcmills commented 6 years ago

@neild

the only code that will be affected by addressing #20733 is code that is buggy today and will be fixed by the change.

You may well be right, but I recall that @ianlancetaylor previously expressed (I think in an in-person discussion?) that, in general, breaking the old syntax can help to educate users about changes in semantics.

For example, if we make the change in-place, many people (and tools) will continue to write the old-style workaround out of habit:

for k, v := range m {
    go func(k Key, v Value) {
        …
    }(k, v)
}

or

for k, v := range m {
    k, v := k, v
    go func() {
        …
    }()
}

On the other hand, if the syntax changes, the compilation error may prompt them to reconsider the loop as a whole, and to omit the redundant function arguments or variables:

for k, v : m {
    go func() {
        …
    }()
}

So I see the cost/benefit tradeoff of a breaking change as:

Costs

Benefits

It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.

DeedleFake commented 6 years ago

An else block may follow a loop that has with a Condition. Control transfers to the else block when the condition is false (like else in Python loops). The variables declared in the first part of the two-part form remain in scope for the else block.

Unless I'm misreading this, this is not what an else on a loop does in Python. The else is executed if the loop reaches completion without the use of a break. In other words:

for i in range(5):
  break
else:
  print('This never runs.')

for i in range(5):
  print(i)
else:
  print('This does run.')

As @lukesneeringer pointed out above, this is very confusing and rarely useful, often winding up a bit spaghetti-ish when used. I know that, like them, I originally thought it was a 'run if the condition wasn't true upon entering the loop', and was really disappointed to find that it wasn't. That's a feature I've wanted in a language for a while, as I think I'd use it quite often. It can be done manually pretty easily with an extra variable and an if, but I do think it would be nice to have anyways.

If I'm reading your proposal correctly, you're saying basically that the else block would simply be a place to get access to the variables initialized in the loop condition outside of the main loop body. In other words, it always runs after the loop, but the scoping is slightly different for it than just 'code that isn't in the loop'. Is that right?

bcmills commented 6 years ago

@DeedleFake I don't think that applies? I'm specifically only proposing the else for loops that have a Condition, and range loops do not.

bcmills commented 6 years ago

(And control only transfers to the else when the condition is false, so it would not be executed if the loop terminates in a break. However, in my experience it is rare for loops involving explicit conditions to also use break.)

DeedleFake commented 6 years ago

So you're proposing that it work exactly like Python, but restricted to loops with conditions. I think that it would work a lot better the way @lukesneeringer proposed, namely that the else block runs only if the main loop body doesn't get run a single time.

However, in my experience it is rare for loops involving explicit conditions to also use break.

I use break in loops with conditions all the time, probably just as often as loops without them. I'm not sure why it would make a difference.

bcmills commented 6 years ago

I think that it would work a lot better the way @lukesneeringer proposed, namely that the else block runs only if the main loop body doesn't get run a single time.

The point of the else block in this proposal is to provide a clean opportunity to return in case of an error in the two-part form.

Take a look at the examples in the proposal and comments above that use it. How would you write those with @lukesneeringer's formulation of the else?

Can you give some concrete examples where his formulation would be clearer or more useful?

DeedleFake commented 6 years ago

The point of the else block in this proposal is to provide a clean opportunity to return in case of an error in the two-part form. ... How would you write those with @lukesneeringer's formulation of the else?

You could just do it the way you do it now: Stick the whole thing in an infinite loop and use a break. For example:

for {
  n, err := r.Read(buf)
  if err != nil {
    break
  }
}

Can you give some concrete examples where his formulation would be clearer or more useful?

for tok := s.Scan(); tok != nil {
  fmt.Printf("Got token: %#v\n", tok)
} else {
  return errors.New("Scanner yielded no tokens")
}

The only other way to do this (generally) is by adding a new variable outside the loop, setting it to a signal value during the loop, and then using a separate if after the loop.

bcmills commented 6 years ago

@DeedleFake

You could just do it the way you do it now: Stick the whole thing in an infinite loop and use a break.

Well, sure, but the point of this proposal is to make the typical cases simpler and more concise: frequency matters. So how often do you encounter read-until-error loops vs. read-or-something loops, and how much worse are they both in practice?

Can you give some concrete examples where his formulation would be clearer or more useful?

[…] The only other way to do this (generally) is by adding a new variable outside the loop

Could you give some examples of this pattern in real code? That one seems a bit contrived.

In most real code, I'd expect to see some sort of data structure being populated, and then a len check on that data structure suffices: you don't need a new variable since you can check an existing one.

The lack of an else for that use-case is only really a problem when the loop both requires a special-case for zero iterations and doesn't already have some other thing you can count, and that combination is rare enough that I'm having trouble finding any actual occurrences.

DeedleFake commented 6 years ago

I can try to find some actual examples in code, but one area I can think of is printing representations of data structures. For example, if you had some kind of tree structure, you could do the following:

func (n *Node) print(w io.Writer, depth int) {
  fmt.Fprintf(w, "%v%v:\n", strings.Repeat("  ", depth), n.Name)
  for _, c : n.Children {
    c.print(w, depth + 1)
  } else {
    fmt.Fprintf(w, "%vNo children.", strings.Repeat("  ", depth))
  }
}
bcmills commented 6 years ago

That's a nice example, but it's still fairly straightforward without the else because there is an existing variable whose length we can check:

func (n *Node) print(w io.Writer, depth int) {
  fmt.Fprintf(w, "%v%v:\n", strings.Repeat("  ", depth), n.Name)
  for _, c : n.Children {
    c.print(w, depth + 1)
  }
  if len(n.Children) == 0 {
    fmt.Fprintf(w, "%vNo children.", strings.Repeat("  ", depth))
  }
}
earthboundkid commented 5 years ago

What if there were a foreach keyword that used the two clause format? That way, there would be no confusion about two v. three clauses. Foreach could also do the new-scope thing that we all want but can't introduce without subtly breaking old code.

foreach item, ok := list.Next(); ok {
   go func(){ dothing(item) }()
}
jba commented 2 years ago

@carlmjohnson, I agree that a different keyword for the two-part form is a good idea. I would spell it while. Programmers from other languages with both if and while will appreciate the commonality:

if EXPR      |   if STMT; EXPR 
while EXPR   |   while STMT; EXPR
earthboundkid commented 2 years ago

I think if two part while was added there would need to be one expression while as well to fend off a lot of people being confused that it doesn’t exist. But then there would be two ways to spell the same thing (for vs while), which is less than ideal. while is a good name though, so maybe it’s worth it.