Open bcmills opened 6 years ago
Come to think of it, I think this proposal is actually Go 1 compatible (except for actually removing range
). The proposed new forms are all invalid syntax today.
I think the time to fundamentally change how for-loops work has passed many years ago.
As a first qualifying question, are there correctness-preserving transforms between the old and new forms, such that if I write any old form the compiler cannot possibly get it wrong, and source rewriting (fix) can always create the new one?
Secondly, can I do exactly the same thing in my head when I inadvertantly think FOR 1=1,10,1
, that will give me the proper form (;-))
are there correctness-preserving transforms between the old and new forms, such that if I write any old form the compiler cannot possibly get it wrong, and source rewriting (fix) can always create the new one?
Yes. The general form of the rewrite is:
for InitStmt; Condition; PostStmt {
StatementList1
if Expression {
continue
}
StatementList2
}
to
{
InitStmt
for Condition {
StatementList1
if Expression {
PostStmt
continue
}
StatementList2
PostStmt
}
}
That general form can often be simplified.
InitStmt
, Condition
, and PostStmt
form an interval, interval
can be used instead.InitStmt
doesn't shadow existing variables, the outer scope block can be removed.PostStmt
is long compared to StatementList2
, the if Expression
can be inverted and the continue
dropped:
StatementList1
if !Expression {
StatementList2
}
PostStmt
can I do exactly the same thing in my head
That I can't answer. :)
How will something like this look?
// From crypto/tls/tls_test.go
for delay := time.Millisecond; delay <= 64*time.Millisecond; delay *= 2 {
if err = testConnReadNonzeroAndEOF(t, delay); err == nil {
return
}
}
@TocarIP I would consider the larger block
and write it in the two-part form:
delay := time.Millisecond
for err := testConnReadNonzeroAndEOF(t, delay); err != nil && delay < 64 * time.Millisecond {
delay *= 2
} else if err != nil {
t.Error(err)
}
That has a number of benefits:
Condition
position of the for
statement.
return
from the loop body, which makes it compose better with other code.I would argue that it also reads more like native language. “For each call to testConnReadNonzeroAndEOF
that returns an error for a delay less than 64ms, multiply the delay by 2. If the final attempt returns an error, fail the test.”
You could also write the loop as:
delay := time.Millisecond
for err := testConnReadNonzeroAndEOF(t, delay); err != nil {
if delay >= 64 * time.Millisecond {
t.Fatal(err)
}
delay *= 2
}
Which would read as, “For each unsuccessful attempt at testConnReadNonzeroAndEOF
, fail the test if the delay was at least 64ms, otherwise double the delay.”
See also https://lwn.net/Articles/557073/:
This issue of expressions excluding assignments has clearly not gone unnoticed by the Go designers. If we look at the if and switch statements, we see that, while they can be given a simple expression, they can also be given a simple statement as well, such as:
if i := strings.Index(s, ":"); i >= 0 {
which includes both an assignment and a test. This would work quite nicely for the readLine loop:
while line, ok := file.readLine(); ok {
Why can't range
(or range
's replacement) return one or two things like checking for an item in a map
? I rarely, if ever, use the index when using range
.
for item := range items {}
for item, i := range items {}
for item : items {}
for item, i : items {}
Python and JavaScript both do this more "ergonomically":
for item in items: pass
for i, item in enumerate(items): pass
for i in range(len(items)): pass
items.forEach(item => {})
items.forEach((item, i) => {});
@wesalvaro
Why can't
range
(orrange
's replacement) return one or two things like checking for an item in amap
?
The first value in a map lookup is always the element, because the caller already has the key. The second value in a map lookup is the present
boolean, which in a range
loop would always be true
. While it's tempting to make an argument from consistency, it seems to me that range
and map lookups are inherently different operations.
(That's part of why I find the := range
syntax awkward: x := m[k]
and x := range m
are very different operations, despite the similar syntax.)
I rarely, if ever, use the index when using range.
Yeah, I pretty much never use it either.
I can see two (mutually contradictory) arguments:
Argument (1) favors using indices for both (or elements for both), whereas argument (2) favors indices for maps and elements for slices and arrays. Personally, I am partial to argument (2), as are you. But there are enough folks who favor argument (1) that there isn't a consensus that is obvious to me. So for this proposal I chose argument (3):
I think ~everybody understands that when there are two variables they correspond to the index and element, respectively, and the leading _,
is only three extra characters. That seems a small price to pay to avoid ambiguity.
@bcmills
I think ~everybody understands that when there are two variables they correspond to the index and element, respectively
I constantly mix them up. And it's easy to see why - until you memorise it there's no way of telling which is which.
An
else
block may follow a loop that has with a Condition. Control transfers to the else block when the condition is false (likeelse
in Python loops). The variables declared in the first part of the two-part form remain in scope for theelse
block.
I almost never see for/else used in Python, for what it is worth.
I can not speak for everyone, but I intuitively expected (when I learned it was valid syntax at all) for the else
clause in a for
(or while
in Python) to run if and only if the condition was never true. In other words, run only if there are no iterations of the loop. That is actually a pattern I want all the time. (This is actually how jinja does it despite its following Python reasonably carefully in most of its keywords.)
I love the idea of making a loop that supports custom iterators better. It drives me crazy that there are many variants of iterators in Go. Even the standard library has at least two different variants. I think of them as io.Reader
style, where you need to check the return value for a sentinel like io.EOF
(see the example in the description), or sql.Rows
/bufio.Scanner
style, where you call Next()
in the for
condition, then call something else in the loop body to get the value (this is slightly different than the example in the original description, but effectively works the same way):
rows, err := db.Query("SELECT ...")
for rows.Next() {
err = rows.Scan(...)
...
}
if rows.Err() {
...
}
I would love to see a change (like this one) that would make using these iterators look similar. Even better, with language support, I suspect it would make people gravitate towards the one that works the most "naturally," which would encourage some standardization in the Go ecosystem.
I'm not sure I like the specific syntax proposed, but I very much agree with @evanj that making Go regular and easy to remember is a good thing.
While I think it's true that this could be backward compatible by keeping the existing formats, I don't think we want both the two-part for loop form and the three-part form. That seems potentially quite confusing.
In https://blog.golang.org/toward-go2 Russ says "To minimize disruption, each change will require careful thought, planning, and tooling, which in turn limits the number of changes we can make. Maybe we can do two or three, certainly not more than five."
These loop forms may well be better, but are they enough better? Do they solve any serious problems we have today? This would be a major change. Should it be one of the few major changes we do for Go 2?
I don't think this change would meet the Go 2 bar in isolation. I mainly intend it as a companion to #20733, which I believe does meet that bar.
So I would expect the decision for this proposal to depend upon that one. If we do not intend to address #20733 in Go 2, or if we intend to address it without a breaking change (that is, by changing the semantics of existing code in-place), we should probably reject this proposal.
However, if we are already making an incompatible change to for
-loops for #20733, I think we should consider this change in combination. That is: if users will need to go fix
their loops anyway, then perhaps the value of this change will outweigh its incremental cost relative to that.
I assert (without having done the code analysis to prove it) that the only code that will be affected by addressing #20733 is code that is buggy today and will be fixed by the change. I have never seen someone intentionally depend on the current behavior. I've lost track of the number of bugs I've seen caused by it.
@neild
the only code that will be affected by addressing #20733 is code that is buggy today and will be fixed by the change.
You may well be right, but I recall that @ianlancetaylor previously expressed (I think in an in-person discussion?) that, in general, breaking the old syntax can help to educate users about changes in semantics.
For example, if we make the change in-place, many people (and tools) will continue to write the old-style workaround out of habit:
for k, v := range m {
go func(k Key, v Value) {
…
}(k, v)
}
or
for k, v := range m {
k, v := k, v
go func() {
…
}()
}
On the other hand, if the syntax changes, the compilation error may prompt them to reconsider the loop as a whole, and to omit the redundant function arguments or variables:
for k, v : m {
go func() {
…
}()
}
So I see the cost/benefit tradeoff of a breaking change as:
Costs
go fix
existing code, which may or may not be necessary anyway (depending on what else goes into Go 2).Benefits
It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.
An else block may follow a loop that has with a Condition. Control transfers to the else block when the condition is false (like else in Python loops). The variables declared in the first part of the two-part form remain in scope for the else block.
Unless I'm misreading this, this is not what an else
on a loop does in Python. The else
is executed if the loop reaches completion without the use of a break
. In other words:
for i in range(5):
break
else:
print('This never runs.')
for i in range(5):
print(i)
else:
print('This does run.')
As @lukesneeringer pointed out above, this is very confusing and rarely useful, often winding up a bit spaghetti-ish when used. I know that, like them, I originally thought it was a 'run if the condition wasn't true upon entering the loop', and was really disappointed to find that it wasn't. That's a feature I've wanted in a language for a while, as I think I'd use it quite often. It can be done manually pretty easily with an extra variable and an if
, but I do think it would be nice to have anyways.
If I'm reading your proposal correctly, you're saying basically that the else
block would simply be a place to get access to the variables initialized in the loop condition outside of the main loop body. In other words, it always runs after the loop, but the scoping is slightly different for it than just 'code that isn't in the loop'. Is that right?
@DeedleFake I don't think that applies? I'm specifically only proposing the else
for loops that have a Condition, and range
loops do not.
(And control only transfers to the else
when the condition is false, so it would not be executed if the loop terminates in a break
. However, in my experience it is rare for loops involving explicit conditions to also use break
.)
So you're proposing that it work exactly like Python, but restricted to loops with conditions. I think that it would work a lot better the way @lukesneeringer proposed, namely that the else
block runs only if the main loop body doesn't get run a single time.
However, in my experience it is rare for loops involving explicit conditions to also use break.
I use break
in loops with conditions all the time, probably just as often as loops without them. I'm not sure why it would make a difference.
I think that it would work a lot better the way @lukesneeringer proposed, namely that the
else
block runs only if the main loop body doesn't get run a single time.
The point of the else
block in this proposal is to provide a clean opportunity to return in case of an error in the two-part form.
Take a look at the examples in the proposal and comments above that use it. How would you write those with @lukesneeringer's formulation of the else
?
Can you give some concrete examples where his formulation would be clearer or more useful?
The point of the else block in this proposal is to provide a clean opportunity to return in case of an error in the two-part form. ... How would you write those with @lukesneeringer's formulation of the else?
You could just do it the way you do it now: Stick the whole thing in an infinite loop and use a break
. For example:
for {
n, err := r.Read(buf)
if err != nil {
break
}
}
Can you give some concrete examples where his formulation would be clearer or more useful?
for tok := s.Scan(); tok != nil {
fmt.Printf("Got token: %#v\n", tok)
} else {
return errors.New("Scanner yielded no tokens")
}
The only other way to do this (generally) is by adding a new variable outside the loop, setting it to a signal value during the loop, and then using a separate if
after the loop.
@DeedleFake
You could just do it the way you do it now: Stick the whole thing in an infinite loop and use a
break
.
Well, sure, but the point of this proposal is to make the typical cases simpler and more concise: frequency matters. So how often do you encounter read-until-error loops vs. read-or-something loops, and how much worse are they both in practice?
Can you give some concrete examples where his formulation would be clearer or more useful?
[…] The only other way to do this (generally) is by adding a new variable outside the loop
Could you give some examples of this pattern in real code? That one seems a bit contrived.
In most real code, I'd expect to see some sort of data structure being populated, and then a len
check on that data structure suffices: you don't need a new variable since you can check an existing one.
The lack of an else
for that use-case is only really a problem when the loop both requires a special-case for zero iterations and doesn't already have some other thing you can count, and that combination is rare enough that I'm having trouble finding any actual occurrences.
I can try to find some actual examples in code, but one area I can think of is printing representations of data structures. For example, if you had some kind of tree structure, you could do the following:
func (n *Node) print(w io.Writer, depth int) {
fmt.Fprintf(w, "%v%v:\n", strings.Repeat(" ", depth), n.Name)
for _, c : n.Children {
c.print(w, depth + 1)
} else {
fmt.Fprintf(w, "%vNo children.", strings.Repeat(" ", depth))
}
}
That's a nice example, but it's still fairly straightforward without the else
because there is an existing variable whose length we can check:
func (n *Node) print(w io.Writer, depth int) {
fmt.Fprintf(w, "%v%v:\n", strings.Repeat(" ", depth), n.Name)
for _, c : n.Children {
c.print(w, depth + 1)
}
if len(n.Children) == 0 {
fmt.Fprintf(w, "%vNo children.", strings.Repeat(" ", depth))
}
}
What if there were a foreach
keyword that used the two clause format? That way, there would be no confusion about two v. three clauses. Foreach could also do the new-scope thing that we all want but can't introduce without subtly breaking old code.
foreach item, ok := list.Next(); ok {
go func(){ dothing(item) }()
}
@carlmjohnson, I agree that a different keyword for the two-part form is a good idea. I would spell it while
. Programmers from other languages with both if
and while
will appreciate the commonality:
if EXPR | if STMT; EXPR
while EXPR | while STMT; EXPR
I think if two part while was added there would need to be one expression while as well to fend off a lot of people being confused that it doesn’t exist. But then there would be two ways to spell the same thing (for vs while), which is less than ideal. while
is a good name though, so maybe it’s worth it.
Motivation
Go's for-loops encourage difficult-to-read code.
The Go 1 loop syntax sets the wrong defaults. The syntax is optimized for three-part
ForClause
loops, butrange
loops are far more common (by a ratio of nearly 4:1 in the code I sampled) and arguably ought to be viewed as the “default”.The three-part
ForClause
form is nearly always used for iterating one variable over sequential integers. That puts the interesting part — the condition — in the middle, where it is the hardest to find.(For the rare other cases, it is always possible to express a three-part
ForClause
as an equivalent one-partForClause
with an extra scope block. Loops that usecontinue
require care, butcontinue
in a three-part non-integer loop is especially rare.)Nothing else in the language has a three-part form, and the existence of the three-part
for
loop precludes a more useful two-part alternative (for APIs such asio.Reader
), because it would be too easy to confuse a two-part loop with a three-part one.The
range
keyword is confusing to newcomers.In set theory, range means "image" or "codomain", but the single-value version of a Go 1
range
loop instead iterates over the domain of the slice, map, or array. That makes the single-value form confusing, especially when the index and element types are mutually assignable (https://play.golang.org/p/c-lWoTI_Z-Y) or when the value is used as aninterface{}
(https://play.golang.org/p/cqZPSHZtuwH).In some other programming languages (such as Python),
range
refers to a sequence of points in a numerical interval, evoking line segment range or statistical range. In contrast, the Gorange
keyword doesn't have anything to do with numerical intervals, except to the extent that slice indices happen to be intervals.The fact that
range
modifies the semantics of:=
and=
is surprising. The only other Go operator that modifies the semantics of another operator is=
itself, which (beyond the, ok
idiom) modifies the semantics of the index operator ([]
) for map assignments. (I think we should fix that too; see https://github.com/golang/go/issues/20660#issuecomment-367781440.)It is rarely useful to have a
range
loop assign to existing variables, and we could address that use-case more cleanly with afinally
orelse
keyword anyway.Eliminating the
range
keyword would allow us to fix variable capture (https://github.com/golang/go/issues/20733) in a way that does not unexpectedly change the semantics offor
-loops written in the Go 1 style. (That is, old-style loops would no longer compile, instead of successfully compiling to something different from before.)Proposal
Remove the
range
keyword and the three-part loop form.Make the
range
form of thefor
loop more concise, and add a two-part form and optionalelse
block.For the one-part form:
If the first part is of the form
x : z
orx, y : z
, it introduces new variablesx
andy
(as applicable), which take the value of each successive element ofz
. The one-variable form can be used only for channels and numeric intervals (seeinterval
below). The two-variable form can be used only for maps, slices, strings, and arrays.Otherwise, the first part must be a boolean expression and specifies the
Condition
of the loop.The new two-part form parallels the two-part form of
switch
. The first part is an arbitrary statement (usually a short variable declaration) to be evaluated before every iteration, and the second part is theCondition
:for x, err := f(); err == nil {
An
else
block may follow a loop that has with aCondition
. Control transfers to theelse
block when the condition is false (likeelse
in Python loops). The variables declared in the first part of the two-part form remain in scope for theelse
block.else
reads, we could drop that part entirely, or use some other keyword — such asfinally
— and/or tweak the semantics, for example by also transferring control to the block in case of abreak
.)Add a built-in pseudofunction
interval
to replace the vast majority of existing 3-part loops.interval(m, n)
returns a container that iterates over[m, n)
by increments of1
.interval(m, n, step)
returns a container that iterates fromm
(inclusive) ton
(exclusive) bystep
.Examples
Simple conditions
Loops with just a
Condition
remain the same as in Go 1.Ranges
Range loops lose a little bit of boilerplate, and gain a closer resemblance to for-each loops in other languages with C-like syntax (such as C++ and Java).
becomes
(from https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating)
becomes
Intervals
Simple numeric intervals move the limit closer to the end of the line (where it is easier to find), and in some cases drop the need for redundant variables.
becomes
becomes
(from https://github.com/golang/go/wiki/SliceTricks#reversing, noting that the original goes out of its way — and loses some clarity in the process — to avoid re-evaluating
len(a)/2
at each iteration)becomes
or
Iterators
Iterator patterns shed boilerplate and/or levels of indentation.
becomes
becomes
Lists
Loops iterating over certain custom containers (such as linked lists) become a bit more awkward. (On the other hand, I would argue that they were awkward to begin with — and they could be fixed by a further change to allow types to implement the
range
-like behavior directly.)becomes