spec: less error-prone loop variable scoping

rsc commented 1 year ago

We propose to change for loop variables declared with := from one-instance-per-loop to one-instance-per-iteration. This change would apply only to packages in modules that explicitly declare a new enough Go version in the go.mod file, so all existing code is unaffected. Changing the loop semantics would prevent unintended sharing in per-iteration closures and goroutines (see this entry in the Go FAQ for one explanation). We expect this change to fix many subtly broken for loops in new code, as well as in old code as it is updated to the newer Go version. There was an earlier pre-proposal discussion of this idea at #56010.

For example, consider a loop like the one in this test:

func TestAllEven(t *testing.T) {
    testCases := []int{0, 2, 4, 6}
    for _, v := range testCases {
        t.Run("sub", func(t *testing.T) {
            t.Parallel()
            if v&1 != 0 {
                t.Fatal("odd v", v)
            }
        })
    }
}

This test aims to check that all the test cases are even, but it doesn't check them all. The problem is that t.Parallel stops the closure and lets the loop continue, and then it runs all the closures in parallel when the loop is over and TestAllEven has returned. By the time the if statement in the closure executes, the loop is done, and v has its final iteration value, 6. All four subtests now continue executing in parallel, and they all check that 6 is even, instead of checking each of the test cases.

Most Go developers are familiar with this mistake and know the answer: add v := v to the loop body:

func TestAllEven(t *testing.T) {
    testCases := []int{0, 2, 4, 6}
    for _, v := range testCases {
        v := v // MAKE ITERATION VALUE SHARING BUG GO AWAY
        t.Run("sub", func(t *testing.T) {
            t.Parallel()
            if v&1 != 0 {
                t.Fatal("odd v", v)
            }
        })
    }
}

Changing the loop semantics would in essence insert this kind of v := v statement for every for loop variable declared with :=. It would fix this loop and many others to do what the author clearly intends.

A subtler variation is when the code says testCases := []int{1, 2, 4, 6} (note the non-even test case 1):

func TestAllEvenBuggy(t *testing.T) {
    testCases := []int{1, 2, 4, 6}
    for _, v := range testCases {
        t.Run("sub", func(t *testing.T) {
            t.Parallel()
            if v&1 != 0 {
                t.Fatal("odd v", v)
            }
        })
    }
}

TestAllEvenBuggy passes today, because the test is only checking 6, four times. Changing the loop semantics would still fix the loop to do what the author clearly intended, but it would break the test, because the test is passing incorrectly. So there is the potential to cause problems for users.

Because of this potential for problems, the new loop semantics would only apply in Go modules that have opted in to the release with the new loops. If that was Go 1.22, then only packages in a module with a go.mod that says go 1.22 would get the new loop semantics. Packages in other modules, including packages in dependencies, would get the old semantics. This would guarantee that all existing Go code keeps working the same as it does today, even when compiling with a new toolchain. People only need to debug loop changes when they opt in to the new semantics in go.mod. This approach is in keeping with our backwards and forwards compatibility work, #56986 and #57001, specifically the principle that toolchain upgrades preserve the behavior of old code, and compatibility is based on the go line.

If this proposal is accepted, users will need additional tooling support for a successful transition. That would come primarily in two forms: a compiler mode that reports affected loops, and a debugging tool that identifies the specific loops whose changed compilation is responsible for causing a test failure. There is a demo of the tooling support at the end of this comment.

The tooling support is important and necessary, but we expect it to be needed only rarely. Analysis and conversion of Google's own Go source code found that only about 1 package test per 8,000 started failing due to the new semantics, and the bug was essentially always in the test itself, like in TestAllEvenBuggy. In contrast, the new loopclosure vet analysis that shipped in Go 1.20 flagged definite bugs in 1 test per 400. The rest stayed passing with their bugs fixed by the new semantics. That is, in Google's code base, about 5% of the tests that contain this kind of sharing mistake were like TestAllEvenBuggy, exposed as buggy by the new semantics. The other 95% of the tests with this kind of mistake still passed when they started testing what they intended to. Of all the test failures, only one was caused by a loop variable semantic test change in non-test code. That code was very low-level and could not tolerate a new allocation in the loop. The design document has details. This evidence suggests that changing the semantics is usually a no-op, and when it’s not, it fixes buggy code far more often than it breaks correct code.

Based on the preliminary discussion the further work summarized here, @dr2chase and I propose that we make the change in an appropriate future Go version, perhaps Go 1.22 if the stars align, and otherwise a later version.

More details can be found in the design document.

Update, May 10 2023: Note that the change will not break the common idiom of changing a loop variable in the body of a 3-clause for loops. See this comment for details and links before commenting about 3-clause for loops. Thanks.

Tooling Demonstration

To demonstrate the tooling we would provide to support a successful transition, here is a complete example of a test that passes today but fails with the new loop semantics:

% cat x_test.go
package x

import "testing"

func Test(t *testing.T) {
    testCases := []int{1, 2, 4, 6}
    for _, v := range testCases {
        t.Run("sub", func(t *testing.T) {
            t.Parallel()
            if v&1 != 0 {
                t.Fatal("odd v", v)
            }
        })
    }
    for _, v := range testCases {
        t.Run("sub", func(t *testing.T) {
            t.Log(v)
        })
    }
    for _, v := range testCases {
        t.Run("sub", func(t *testing.T) {
            t.Parallel()
            if v&1 != 0 {
                t.Fatal("odd v", v)
            }
        })
    }
}
%

Building the program with -d=loopvar=2 reports all the affected loops:

% GOEXPERIMENT=loopvar go test -gcflags=all=-d=loopvar=2 x_test.go
# runtime
go/src/runtime/proc.go:1815:7: loop variable freem now per-iteration, stack-allocated
go/src/runtime/proc.go:3149:7: loop variable enum now per-iteration, stack-allocated
go/src/runtime/mgcmark.go:810:6: loop variable d now per-iteration, stack-allocated
go/src/runtime/traceback.go:623:7: loop variable iu now per-iteration, stack-allocated
go/src/runtime/traceback.go:943:7: loop variable iu now per-iteration, stack-allocated
# runtime/pprof
go/src/runtime/pprof/pprof.go:386:9: loop variable r now per-iteration, heap-allocated
go/src/runtime/pprof/proto.go:363:6: loop variable e now per-iteration, stack-allocated
go/src/runtime/pprof/proto.go:612:9: loop variable frame now per-iteration, heap-allocated
go/src/runtime/pprof/protomem.go:29:9: loop variable r now per-iteration, heap-allocated
# command-line-arguments [command-line-arguments.test]
./x_test.go:7:9: loop variable v now per-iteration, stack-allocated
./x_test.go:15:9: loop variable v now per-iteration, stack-allocated
./x_test.go:20:9: loop variable v now per-iteration, stack-allocated
# internal/fuzz
go/src/internal/fuzz/fuzz.go:432:9: loop variable e now per-iteration, stack-allocated
--- FAIL: Test (0.00s)
    --- FAIL: Test/sub (0.00s)
        x_test.go:11: odd v 1
    --- FAIL: Test/sub#08 (0.00s)
        x_test.go:24: odd v 1
FAIL
FAIL    command-line-arguments  0.199s
FAIL
%

Note that some loops are in the Go standard library. Those are unlikely to be the cause, so we can limit the diagnostics to the current package by dropping all=:

% GOEXPERIMENT=loopvar go test -gcflags=-d=loopvar=2 x_test.go
# command-line-arguments [command-line-arguments.test]
./x_test.go:7:9: loop variable v now per-iteration, stack-allocated
./x_test.go:15:9: loop variable v now per-iteration, stack-allocated
./x_test.go:20:9: loop variable v now per-iteration, stack-allocated
--- FAIL: Test (0.00s)
    --- FAIL: Test/sub (0.00s)
        x_test.go:11: odd v 1
    --- FAIL: Test/sub#08 (0.00s)
        x_test.go:24: odd v 1
FAIL
FAIL    command-line-arguments  0.119s
FAIL
%

That trims the diagnostic output, but it still leaves us to check all our loops. In this trivial example, 2 of 3 are buggy, but in a real program there are fewer needles and more haystack. To solve that problem, we can use a dynamic tool that reruns a test, varying which loops compile with the new semantics, to identify the specific loops that cause the failure:

% bisect -loopvar go test x_test.go
bisect: checking target with all changes disabled
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=n x_test.go... ok (13 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=n x_test.go... ok (13 matches)
bisect: checking target with all changes enabled
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=y x_test.go... FAIL (13 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=y x_test.go... FAIL (13 matches)
bisect: target succeeds with no changes, fails with all changes
bisect: searching for minimal set of changes to enable to cause failure
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+0 x_test.go... ok (7 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+0 x_test.go... ok (7 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+1 x_test.go... FAIL (6 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+1 x_test.go... FAIL (6 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+01 x_test.go... FAIL (3 matches)
bisect: run: go test -gcflags=all=-d=loopvar=1,loopvarhash=+01 x_test.go... FAIL (3 matches)
...
bisect: FOUND failing change set
--- change set #1 (enabling changes causes failure)
./x_test.go:7:9
---
...
bisect: FOUND failing change set
--- change set #2 (enabling changes causes failure)
./x_test.go:20:9
---
...
bisect: target succeeds with all remaining changes enabled
%

Now we know to spend our attention on the loops at lines 7 and 20, which are in fact the buggy ones. (The loop at line 15 has been cleared of suspicion.) Bisect uses an adaptation of binary search that can identify the relevant loop in a relatively small number of trials, making it useful even on very large, very complicated tests that run for a long time.

markusheukelom commented 1 year ago

@Merovius

Even if that was true (I'm always skeptical of talking about "the real cause" of a misunderstanding), this particular example is only one example of many. And most of those happen around closures, which definitely do not make this assumption.

My argument was only in relation to the given (class of) examples, sorry if that was unclear. I agree that it is one example out of possibly many other classes of examples that the proposal intents to fix. I agree with your other remarks; I just tried to explain my argument, and it is not such a strong argument against this proposal per se, I agree.

If the language was newly designed, I would agree this proposal maybe be a better way to do the range for.

I am also not against this proposal, I just hope this is not done out of "academic" interests or the urge to polish things.

Btw, looking again at the original example of the proposal, I now do see how this proposal might actually make sense in relation to that specific example. Maybe the proposal appears to me to be an "academic fix", because the first (and only) example involves parallelism. And also due to my lack of imagination of where being able to remove the "i := i" is really "wow". I almost never need i := i myself; and there's enough alternatives. Maybe the authors have more context to the proposal than is expressed?

Merovius commented 1 year ago

Maybe the proposal appears to me to be an "academic fix"

To be clear, the feedback on this proposal has been overwhelmingly positive, both in terms of emoji-votes (on this issue, the pre-proposal discussion) and in terms of feedback on social media and other discussion forums. The overwhelming majority of Go programmers seem to consider this a real fix for a real problem and anticipate it eagerly as one of the major footguns in Go getting fixed. Of course, that's not "data", but suffice it to say, there is nothing "academic" about this.

zigo101 commented 1 year ago

Is 40:18 defined as overwhelmingly? Not to mention that some of the the breaking cases shown in this thread are not mentioned in the proposal at all.

Merovius commented 1 year ago

@go101 No, but 369:6 is. Also, your examples are not mentioned in the proposal text, because they are not based on real code, but are purely synthetic and you have been told that repeatedly and are fully aware. Refusing to acknowledge that they have been seen and addressed and continuing to concern-troll this issue is not driving any sympathies to your cause. Please just stop it, already.

rsc commented 1 year ago

Many thanks to @Merovius and @thepudds for engaging here and helping answer these questions.

To @markusheukelom's comments:

I am programming full-time in Go for the last 5 years, and I have only once or twice run into a bug caused by this (which was really quickly solved) in all these 5 years.

I just hope this is not done out of "academic" interests or the urge to polish things.

Let me state very clearly that this is not about "polishing". It is a major problem that Go programmers hit frequently at scale, and with real consequences. I'm not at liberty to share examples from inside Google, but here is one public example from Let's Encrypt. The code in question said:

// authz2ModelMapToPB converts a mapping of domain name to authz2Models into a
// protobuf authorizations map
func authz2ModelMapToPB(m map[string]authz2Model) (*sapb.Authorizations, error) {
    resp := &sapb.Authorizations{}
    for k, v := range m {
        // Make a copy of k because it will be reassigned with each loop.
        kCopy := k
        authzPB, err := modelToAuthzPB(&v)
        if err != nil {
            return nil, err
        }
        resp.Authz = append(resp.Authz, &sapb.Authorizations_MapElement{Domain: &kCopy, Authz: authzPB})
    }
    return resp, nil
}

Note the kCopy := k guarding against the &kCopy used at the end of the loop body. Unfortunately, it turns out that modelToAuthzPB kept a pointer to a couple fields in v, which is impossible to know when reading this loop.

The initial impact of this bug was that Let's Encrypt needed to revoke over 3 million improperly-issued certificates. They ended up not doing that because of the negative impact it would have had on internet security, instead arguing for an exception, but that gives you a sense of the kind of impact.

The code in question was carefully reviewed when written, and the author was clearly aware of the potential problem, since they wrote kCopy := k, and yet it still had a major bug, one that is not visible unless you also know exactly what modelToAuthzPB does.

I have personally made this mistake easily more than once a year for the past ten years. Of course, I usually find it very quickly with basic testing within minutes of writing the code, but not always. It is a real problem, worth correcting.

thepudds commented 1 year ago

Is 40:18 defined as overwhelmingly?

Hi @go101, your comment you just referenced, which was the ~3rd overall comment here, did get 18 upvotes, but it also got 44 downvotes, which I think makes it the comment here with the largest number of downvotes.

Later, the ~15th comment was this long comment from Russ that talked about the pros (and some cons) of changing the 3-clause for loops and attempted to explain why it is worth the cost based on the evidence.

Russ' comment, which again argued in favor of changing 3-clause for loops, received 35 upvotes and no downvotes (so divide by zero error).

You also said:

Not to mention that some of the the breaking cases shown in this thread are not mentioned in the proposal at all.

That comment from Russ also started with a link to what I think might be your three top synthetic examples:

I freely admit that it is possible to write 3-clause for loops that behave differently in the new semantics, just as it is possible to write range loops that behave differently, such as this one by Tim King [...]

While your examples might have not been in the text of the proposal itself, I think it's fair to say your examples have received significant exposure here.

I think the core Go team is wise to say that emoji voting can only be taken as a "rough measure". In general, emoji voting can be noisy, especially early on in a discussion when there can be a haze of confusion as people brand new to a nuanced topic start to ramp up on it.

In this particular case, I think it's also fair to say that initially there was a decent amount of confusion over what exactly was changing about the 3-clause for loops, and which common idioms would break for 3-clause for loops, and that triggered a bunch of clarification in the discussion as well as this update to the intro comment for the proposal:

Update, May 10 2023: Note that the change will not break the common idiom of changing a loop variable in the body of a 3-clause for loops. See this comment for details and links before commenting about 3-clause for loops. Thanks.

In other words, personally I don’t know how much weight I’d put on seeing 18 upvotes on your comment when there was demonstrably significant confusion about the topic at that point.

In any event, I think making the loopvar GOEXPERIMENT available to all gophers around the world with Go 1.21 will help illuminate what real-world code is impacted by this proposal. If evidence arises that more real-world code is impacted than currently anticipated by the proposal authors, I think that will be taken into account.

rsc commented 1 year ago

More real-world evidence.

Earlier today I took the Kubernetes current dev branch and ran FORCE_HOST_GO=y make test with the current Go 1.21 dev branch (without the loopvar experiment). The Kubernetes repo has 1054 package tests. Of those, 3 failed. Two are related to testing for non-guaranteed undocumented behavior around functions and inlining, and more aggressive inlining broke the tests (but not the code being tested). One is related to a TLS error message becoming more specific. The three issues I filed are:

60324

(I decided we should preserve the undocumented behavior here if possible)
kubernetes/kubernetes#118149
kubernetes/kubernetes#118153

Having patched these locally I had a working make test with Go 1.21, so then I enabled loopvar with GOEXPERIMENT=loopvar FORCE_HOST_GO=y make test. That produced two failures. Using bisect worked perfectly to identify the specific loop (transcript below). I filed these two bugs:

kubernetes/kubernetes#118151
kubernetes/kubernetes#118152

Both of these are loopvar bugs in tests covering up other bugs in the tests. Having fixed the loopvars, the other latent test bugs are unmasked.

In Google's source tree the rate of package failure due to loopvar bugs was 1 in 8,000. In Kubernetes it is more like 1 in 500. Still low, still easy to deal with, and comparable to the usual number of tests that must be cleaned up for a new Go release. Debugging the loopvar failures was easier than debugging the non-loopvar failures, because bisect did all the work.

While writing this report I checked out https://github.com/ethereum/go-ethereum and ran go test ./... and GOEXPERIMENT=loopvar go test ./.... All tests pass, both ways. That's 0/113 package tests broken the by loopvar change.

This adds to the pile of evidence that the changes do not break real-world code but do fix actual bugs. I encourage others to run GOEXPERIMENT=loopvar go test in other projects you are concerned about.

bisect transcript

``` $ bisect -compile=loopvar FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo bisect: checking target with all changes disabled bisect: run: GOCOMPILEDEBUG=loopvarhash=n FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (952 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=n FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (952 matches) bisect: checking target with all changes enabled bisect: run: GOCOMPILEDEBUG=loopvarhash=y FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (952 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=y FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (952 matches) bisect: target succeeds with no changes, fails with all changes bisect: searching for minimal set of enabled changes causing failure bisect: run: GOCOMPILEDEBUG=loopvarhash=+0 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (457 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (457 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (231 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (231 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (116 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (116 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (115 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (115 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (62 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (62 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (30 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (30 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (16 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (16 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (14 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (14 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (7 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (7 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (6 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+00100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (6 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (3 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (3 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (2 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+0000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (2 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (1 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=+000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (1 matches) bisect: confirming failing change set bisect: run: GOCOMPILEDEBUG=loopvarhash=v+000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (1 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=v+000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... FAIL (1 matches) bisect: FOUND failing change set --- change set #1 (enabling changes causes failure) pkg/volume/git_repo/git_repo_test.go:392:9 --- bisect: checking for more failures bisect: run: GOCOMPILEDEBUG=loopvarhash=-000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (951 matches) bisect: run: GOCOMPILEDEBUG=loopvarhash=-000000100100 FORCE_HOST_GO=y make test WHAT=./pkg/volume/git_repo... ok (951 matches) bisect: target succeeds with all remaining changes enabled $ ```

thediveo commented 1 year ago

Is 40:18 defined as overwhelmingly? Not to mention that some of the the breaking cases shown in this thread are not mentioned in the proposal at all.

Can we please scale down on discussing the semantics of natural languages? For fun, in a proportional democracy two-thirds in favor passes the high bars easily. As the German proverb goes, don't put words on the Gold scale.

I really appreciate looking at things both from the daily work perspective as well as a more theoretical one and I've shelled out the money for your well written ebooks. But even after trying to grok your counter examples I still fail to see how these are relevant? Others have shown examples from the existing corpus of PROD code, such as k8s, and that really helps with understanding the whole thing. I'm looking forward to your next ebook in the hope that while it explores the intricate corners it doesn't get stuck in ex cathedra.

markusheukelom commented 1 year ago

@rsc Thank you for the real-world evidence and examples. I am now convinced loop variables are a real-world issue.

However, if I pass go test -vet=all on the example from this proposal it simple produces: "loop variable v captured by func literal". (As per the loopclosure check (see e.g. https://pkg.go.dev/cmd/vet)).

Why is this not a sufficient remediation already? For example by recommending this option more often, or even adding loopclosure to the "high confidence" set of check performed by go vet? (That would break peoples tests suddenly potentially so should also not be taken lightly, but still better imo than changing the language, at least worth considering).

Besides (elevating) the loopclosure vet check, maybe an additional check "loopptr" can be created that warns if a pointer to a loop variable is taken. I have a hunch that taking the address of a loop variables predicates many bugs as well (see my other comments). Such a check would have caught the "Let's Encrypt" bug quite possibly. Moreover, it will also catch bugs that will also be present if this proposal is implemented (see my previous comment). Note: I have only thought about this is superficially and maybe loopptr will give annoying false positives in some case, even though I cannot really think of one directly.

Both this proposal and go vet stuff can be done, of course. I just wonder if the go vet path was explored? It is not mentioned in the proposal, so I though I'd mention it here. I am not against this proposal, just consider this comment under the chess adagium "if you find a good move, look for a better one."

EDIT: this is discussed as part of the proposal: https://go.googlesource.com/proposal/+/master/design/60078-loopvar.md#static-analysis-is-not-a-viable-alternative

EDIT: I've run all examples from the proposal (two total) through go test -vet=all, and all are picked-up by vet checks. So wrt to these prototypical examples, and even though go vet is mentioned to be sloppy (both ways) and therefore mentioned to be not adequate, it seems it is adequate on the provided examples in the proposal.

AndrewHarrisSPU commented 1 year ago

Why is this not a sufficient remediation already?

https://go.googlesource.com/proposal/+/master/design/60078-loopvar.md#static-analysis-is-not-a-viable-alternative is sharp on this point

Merovius commented 1 year ago

There's also (to repeat) the issue that even if you are aware of the issue, the mitigation is to add a x := x, which is confusing and superfluous and bad.

markusheukelom commented 1 year ago

Am I alone in finding it a bit sad that this proposal makes the post statement no longer a post-statement but some kind of post-init statement, but only on the second loop?

It becomes harder to remember how it exactly works "under the hood". The problem with it is that (to me, at least) it is no longer obvious I can write the following:

for i := 0; i < 100;  {

    n := 2 // assume n is computed in the loop, for example
    i += n
}

This will still work with the new behaviour, but that is no longer obvious from the code: you have to look up the specs to understand that there is an implicit outer_i that is also assigned back to i at the end of the loop. I find this to be a bit anti-Go-ish, as it otherwise states it is better to explicit (i.e. sort of preferring the i := i where needed instead).

I find it a bit hard to mention this, on the one hand I completely agree that the proposal naturally fixes some bugs, but on the other hand I just dislike a language change, especially involving the for keyword that works just fine and is nice and simple with how the post-statement works.

thediveo commented 1 year ago

Reading this example without any prior knowledge of this discussion, I immediately see and understand @markusheukelom example without any overthinking. Maybe because I'm ignorant of post-statements and the like. I remember not least rsc(?) saying that most Gophers should be able to use Go correctly without reading the specification.

This discussion seems to be needlessly stuck in a loop, and it's not due to post statements, and other things. Please break it so we can finally continue improving error handling and other pain points.

Merovius commented 1 year ago

@markusheukelom We've been through that above. It's only complicated if you make it complicated. Unless you involve pointers to or closures over loop variables, everything will work exactly as before. If you do, the new behavior will likely match your expectations better than the current one. It is only non-obvious what would happen in that loop, if you believe it to be non-obvious.

And again, range loops already have the same kind of implicit iteration variable. People manage fine.

I find it a bit hard to mention this, on the one hand I completely agree that the proposal naturally fixes some bugs, but on the other hand I just dislike a language change […]

That is really my impression of this discussion. All the data and all the logic seems to support making this change. But some people seem to be insistent on arguing against it, simply based on the fact that it is a change.

My impression is that this is also what's going on with the supposed non-obviousness of your loop example. You say it's non-obvious, because you committed to something about loop semantics changing, thus thinking there must be something there. When in reality, the semantics don't actually change.

The only cases where you need to worry are the cases where you need to worry today - when you are involving pointers or closures.

Apologies if I'm reading too much into the responses here, but you bringing that up this way made it feel appropriate to put this suspicion into words.

markusheukelom commented 1 year ago

@Merovius Yes, I think you may be right there with your observations. I'll leave it there. Thanks for the patience.

zigo101 commented 1 year ago

Okay, my last comment in this thread.

@Merovius

No, but 369:6 is.

The first comment of this thread almost doesn't mention the changes on 3-clause for-loops. Whether it is deliberated, the fact indeed affected the voting result IMHO.

Also, your examples are not mentioned in the proposal text, because they are not based on real code, but are purely synthetic and you have been told that repeatedly and are fully aware. Refusing to acknowledge that they have been seen and addressed and continuing to concern-troll this issue is not driving any sympathies to your cause. Please just stop it, already.

Do you have evidences for this? Have you checked all Go code in the world? At least Go programmers have the right to know the effects.

@thepudds

Firstly, thanks for buying my books. :)

... did get 18 upvotes ...

Considering that the proposal is written in an incomplete way, that is enough to show that many Go programmers don't think the change in 3-clause for-loops is not a good idea.

But even after trying to grok your counter examples I still fail to see how these are relevant?

If backward-compatibility and expectation breaking is not relevant, then okay, it is not relevant.

Others have shown examples from the existing corpus of PROD code, such as k8s, and that really helps with understanding the whole thing.

I see most bugs are in for-range loops, but rarely in 3-clause for-loops.

Any way, I will not make debates in this thread any more. Some viewpoints in this threads actually astonished me so much that I know any more debate is a vain. Maybe I'm too old to keep up with the pace of Go's evolution.

As an old Go programmer and being aware of the changes, the changes will be not a big problem for me. For me, the Impacts of the changes:

the flexibility of 3-clause for-loops are reduced some. Now we have choice to decide which iteration vars are per-step scoped and which are whole-loop scoped. The change forces us to handle this in another way. I'm not sure whether or not this is a good thing for new Go programmers and old Go programmers who are not aware of the changes.
don't view Go as C-family language any more.
the changes will produce more materials for my books. :)

I really don't need to care much about on the changes (on 3-clause loops). It is not my language. Here, I, just as a normal user, posted some of my opinions and showed some bad effects of the changes.

Re-list my opinions here:

3-clause for-loop is not iteration specific. The uses of 3-clause for-loop have many variations. This is a good feature which contributes to Go's flexibility. Adding implicitness to it is not a good idea, for implicitness often causes surprises. The benefits of the change on 3-clause for-loops are much smaller that the drawbacks.
Using semvars to control language semantic changes is not a good idea, especially for changes on 3-clause for-loops. It is weird that the same piece of code behaves differently just for it locates in two different Go source files.

Hope Go go well. :)

black23eep commented 1 year ago

I think it adds unnecessary complexity and undermines the promise of compatibility

davecb commented 1 year ago

I like the idea of controlled change (it used to be my team's task at Sun) and agree it should be tied to compiler version.

gopherbot commented 1 year ago

Change https://go.dev/cl/500958 mentions this issue: doc/go1.21: document GOEXPERIMENT=loopvar and invite feedback

aarongable commented 1 year ago

Thanks for running this experiment! We at Let's Encrypt are very excited for this language change, not least because we are perhaps the most public example of loop variable scoping leading to serious bugs.

We've enabled GOEXPERIMENT=loopvar in our CI tests and it hasn't broken anything or revealed any bugs. Thanks again!

rsc commented 1 year ago

Overall the response here has been overwhelmingly positive, and we now have two examples (Google's codebase and Let's Encrypt's) of the "big hammer turn it on all the time" not causing problems. That suggests the smaller, gradual go.mod-version-based hammer will be even less disruptive.

With the tree opening for Go 1.22 soon, we want to move this proposal along so we can land the changes as early in the cycle as possible. So this will be marked likely accept shortly, and probably accepted next week. That said, if any showstopping kinds of issues or new data come in, we still want to hear them. Thanks for all the discussion so far.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

ryanschneider commented 1 year ago

I scanned the comments so might have missed this, but wanted to call this out: my one concern is that past Go version changes have (always?) failed in obvious ways when compiled with older Go versions, whereas as this change is semantic only so doesn't require a syntax change. For example, generics clearly don't compile on older Go, nor do new stdlib packages or methods.

For example, code using the "new loop semantic" might leak into older projects via copy/paste (e.g. forum answers), and won't obviously break.

Anyways, I guess my main comment is that either this is the first time there's been a semantic only change, in which case I think there should be additional considerations about what that means for Go 's evolution moving forward, or there have been semantic-only changes in the past that I'm unaware of, in which any lessons learned from those should be considered here as well.

edit: As Merovious pointed out below, this is addressed in the full design doc, sorry I missed that. Quoting it here for others:

In the Go 2 transitions document we gave the general rule that language redefinitions like what we just described are not permitted, giving this very proposal as an example of something that violates the general rule. We still believe that that is the right general rule, but we have come to also believe that the for loop variable case is strong enough to motivate a one-time exception to that rule. Loop variables being per-loop instead of per-iteration is the only design decision we know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, we do not see any plausible candidates for additional exceptions.

Merovius commented 1 year ago

@ryanschneider You are correct that this is the first such change. And as the design document mentions, it is intended to be the last.

rsc commented 1 year ago

For anyone new to this issue and exploring what the change would mean, the Go playground now lets you experiment with the new semantics. To do that, use Go 1.21 and add // GOEXPERIMENT=loopvar at the top of your program. For example try https://go.dev/play/p/lDFLrPOcdz3 and then try deleting the comment.

gopherbot commented 1 year ago

Change https://go.dev/cl/518675 mentions this issue: [release-branch.go1.20] cmd/go: refuse to build Go 1.22 code

gopherbot commented 1 year ago

Change https://go.dev/cl/518815 mentions this issue: [release-branch.go1.19] cmd/go: refuse to build Go 1.22 code

iBug commented 1 year ago

I must say @go101's argument makes total sense. I have the following information to defend for 3-clause for loops (hereinafter "for;; loops").

Similarity in other languages is an important source for intuition. To the best of my knowledge, the only other languages that share similarity with this proposal (if applied to for;; loops) is JavaScript ES6 for-loops-with-let-declaration. Most compared languages, like C++ and Rust, both have explicit distinction between values and references, which makes capturing loopvars unambiguous. Go, however, has no concept of "reference", which is commonly replaced by pointers. This is concrete, objective evidence on "intuition".

Additional points about JavaScript:
- It has no concept of either "pointers" or "references", making it an inappropriate comparison with Go.
- It maintains the behavior of old code: As long as you don't add let to your loops, your code behaves consistently, be it in ES3 or ES2023.
On the contrary, other languages with for-range or for-each loop all behave similarly, in that the loopvar in different iterations refer to distinct objects, excluding languages where values and references are explicit concepts (like C++).
Using for-range loops in all examples in the initial comment, mentioning nothing about for;; loops, and then adding that for;; loops are affected as well, is a fundamentally flawed process to push a proposal forward. That the original proposal gained popularity as it was written does not mean changing for;; loops would be equally received. It is therefore important to split this proposal into two, the other regarding for;; loops, and gather votes and opinions separately.

domdom82 commented 1 year ago

It's funny. I got totally caught by surprise with this issue. Thankfully, I never actually ran into it but my assumption has always been that loop variables are closure-safe and retain their value on each iteration. IMHO if your program breaks by this change, it was buggy to begin with. So by all means, do go ahead @rsc

gopherbot commented 11 months ago

Change https://go.dev/cl/532580 mentions this issue: go/ssa: new range var semantics

entonio commented 11 months ago

As someone whose codebase contains lots and lots of for-range loops that cannot all be individually unit-tested, my concern isn't with 'contrived' code. Contrived examples are there so that effects can de demonstrated; in real life, it's usually abstraction and the escaping of variables into downstream code that produce the same problems, and are much harder to detect. And they're types of domain interaction that aren't as likely to exist within the base sdk, nor within the scope of a very focused codebase such as Encrypt's. The risk of breakage doesn't seem especially high, it's the risk of unnoticed breakage. And it's not just a matter of unit tests, just think of all the sdk unit tests that you say were shown to be wrong when adopting the change.

At least as for-range loops go, it seems it would have been trivial to just keep the old code running as is by tying the new behavior to a new keyword, e.g. for k, v := loop m { ... }. Or even loop k, v := range m { ... }, loop i := 0; ; i++ { ... }. Or make the new keyword use the old behaviour, that way wary folk could replace all their for instances and be calm. Or create a new assignment operator for this case, since this seems to be about the semantics of assignment.

Otoh, I fully understand not wanting to multiply operators or keywords, lest we end up in C++ turmoil.

One thing I think could help would be -as someone suggested- a migration tool that would not only show which loops would be affected by the change, but also provide a rewriting that would keep the old behaviour (by declaring vars outside the for, is that it? how does that work with for-range?). For me, that would solve the problem entirely, or almost. The last time I used the flag, I don't recall seeing anything towards that.

zephyrtronium commented 11 months ago

@entonio Adding a new keyword like loop is never an option in Go because any code using loop as a variable name would cease to compile, violating the Go 1 compatibility promise. Beyond that, the design document linked from the initial post explains why changing loop syntax or creating a migration tool are poor choices.

thediveo commented 11 months ago

@entonio Last time I looked at it, the k8s code base consisted of huge quantities of for range loops. They aren't individually unit-tested either. Yet IIRC so far running the exisiting tests revealed no detectable problem. If you have followed the discussion then you might have noticed mentioning of code scans for misuse of for range variables. The following discussion should also have sufficiently answered your own argument: only contrived examples were presented, but no real code. I really don't get your writing about abstraction and skds here. To me, this doesn't bring any new reasoning here. In your logic, even the slightest change in code generation anywhere else must be dangerous, with all those abstractions and sdks?

timothy-king commented 11 months ago

One thing I think could help would be -as someone suggested- a migration tool that would not only show which loops would be affected by the change

@entonio You may wish to read the section Transitional Support Tooling in the design doc. This describes how to get the data one needs from the compiler to create such a tool. It would be a decent amount of work, but that is the starting point.

dgoodine commented 10 months ago

Not weighing in one way or another on this specific proposal, but I have an observation that may be useful.

IIRC, most implementations of closures in Lisp implementations treated values of outer lexically scoped variables and references to them differently.

That is to say, referencing a loop variable by value stores its current value in the closure environment when the closure is created. Referencing a loop variable by reference (which is what the current implementation seems to be doing in all cases) is storing a pointer to the variable in the closure environment.

Clearly, doing the latter in a loop would make this a race-condition factory, but in cases where the closure only needs the values of an outer lexical context, it would avoid this issue completely.

I don't know anything about how they are implemented in Go, nor whether such an approach is relevant or feasible. Perhaps if someone here has some insight or knows someone on the go team with more information, it might be worth considering figuring out if that might be a better and more general solution. The compiler should be able to handle the distinction between the two in the generated code, without wondering if it would break things.

(Edit: It also is completely insensitive to the looping context (e.g. for/range).)

seankhliao commented 7 months ago

Now that Go 1.22 is released, I think we can close this as done.

golang / go

spec: less error-prone loop variable scoping #60078

Tooling Demonstration

60324