proposal: spec: redefine range loop variables in each iteration

bisgardo commented 7 years ago

As an alternative to #20725 (which was originally about go vet), let the variables of range loops be implicitly redefined in each iteration like in Dart's for loops. That is,

for k, v := range vals {
  // ...
}

should be equivalent to

for k, v := range vals {
  k := k
  v := v
  // ...
}

This will make it "safe" to take the loop variable's addresses as well as capturing the loop variables in nested functions (see #16520).

The proposal could be expanded to vanilla for loops, although that would make it diverge semantically from other languages.

bradfitz commented 7 years ago

We considered this prior to Go 1, but considered it too quickly. We'll definitely reconsider it for Go 2. I actually think there's a duplicate tracking bug for this.

davecheney commented 7 years ago

I think this would be an important improvement to a situation that catches every go developer at least once in their efforts to learn Go.

On Tue, 20 Jun 2017, 06:10 Brad Fitzpatrick notifications@github.com wrote:

We considered this prior to Go 1, but considered it too quickly. We'll definitely reconsider it for Go 2. I actually think there's a duplicate tracking bug for this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/20733#issuecomment-309559617, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcA80GqcY1F4zKSo-optHwPwFOb1Jpks5sFtXGgaJpZM4N-uiT .

dsnet commented 7 years ago

And I would argue even catches seasoned Go programmers occasionally. See https://golang.org/cl/40937. It took me a while staring at that logic trying to figure out why the code wasn't doing what it naturally read as.

cznic commented 7 years ago

Adopting this proposal can make performance of perfectly valid code worse, sometimes dramatically. For example, when escape analysis cannot determine it's safe to keep the {k,v} value on stack while it's address is taken. Then we get O(n) heap allocations instead of the current O(1).

davecheney commented 7 years ago

@cznic can you give some examples of this. I would expect that most of these cases would actually be bugs because taking the address of the same variable over and over again was the problem in the first place.

cznic commented 7 years ago

@davecheney Consider https://github.com/golang/go/issues/20725#issuecomment-309476351 and imagine escape analysis is not able to prove foo only reads via the passed pointer.

davecheney commented 7 years ago

@cznic

Why would you want to take the address of a copy of the current range value? If i'm not mistaken, any modification to it would be lost.

cznic commented 7 years ago

@davecheney

Why would you want to take the address of a copy of the current range value?

To avoid copying a big value when passing it as an argument.
To call a pointer receiver method on the value.

If i'm not mistaken, any modification to it would be lost.

Taking the address of a value does not imply intent to modify the pointee.

davecheney commented 7 years ago

To avoid copying a big value when passing it as an argument.

But the value is already copied as part of the range iteration. Having a separate value escape on every iteration is sub optimal, but hopefully you'll agree it's reasonable to trade a potential performance problem for a very common correctness problem.

To call a pointer receiver method on the value.

True, that is asking a lot of escape analysis and I really don't know the ways that escape analysis and the implicit taking of address interact. But I will note that your example explicitly took the address of the copy of the current index value, so if the intention was to pass a pointer to the value down to a function, you'd want to make sure you were passing a pointer to the original slice element, not a copy. In the code that I have read that is more commonly written as

for i := range vals {
     foo(&vals[i])
}

Taking the address of a value does not imply intent to modify the pointee.

Respectfully, generally it does, as in your method example above.

cznic commented 7 years ago

foo(&vals[i])

Iff vals is a slice/array. There are also maps and channels.

Respectfully, generally it does, as in your method example above.

Even the innocent looking fmt.Println(value) always takes the address of 'value'. As does every function taking an interface (not only interface{}, any interface) argument. Every assignment to an interface variable takes the address of the RHS (if not already a pointer), etc.

Edit: Added '(if not already a pointer)'.

davecheney commented 7 years ago

Iff vals is a slice/array. There are also maps and channels.

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received. Again i'm falling back on arguments of correctness over performance.

Even the innocent looking fmt.Println(value) always takes the address of 'value'.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

cznic commented 7 years ago

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received.

Right. But with this proposal there will be, in certain situations, an additional allocation per iteration.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

You're right. The interface runtime struct has the address, but that's not different with this proposal. I misspoke, sorry.

bcmills commented 7 years ago

@cznic

with this proposal there will be, in certain situations, an additional allocation per iteration.

Sure, but if it actually shows up on a profile there should be an easy workaround:

var bigValue someType
for _, v := range bigValues {
        bigValue = v  // Compiler should be smart enough to elide this copy.
        foo(&bigValue)  // Only one copy of bigValue escapes.
}

It is possible to write correct, efficient code with either definition of range loops; the question is mainly which should be the "default" or "typical" case. I believe that, in real usage, the proposed change would fix far more issues — and remove more workarounds — than it would introduce.

cznic commented 7 years ago

@bcmills

It is possible to write correct, efficient code with either definition of range loops;

True, but still the proposal has the potential to cripple existing correct and efficient code. Also, your comment does not consider v.foo(). We've agreed earlier with @davecheney that for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack. And, again, "Then we get O(n) heap allocations instead of the current O(1)."

bcmills commented 7 years ago

@cznic

still the proposal has the potential to cripple existing correct and efficient code.

For this proposal (and Go 2 proposals in general), if the proposal is accepted we should either compile existing Go 1 programs as-is or provide a tool that converts them preserving existing semantics. That is: no "existing" Go 1 code should be affected either way.

for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack

Escape analysis of pointer methods is mostly straightforward; you only really need to worry about arguments to interface methods. And note that iterating over a container of n values is already O(n), so doing O(n) memory-management work changes only the constant factors (not the asymptotic behavior).

cznic commented 7 years ago

@bcmills

That is: no "existing" Go 1 code should be affected either way.

I don't think this can be achieved without solving the halting problem. But let's assume it can be done. Even then we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language. That can be done, no doubt. Isn't it better to avoid such leaks of abstractions? I think so.

bcmills commented 7 years ago

@cznic

we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language.

Why? Generally, the only time an optimization needs to go into a language spec is if it affects the asymptotic behavior of the program, the canonical example being tail-call-elimination in functional languages (which changes memory usage from O(1) to O(N) with the depth of recursion). As I noted above, per-iteration memory-management overhead affects only constant factors: running time for loops is already O(N) with the number of iterations, and peak memory usage is O(1) either way (because the value can be garbage-collected at each iteration).

bisgardo commented 7 years ago

@cznic

the proposal has the potential to cripple existing correct and efficient code

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture. I find it quite hard to imaging cases where this would be a requirement for correctness (would appreciate examples). But even if it is, it's easy to make a tool (see #20725; original formulation) that would flag all relevant code for scrutinization as part of a migration process.

As to the performance argument, I believe that if a good escape analysis can't figure out where a pointer is going, then neither can programmers (or it does in fact escape the loop). Thus, if such code is correct, it's probably by accident and not for performance reasons.

martisch commented 7 years ago

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture.

I thought maybe something like the below just with := can be constructed where this happens too:

    x := []int{1, 2, 3, 4, 5}
    var i int
    for x[i], i = range x {
        //...
        i = 2
    }

Edit: on closer look i guess currently := allows only simple variable names so this is not an issue.

leonklingele commented 6 years ago

As this does not require changes to the for-range syntax, it is still valid Go 1 code. What if one tries to run some Go 2 code using such a for-range loop with a Go 1 compiler? I can already see people complaining "this code works fine in Go 2 but misbehaves for whatever reason in Go 1". This proposal also should add a way to break compilation with Go 1 (e.g. a // +build !go1 annotation) which no one will ever use.

bcmills commented 6 years ago

Ran across this interesting example today (https://play.golang.org/p/DUstLropfJ):

…

func (k *key) f(v string) {
    fmt.Printf("%#v.f(%#v)\n", k, v)
}

func main() {
    …
    for k, v := range m {
        defer k.f(v)
    }
}

The interaction between loop variables and implicit pointer receivers makes it possible to accidentally close over a loop variable without taking its address or writing a function literal.

Merovius commented 6 years ago

For posterity, today someone posted on golang-nuts about stumbling into this with (*testing.T).Parallel: https://groups.google.com/d/msg/golang-nuts/SAZ6wCSLXU0/F-TtRwDCAAAJ As a further datapoint to fix this in Go2.

With regards to the performance-concerns: I'd expect the difference, in general, to be optimized out. Even if there are still edge-cases: Given the subtlety and prevalence of this problem, I'd strongly favor correctness over performance. If your code is too slow, you can always benchmark and hand-optimize. If your code is subtly incorrect, that's a lot harder to remedy.

davecb commented 6 years ago

On 08/05/18 06:13 PM, Bryan C. Mills wrote:

So I see the cost/benefit tradeoff of a breaking change as: Costs Update code generators. Reprogram users' muscle memory. go fix existing code, which may or may not be necessary anyway (depending on what else goes into Go 2).

Benefits Prompt users to omit redundant workarounds. (This proposal.) Make the common / default cases more concise.

It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.

From previous experience with managed change (which dates back to Multics and dinosaurs roaming the earth (:-)) one of the largest problems of trying to maintain compatibility is that it can lead to persons depending on a bug, and thereby preventing it from ever being fixed.

I would argue that when this kind of deranged dependency occurs, that in order one

introduces the new capability,
provides a means of migration from old to new,
marks the old as deprecated, and
withdraws the old

In the special case of linked-library changes, the caller may have lost the source code they need to change, so we instead made the last step

require positive action to continue to use the old (in our case that was a linker option)

Go's case is not as hard: if you can do steps 1-4, then I strongly recommend you do so.

--dave

iwdgo commented 6 years ago

About the performance loss, I have optimized in various ways the usual Reverse Polish Calculator (https://github.com/iWdGo/postfixcalculator). The best solution is obtained by emulating a Turing machine. The README explains the summary.

To avoid the loop overhead, the counter is directly accessed. This would be criticized by many programming standards. In my view, such a possibility is very useful as the compiler creates very efficient for loops.

The suggestion is to add to the above proposal to evolve the for loop, another case which would be a Turing machine-like loop where re-definition never occurs..

gopherbot commented 5 years ago

Change https://golang.org/cl/164119 mentions this issue: cmd/link: do not close over the loop variable in testDWARF

bronze1man commented 4 years ago

So following code will change to correct if this proposal is accept:

package main

import "fmt"

func main(){
    l:=[]func()int{}
    for i:=0;i<3;i++{
        l = append(l,func()int{
            return i
        })
    }
    for _,f:=range l{
        fmt.Println(f())
    }
}

https://play.golang.org/p/sZQmCpneOY4

danielchatfield commented 4 years ago

I'm very supportive of this change. For code that really needs to be optimised then the allocation can be avoided by declaring the variables outside the loop like this:

var (
    i int
    v string
)
for i, v = range []string{"foo", "bar"} {
    println(i, v)
}

candlerb commented 3 years ago

Last year there was high profile example of this class of bug: https://community.letsencrypt.org/t/2020-02-29-caa-rechecking-bug/114591 https://www.theregister.com/2020/03/03/lets_encrypt_cert_revocation/ Detailed analysis: https://bugzilla.mozilla.org/show_bug.cgi?id=1619047 Fix: https://github.com/letsencrypt/boulder/pull/4690/files#diff-d02067a9f9a2bed1110fd4e98641c2effcf5d1d5f18461e35d6ac1535f6e2c21L1411-R1414

earthboundkid commented 2 years ago

I think this can be done now by keeping it behind a go.mod version flag. When you run go mod edit version -go=1.XX it should scan for existing closure captures and convert them to safe equivalents as part of the upgrade. I think it's doable. E.g. it would change this:

for i := 0; i < n; i++ {
  use(&i)
}

to this as part of the upgrade:

{ 
  i := 0
  for ; i < n; i++ {
    use(&i)
  }
}

thepudds commented 2 years ago

Hi @carlmjohnson, FWIW, the Go language changes design document uses this issue here as an example:

I think the only feasible safe approach is to not permit language redefinitions.

We are stuck with our current semantics. This doesn't mean we can't improve them. For example, for issue 20733, the range issue, we could change range loops so that taking the address of a range parameter, or referring to it from a function literal, is forbidden. This would not be a redefinition; it would be a removal. That approach might eliminate the bugs without the potential of breaking code unexpectedly.

That document also describes using the go directive in go.mod to control the version of the language used by each module (with some additional discussion & refinement in #28221).

So I suspect it is at least possible to remove this as a feature. One minor comment is I suspect go mod edit would be not used to change source code; it would more likely be go fix from what I understand.

mpx commented 1 year ago

FYI, there is a discussion covering how to redefine "for" loop variable semantics on #56010.

rsc commented 1 year ago

Closing as duplicate of #60078.

golang / go

proposal: spec: redefine range loop variables in each iteration #20733