cmd/compile: compiler can unexpectedly preserve memory

zardlee1937 commented 7 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (`go version`)?

go1.9.1

Does this issue reproduce with the latest release?

yes. every release.

What operating system and processor architecture are you using (`go env`)?

GOARCH=amd64 GOBIN=C:\Go\bin GOEXE=.exe GOHOSTARCH=amd64 GOHOSTOS=windows GOOS=windows GOPATH=D:\golang GORACE= GOROOT=C:\Go GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 GCCGO=gccgo CC=gcc GOGCCFLAGS=-m64 -fmessage-length=0 -fdebug-prefix-map=C:\Users\zhalei\AppData\Local\Temp\go-build309785515=/tmp/go-build -gno-record-gcc-switches CXX=g++ CGO_ENABLED=1 PKG_CONFIG=pkg-config CGO_CFLAGS=-g -O2 CGO_CPPFLAGS= CGO_CXXFLAGS=-g -O2 CGO_FFLAGS=-g -O2 CGO_LDFLAGS=-g -O2

What did you do?

Write codes for test GC of golang. Here is what i use to test.

package main

import "fmt"
import "time"

type Node struct {
    next     *Node
    payload  [64]byte
}

func main() {
    root := new(Node)
    curr := root
    i := 0

    lastTime := time.Now()
    for {
        currTime := time.Now()
        elapsed := currTime.Sub(lastTime)
        lastTime = currTime
        // 10ms = 10000000ns
        if elapsed > 10000000 {
            fmt.Println("StopTime:", elapsed)
        }

        curr.next = new(Node)
        curr = curr.next

        i++
        //3000 nodes max
        if i >= 3000 {
            i = 0
            root = curr
        }
    }
}

What did you expect to see?

the program run well.

What did you see instead?

memory never be released until everything stop work. and then, i take my pc power off. -_-

davecheney commented 7 years ago

Here is a smaller reproduction which grows linearly on my machine

package main

type Node struct {
        next    *Node
        payload [64]byte
}

func main() {
        curr := new(Node)
        for {
                curr.next = new(Node)
                curr = curr.next
        }
}

mattn commented 7 years ago

I think this is not a bug since curr.next refer the pointer even if the variable is updated. This is not weak refernece.

davecheney commented 7 years ago

@mattn i'm sorry I don't understand what you mean. I've stared at this code a bunch, and it looks like to me that on each iteration curr is replaced with curr.next, so the previous value of curr is no longer referenced.

  +------+---------+
  + next | payload |
  +------+---------+
       |
       |    +------+---------+
curr ->`--> | next | payload |
            +------+---------+

And so on.

I'm sure i'm missing something, so I'd appreciate someone helping me understand what I'm missing.

ianlancetaylor commented 7 years ago

At least @davecheney 's version is an interesting result of compiler optimization. The very first call to new(Node) never escapes, so the compiler allocates it on the stack. It's next pointer is set to point to the next allocated Node. The Node chain continues to build, but nothing ever clears the the stack allocated pointer, so all the Nodes appear live to the garbage collector.

Basically 1) the fact that you use the same Go variable doesn't mean that the compiler does the same; 2) the fact that you call new doesn't mean that you always get heap memory.

Not sure whether or how to fix this. Does it come up in a real program?

CC @randall77

mattn commented 7 years ago

As long as curr.next holds the reference of *Node (i.e. strong reference), it will not be freed, I belieave. So if make it gc, Node should have uintptr instead of *Node.

package main

import (
    "runtime"
    "unsafe"
)

type Node struct {
    next    uintptr
    payload [64]byte
}

func main() {
    curr := new(Node)
    for {
        curr.next = uintptr(unsafe.Pointer(new(Node)))
        curr = (*Node)(unsafe.Pointer(curr.next))
        runtime.GC()
    }
}

Of course this is a dangerous code. Because the value indicated by next may be a pointer already freed.

davecheney commented 7 years ago

@mattn

As long as curr.next holds the reference of *Node (i.e. strong reference), it will not be freed,

But curr is overwritten every loop, so on the next loop, curr.next is actually curr.next.next and so on. The previous value held in curr is now unreferenced so even though it's next field points to the current curr nothing points to it, the previous value of curr; ergo, it's garbage.

davecheney commented 7 years ago

@ianlancetaylor thanks for the explanation. I think, irrespective of if this comes up in real code or not, it's still quite a serious bug.

mattn commented 7 years ago

@davecheney curr is overwritten but not overwritten the memory block pointed by curr. pointer is a reference just not a value. if something refer the memory block, GC can sweep for marking.

mattn commented 7 years ago

@davecheney It GC collect the unreferenced-value immediately as you mentioned, following code can not make linked-list.

package main

type Node struct {
    next     *Node
    payload  [64]byte
}

func NewNode(curr *Node) *Node {
    newnode := new(Node)
    newnode.next = curr
    return newnode
}

func main() {
    curr := NewNode(nil)
    curr = NewNode(curr)
    curr = NewNode(curr)
}

davecheney commented 7 years ago

On 20 Oct 2017, at 17:00, mattn notifications@github.com wrote:

@davecheney curr is overwritten but not overwritten the memory block pointed by curr. pointer is a reference just not a value. if something refer the memory block, GC can sweep for marking.

That’s what I’m saying, curr = curr.next means that the value previously pointed to by curr is now no longer visible.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davecheney commented 7 years ago

On 20 Oct 2017, at 17:07, mattn notifications@github.com wrote:

@davecheney It GC collect the unreferenced-value immediately as you mentioned, following code can not make linked-list.

This is a valid linked list. The previous value of curr is captured in the next field of the value returned from NewNode. package main

type Node struct { next *Node payload [64]byte }

func NewNode(curr Node) Node { newnode := new(Node) newnode.next = curr return newnode }

func main() { curr := NewNode(nil) curr = NewNode(curr) curr = NewNode(curr) } — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

mattn commented 7 years ago

@davecheney do you mean we can't make linked-list with following code?

        curr := new(Node)
    for i := 0; i < 5; i++ {
                curr.next = new(Node)
                curr = curr.next
        }
        // now curr is top of linked-list

davecheney commented 7 years ago

Yes, that is a linked list, however on every iteration, curr points to the tail of the linked list, and curr's next pointer is nil. So, it's not a very useful linked list.

But more importantly, this bug is about the previous chains in the linked list, curr's predecessors. They are garbage because no live reference points to them. However, because of escape analysis the original location of curr on the stack remains live, keeping the entire linked list alive. This is a bug.

On Fri, Oct 20, 2017 at 4:30 PM, mattn notifications@github.com wrote:

@davecheney https://github.com/davecheney do you mean we can't make linked-list with following code?
    curr := new(Node)
for i := 0; i < 5; i++ { curr.next = new(Node) curr = curr.next } // now curr is top of linked-list

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/22350#issuecomment-338117972, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcA0ZNPc5iHQpOXsGGnzYy7ar3GFVUks5suD3zgaJpZM4QAIYl .

mattn commented 7 years ago

No. this is not a bug. This works as intended. And changing this behavior is language change.

See https://golang.org/src/container/ring/ring.go#L61

davecheney commented 7 years ago

I'm sorry, I don't understand what you are suggesting with that link. I don't think there is anything further I can add to this discussion. Thank you for your time.

On Fri, Oct 20, 2017 at 5:35 PM, mattn notifications@github.com wrote:

No. this is not a bug. This works as intended. And changing this behavior is language change.

See https://golang.org/src/container/ring/ring.go#L61

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/22350#issuecomment-338129381, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcA7rv1LSLcA_4TyxVC2iLGdW6YdRxks5suE1HgaJpZM4QAIYl .

anydream commented 7 years ago

https://golang.org/src/container/ring/ring.go#L69 p.next = &Ring{prev: p}

@mattn Have you ever seen {prev: p}?

mattn commented 7 years ago

As far as i know, this is usual/general way to make linked list which have size.

anydream commented 7 years ago

@mattn Your example is doubly linked-list, but the bug of this issue is singly linked-list. They are completely different Nodes of singly linked-list doesn't reference the previous one. So your example is unreasonable.

mattn commented 7 years ago

p = p.next

In your thought, p.next.prev should be garbage-collected?

BTW, how garbages will be corrected? update curr.next.next or curr.next to nil? I don't know such a behavior in other langues. For example, in Java, WeakReference have accessor to get weak-referenced value.

anydream commented 7 years ago

@mattn p = p.next After that operation, the previous Node cannot be referenced. Allocated memories that cannot be referenced is garbage.

             +------+---------+
*before* p:  + next | payload | <-- This is garbage!
             +------+---------+
                |
                |    +------+---------+
*after*  p:     `--> | next | payload |
                     +------+---------+

mattn commented 7 years ago

Ah, Sorry, I was confused. :(

This code doesn't have top of the chain. So un-referenced curr should be garbage-collected.

cznic commented 7 years ago

@mattn

As Ian pointed out, the first Node is allocated on stack. You cannot garbage collect stack memory, but that stack-allocated Node references transitively all other Nodes allocated later in heap.

mattn commented 7 years ago

package main

type Node struct {
    next    *Node
    payload [64]byte
}

func NewNode() *Node {
    return &Node{}
}

func main() {
    curr := NewNode()
    for {
        curr.next = NewNode()
        curr = curr.next
    }
}

This is replaced inline(stack) operation?

cznic commented 7 years ago

I don't know, it's different code. I think there's a compiler flag that tells what's escaping or not, so you can test both versions and look for differences, if any.

RLH commented 7 years ago

I must admit this is a particularly evil since it is an example of better escape analysis resulting in more objects being retained. Whether this is a "bug bug" or simply the result of the conservative nature of the GC is an open question, I tend to think the later. Was this seen in the wild?

This paper discusses the problem and holds some interesting observations and offers some possible solutions.

Ole Agesen, David Detlefs, and J. Eliot Moss. 1998. Garbage collection and local variable type-precision and liveness in Java virtual machines. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation (PLDI '98), A. Michael Berman (Ed.). ACM, New York, NY, USA, 269-279. DOI: https://doi.org/10.1145/277650.277738

On Fri, Oct 20, 2017 at 1:26 AM, Dave Cheney notifications@github.com wrote:

@mattn https://github.com/mattn i'm sorry I don't understand what you mean. I've stared at this code a bunch, and it looks like to me that on each iteration curr is replaced with curr.next, so the previous value of curr is no longer referenced.

+------+---------+
next payload +------+---------+

+------+---------+

curr ->`--> next payload
    +------+---------+
And so on.

I'm sure i'm missing something, so I'd appreciate someone helping me understand what I'm missing.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/22350#issuecomment-338109349, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7Wn9uWhbHwB3MVQvrQmP_b0G9H3Y4Gks5suC8HgaJpZM4QAIYl .

next	payload	+------+---------+
+------+---------+
curr ->`-->	next	payload

randall77 commented 7 years ago

Yes, this looks pretty unfortunate.

We might consider disabling stack allocation for x := new(...) if there is any reassignment of x elsewhere in the function (or just in a loop?). That seems pretty harsh, though, and is liable to cause more problems than it solves.

cznic commented 7 years ago

We might consider disabling stack allocation for x := new(...) if there is any reassignment of x elsewhere in the function (or just in a loop?). That seems pretty harsh, though, and is liable to cause more problems than it solves.

I think this is actually the correct solution. Keeping truly unreachable memory allocations is much more of a problem, IMHO.

griesemer commented 7 years ago

Shouldn't this be considered a liveness issue (bug)? If I understand this correctly, using @davecheney simpler example, curr gets split into into variables as a side-effect of the escape analysis, one of which really is not live anymore after the 2nd assignment to curr, but that information is somehow lost when translating to SSA. So perhaps there needs to be some explicit "kill" information flowing into the construction of the SSA when a so-duplicated variable becomes dead. Presumably this situation can happen anytime a variable containing pointers gets "duplicated".

ianlancetaylor commented 7 years ago

CC @dr2chase

randall77 commented 7 years ago

There isn't any variable splitting going on here. There's an autotmp variable allocated for the initial new allocation. curr is set to point to that autotmp. The problem is that curr and the autotmp are live on entry to the loop. The autotmp needs to be live until the curr = curr.next statement. The crux of the problem (assuming we can't avoid it by not allocating on the stack) is that there's no way to distinguish "first time around" from "subsequent times around" the loop. (Unless we peel off the first iteration.)

griesemer commented 7 years ago

@randall77 I suppose one could assign nil to that autotmp after use, always, but I guess that would be too costly?

mdempsky commented 7 years ago

The autotmp needs to potentially be kept alive even after the curr = curr.next assignment if it first gets copied into another non-escaping live value.

It seems like the crux of the issue is during escape analysis we currently only worry about whether a variable's lifetime can exceed the function call. However, if it contains pointers, we really need to statically know exactly when it's still live throughout the function, otherwise moving it to the stack potentially interferes with GC.

mdempsky commented 7 years ago

Was this seen in the wild?

I'd like to reiterate this question. This seems like a really difficult bug to fix properly, but I'd be surprised if it affects real-world applications.

cznic commented 7 years ago

Seems like at least the OP ran into it for real.

mdempsky commented 7 years ago

@cznic OP's bug report says "Write codes for test GC of golang." That reads to me like they were specifically writing programs to test the behavior of Go's GC behavior (in the same way that #15550 or #22050 were programs I constructed just to test compiler behavior, not related to any actual programs), but maybe I'm misinterpreting that.

davecheney commented 7 years ago

@mdempsky I agree that the OP's code is contrived. And it is tempting to hope that this will never happen in real code, like many of the trivial examples that trigger pathological edge cases in the runtime we see reported from time to time. However given the cause is a dangling gc stack root, I think this should not be written off as not happening in real code because, we just don't know.

cznic commented 7 years ago

I'd say that programs testing the correctness of the compiler/runtime are real and useful programs per se.

dr2chase commented 7 years ago

@cznic I think the question is "is this bug costing anyone any money, either in terms of memory wasted, processor time wasted, latency increased, reliability reduced, or security vulnerability?" If the answer is "no", it has a lower priority than the bugs where the answer is "yes". And if the answer is "yes", there are still additional questions to answer.

That said, it's a real bug, and as a test the program is useful. On the other hand, this is a known risk of any heap-to-stack allocation optimizations, going back decades.

dr2chase commented 7 years ago

Two different workarounds, for the original program.

package main

import "fmt"
import "time"

type Node struct {
    next    *Node
    payload [64]byte
}

var sink *Node // Workaround #1, part 1 of 3

func main() {
    root := new(Node)
    sink = root   // Workaround #1, part 2 of 3
    sink = nil    // Workaround #1, part 3 of 3
    root0 := root // Workaround #2, part 1 of 2
    curr := root
    i := 0

    lastTime := time.Now()
    for {
        currTime := time.Now()
        elapsed := currTime.Sub(lastTime)
        lastTime = currTime
        // 10ms = 10000000ns
        if elapsed > 10000000 {
            fmt.Println("StopTime:", elapsed)
        }

        curr.next = new(Node)
        curr = curr.next

        i++
        //3000 nodes max
        if i >= 3000 {
            i = 0
            root = curr
            root0.next = nil // Workaround #2, part 2 of 2
        }
    }
}

mdempsky commented 7 years ago

Google's Go compiler and runtime folks met earlier today and we discussed this issue at some length. At the moment, we have some brainstorm-y ideas on how to address this (see below), but they all seem very expensive and would negatively impact other Go programs. For now, we're going to keep this issue open until we have examples of this affecting real programs without easy workaround.

A few options tossed around:

Always heap allocate variables that contain pointers and whose address are taken. This is safe and trivial to implement, but would drastically hinder the effectiveness of escape analysis.
Unified escape analysis and liveness tracking. If we could precisely determine a stack-allocated object's lifetime, we could set the stackmaps correctly. Then only ambiguously-live objects need to be heap allocated. However, 1) escape analysis and liveness tracking are already two of the more complicated and subtle parts of the compiler, and 2) they run at entirely different stages of the compiler (escape analysis is before SSA begins, liveness is after SSA ends), so it's unclear how or even if this would work.
Partial GC scanning of stacks. Instead of recording stack autotmps' liveness in stackmaps, we could leave it up to GC to identify them and trace them out. This would require the compiler to generate additional stackmap-like data structures, and additional GC algorithms for scanning stacks.

davecheney commented 7 years ago

Thanks for the update Matthew.

On Wed, Oct 25, 2017 at 9:55 AM, Matthew Dempsky notifications@github.com wrote:

Google's Go compiler and runtime folks met earlier today and we discussed this issue at some length. At the moment, we some brainstorm-y ideas on how to address this (see below), but they all seem very expensive and would negative impact other Go programs. For now, we're going to keep this issue open until we have examples of this affecting real programs without easy workaround.

A few options tossed around:

1.

Always heap allocate variables that contain pointers and whose address are taken. This is safe and trivial to implement, but would drastically hinder the effectiveness of escape analysis. 2.

Unified escape analysis and liveness tracking. If we could precisely determine a stack-allocated object's lifetime, we could set the stackmaps correctly. Then only ambiguously-live objects need to be heap allocated. However, 1) escape analysis and liveness tracking are already two of the more complicated and subtle parts of the compiler, and 2) they run at entirely different stages of the compiler (escape analysis is before SSA begins, liveness is after SSA ends), so it's unclear how or even if this would work. 3.

Partial GC scanning of stacks. Instead of recording stack autotmps' liveness in stackmaps, we could leave it up to GC to identify them and trace them out. This would require the compiler to generate additional stackmap-like data structures, and additional GC algorithms for scanning stacks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/22350#issuecomment-339160962, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcAy3iUZaZhJTyHvVkgck-XuWQ3ai0ks5svmr4gaJpZM4QAIYl .

cznic commented 7 years ago

@dr2chase

I think the question is "is this bug costing anyone any money, either in terms of memory wasted, processor time wasted, latency increased, reliability reduced, or security vulnerability?" If the answer is "no", it has a lower priority than the bugs where the answer is "yes". And if the answer is "yes", there are still additional questions to answer.

There are several other criteria wrt prioritizing bug fixes, like the impact on correctness, performance, convenience, the understanding of the problem and the complexity of the fix, etc. Of those, correctness should be the first one to consider, even before performance. (Premature optimization is the ..., -- D.K.)

My non-engineering opinion: If the optimization, in this case allocation made on stack, breaks correctness, the first step is to revert the change to restore correctness. The optional, but desired next step is to fix the optimization to not break correctness and apply it again while, of course, adding a test case for the previous fail scenario.

Wrt the complexity of the fix, others has commented, but let me point out @randall77's proposal, which is IMO simple and preserves correctness: Disqualify all reassigned pointers from pointing to stack allocations. I guess implementing this is rather easy. There will be some impact on performance, but hopefully not substantial, because it will still keep the optimization in all other cases.

BTW, I'd like to say that the bug is no one's fault. Such bugs are part of the cost of making progress.

davecheney commented 7 years ago

I'd like to echo @cznic 's comment. It's something that I realise that I don't say enough, but Jan puts it very well

BTW, I'd like to say that the bug is no one's fault. Such bugs are part of the cost of making progress.

My persistence with this issue is solely dedicated to correctness. It's important to me that I know the gc and the liveness information is correct.

I have a growing concern that this issue may be at the root of the occasional--but taken over the span of eight years, substantial--reports of unexplainable memory usage. Over time, as I have become more confident in the correctness of the gc, specifically it's preciseness, I've moved from viewing these issues as a bug in the runtime (as I did in the 1.0 and thereabouts days) to a symptom of an unknown bug in the reporters code (usually aided by suspicions of data races or enthusiastic cgo usage). This bias is usually enforced by an inability for the reporter to provide a concise reproduction, but in this case they have, and so I think it's important, in the interests of debugging this, and other, gc related issues that we can say confidently this is not a known issue with the liveness information.

Again, thank you to everyone who has worked on this bug. I sincerely appreciate it.

ianlancetaylor commented 7 years ago

We need to be careful about using the word "correctness" here. Even with this bug, the program is still by some definition correct. And it takes a carefully written special case for this bug to manifest as using up all of memory. Of course we all agree that correctness takes priority over performance, but this program is not quite behaving incorrectly. It just has terrible and unacceptable side effects. It still must be fixed somehow, of course.

The solutions we've thought of so far would have drastic performance implications, and would cause far more bugs to be filed than this one. @cznic you suggest that we adopt @randall77 's suggestion, but he himself says that that "is liable to cause more problems than it solves."

Of course, we can't think of everything. Please do keep making suggestions for ways to address this.

@davecheney This bug could certainly cause some memory to be unexpectedly held across a loop. Even if we fix this bug, there will be other ways that that can happen. It's true today, and it will be true after this bug is fixed, that there will be cases where the program will appear to have no references to some object but there will be live references that aren't clearly visible in the program's source. Think of it as the reverse of the problem described in the runtime.SetFinalizer docs. There isn't a precise match between liveness in the source code and liveness in the generated code. Normally, that is fine. It takes a special case like this one to expose a problem.

cznic commented 7 years ago

@ianlancetaylor

@cznic you suggest that we adopt @randall77 's suggestion, but he himself says that that "is liable to cause more problems than it solves."

@randall77's quote is definitely a respected expert's opinion, well educated guess and so on, but not a experiment/measurement.

Can perhaps someone please point me to the commit which introduced stack allocations for non-escaping foo := new(T) and/or bar := &T{...}?

randall77 commented 7 years ago

Yes, "likely to cause more problems than it solves" is just a hunch. Don't treat that statement as gospel. We have stack allocated non-escaping new(T) and &T{...} forever. Probably since 1.0, definitely since 1.2.2, the earliest version I have at hand.

ianlancetaylor commented 7 years ago

@cznic This probably won't be particularly helpful, but the commit that introduced stack allocation for non-escaping calls to new, and non-escaping composite literals, is https://golang.org/cl/4954043 .

ianlancetaylor commented 7 years ago

One way to view this class of problems is that a pointer P to a value V is allocated on the stack, but that P dies and nothing else points to V, so V should die. If V were allocated on the heap all would work as programmers intuitively expect, but because V is allocated on the stack it in effect remains live until the function returns.

So the first question is: can we detect that there are no pointers to V? I believe the answer may be yes. On the assumption that pointers to the stack can only be found on the stack, we could build a liveness map of the stack during GC stack scanning, and use that to find all values on the stack that are dead. This would require the stack map to record not only pointer positions, but also object sizes, since liveness is per-object, not per-pointer. I think we could do this by using two bits per word in the stack map, as opposed to the single bit we currently use.

The second question: is there anything we can do? The answer there is definitely yes: when we find that V is dead, we can zero it out.

If this is possible, then the obvious downside is slower stack scanning. I think we will only defeat programmer intuition with objects that appear to be heap allocated, but are stack allocated because they do not escape, that themselves contain pointers into the heap. Perhaps the compiler can mark these cases such that the runtime only has to do this special scanning if they exist.

hyangah commented 7 years ago

I recently encountered an issue of increased memory usage after minor refactoring of existing code using a closure and @mdempsky pointed me to this issue. So I am adding my example here.

https://play.golang.org/p/zA9iHzDIYi

Basically, the function closure here that contains a reference to the byte slice is stack allocated, so the byte slice remains alive until G returns. GC makes no guarantees about when the memory is garbage collected, thus it's not incorrect behavior. But it's rather surprising and it was hard to diagnose.

cznic commented 7 years ago

GC makes no guarantees about when the memory is garbage collected, thus it's not incorrect behavior.

Then you can just eliminate the GC completely. Can such language be still called "garbage collected"? I don't think so. IMO, a garbage collected language rutnime must guarantee at minimum that OOM will not happen when there's enough of non reachable allocations to reclaim that can satisfy the current allocation request.

edit: typos

golang / go