golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.94k stars 17.53k forks source link

cmd/compile/internal/gc: esc.go duplicated sink paths for re-assignments #27003

Open quasilyte opened 6 years ago

quasilyte commented 6 years ago

Given this code:

/* 1*/ package example
/* 2*/ 
/* 3*/ var sink interface{}
/* 4*/ 
/* 5*/ func fn() {
/* 6*/  var x *int
/* 7*/  x = new(int)
/* 8*/  sink = x // Sink 1 at the line 8; first new(int) flows here
/* 9*/  x = new(int)
/*10*/  sink = x // Sink 2 at the line 10; second new(int) flows here
/*11*/ }

(Code annotated with line number for convenience, minimal example outlined after the issue description.)

Execute the command $ go tool compile -m=2 example.go. The output on tip is (important bits are in bold):

example.go:5:6: can inline fn as: func() { var x *int; x = ; x = new(int); sink = x; x = new(int); sink = x }
example.go:8:7: x escapes to heap
example.go:8:7:     from sink (assigned to top level variable) at example.go:8:7
example.go:7:9: new(int) escapes to heap
example.go:7:9:     from x (assigned) at example.go:7:4
example.go:7:9:  from x (interface-converted) at example.go:8:7
example.go:7:9:     from sink (assigned to top level variable) at example.go:8:7
example.go:9:9: new(int) escapes to heap
example.go:9:9:     from x (assigned) at example.go:9:4
example.go:9:9:  from x (interface-converted) at example.go:8:7
example.go:9:9:     from sink (assigned to top level variable) at example.go:8:7
example.go:10:7: x escapes to heap
example.go:10:7:    from sink (assigned to top level variable) at example.go:10:7

Expected output would not report second sink to the same location.

This is a consequence of how graph is constructed and printed.

Simplified example:

package example

var sink *int

func fn() {
    var x *int
    x = new(int)
    sink = x
    x = new(int)
    sink = x
}

And the output:

example.go:5:6: can inline fn as: func() { var x *int; x = ; x = new(int); sink = x; x = new(int); sink = x }
example.go:7:9: new(int) escapes to heap
example.go:7:9:  from x (assigned) at example.go:7:4
example.go:7:9:     from sink (assigned to top level variable) at example.go:8:7
example.go:9:9: new(int) escapes to heap
example.go:9:9:  from x (assigned) at example.go:9:4
example.go:9:9:     from sink (assigned to top level variable) at example.go:8:7

Expected output:

example.go:5:6: can inline fn as: func() { var x *int; x = ; x = new(int); sink = x; x = new(int); sink = x }
example.go:7:9: new(int) escapes to heap
example.go:7:9:  from x (assigned) at example.go:7:4
example.go:7:9:     from sink (assigned to top level variable) at example.go:8:7
example.go:9:9: new(int) escapes to heap
example.go:9:9:  from x (assigned) at example.go:9:4
example.go:9:9:     from sink (assigned to top level variable) at example.go:10:7

For that example, we have something like this:

sink.Flowsrc == { {dst=sink, src=x, where=line8}, {dst=sink, src=x, where=line10} }
x.Flowsrc == { {dst=x, src=new(int), where=line7 }, {dst=x, src=new(int), where=line9} }

During traversal in escwalkBody, both new(int) paths are printed using first sink destination as a parent, so we get two same paths. For the second destination nothing is printed due to osrcesc variable check that is used to avoid duplicated messages.

If osrcesc is removed, both paths are printed twice (so, 4 messages instead of 2, but 2 of them are correct). Currently, osrcesc leads to 2 messages, 1 of which is incorrect.

It's not enough to just check whether destination node located before the actual starting flow point because of recursive functions:

func f8(x int, y *int) *int {
    if x <= 0 {
        return y
    }
    x--
    return f8(*y, &x)
}

Here the return y is a destination endpoint, and it comes before the tracked &x.

I have no good ideas on how to fix this one. Hopefully, insights above can help someone to roll the solution to this.

quasilyte commented 6 years ago

Don't know who to CC. Maybe @josharian? (Just for issue validation; perhaps this is known and accepted limitation of the esc.go explainer.)

josharian commented 6 years ago

@dr2chase is primary for escape analysis (AFAIK); he wrote the escape analysis explainer. The other person that comes to mind is @cherrymui.

cherrymui commented 6 years ago

It seems to me a deeper issue is that the analysis of data flow is on the node level, where all occurrences of x are represented with the same node. For example, if I remove the second sink=x assignment, i.e.

package example

var sink *int

func fn() {
    var x *int
    x = new(int)
    sink = x
    x = new(int)
    _ = x
}

The second new(int) does not necessarily escape. With the current analysis, it does escape, because the new(int) flows to x, and x, at some point, flows to heap. In the original program, it doesn't really matter when/where x flows to heap; it just picks one.

For this, I think that the analysis would need to work on a level that tracks the order of data flow (probably some form of SSA). And more accurate location report would follow naturally.

quasilyte commented 6 years ago

The second new(int) does not necessarily escape. With the current analysis, it does escape, because the new(int) flows to x, and x, at some point, flows to heap. In the original program, it doesn't really matter when/where x flows to heap; it just picks one.

I actually tried to make exactly this, to teach escape analysis to recognize that second new(int) is unnecessary, but encountered unexpected debug output that made me wonder.