golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.84k stars 17.51k forks source link

cmd/compile: PGO fails to do multiple levels of inlining into a single function #69046

Open prattmic opened 2 weeks ago

prattmic commented 2 weeks ago

Reproducer:

-- dep/dep.go --
package dep

var sink int

//go:noinline
func spin() {
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func b() {
        // Two calls, too hot to inline, darn...
        spin()
        spin()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func a() {
        // Two calls, too hot to inline, darn...
        b()
        b()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

//go:noinline
func Foo() {
        a()
}
-- dep/dep_test.go --
package dep

import "testing"

func BenchmarkMain(b *testing.B) {
        for i := 0; i < b.N; i++ {
                Foo()
        }
}
-- main.go --
package main

import "example.com/pgo/dep"

func main() {
        dep.Foo()
}
-- go.mod --
module example.com/pgo

go 1.23

https://go.dev/play/p/QxBt13wV3Uh

Run the benchmark to collect a profile, and then use that as a PGO profile:

$ go test -bench . -cpuprofile=cpu.pprof ./dep
$ go build "-gcflags=example.com/pgo/dep=-m=2 -d=pgodebug=1" -pgo=cpu.pprof                         
# example.com/pgo/dep
hot-callsite-thres-from-CDF=0.6493506493506493
dep/dep.go:6:6: cannot inline spin: marked go:noinline
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.b
dep/dep.go:12:6: can inline b with cost 133 as: func() { spin(); spin(); for loop }
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:21:6: can inline a with cost 285 as: func() { b(); b(); for loop }
dep/dep.go:31:6: cannot inline Foo: marked go:noinline
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
dep/dep.go:23:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:24:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a

The hot-budget lines note that PGO allows inlining b into a and a into Foo. dep/dep.go:32:3: inlining call to a indicates that a was inlined into Foo, but there should then be a subsequent inline b into Foo. The output makes this a bit confusing, but objdump makes it clear that b was not inlined:

0000000000467a80 <example.com/pgo/dep.Foo>:
...
  467a8b:       |  |         e8 b0 ff ff ff             call   467a40 <example.com/pgo/dep.b>
  467a90:       |  |         e8 ab ff ff ff             call   467a40 <example.com/pgo/dep.b>
...

A bit of custom logging in the compiler makes it more clear what is happening:

dep/dep.go:31:6: function Foo: DevirtualizeAndInlineFunc
dep/dep.go:31:6: function Foo: TryInlineCall to a
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a
dep/dep.go:31:6: function Foo: TryInlineCall to b
dep/dep.go:31:6: canInlineCallExpr from Foo to b: cost too high: maxCost 80, callSiteScore 133, hot false
dep/dep.go:31:6: mkinlcall b: !canInlineCallExpr

It looks like what is happening here is:

  1. a in inlined into Foo.
  2. We then attempt to inline b into Foo (from the inlined body of a).
  3. canInlineCallExpr checks inlineCostOK of Foo -> b. This consults the PGO edge map, which does not contain a hot edge from Foo -> b because PGO includes inline calls in edges. i.e., it has hot edges Foo -> a and a -> b.
  4. Inline does not occur because we didn't increase the budget and base cost is too high.

I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr, we should use the original name of the InlinedCallExpr to consult the PGO edge map.

cc @cherrymui

gabyhelp commented 2 weeks ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

prattmic commented 2 weeks ago

I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr, we should use the original name of the InlinedCallExpr to consult the PGO edge map.

Getting the original name looks somewhat painful, as it looks like the only reference we keep to it is a reference to the inline tree via an inline mark statement in the expression init: https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/noder/reader.go;drc=4e1cc09f8b9bcef2b6d0839a7d0026b50c21998b;bpv=1;bpt=1;l=3513