golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.3k stars 17.7k forks source link

cmd/compile: overhaul inliner #61502

Open aclements opened 1 year ago

aclements commented 1 year ago

We’re starting a project to overhaul the inlining optimization pass in the Go compiler, with the goal of having a far more effective inliner in Go 1.22. This issue will track this work. This document motivates this project and outlines our general design goals. More detailed design documents will follow.

This project is being driven by @thanm and @mdempsky, but we would love to have input from the community. Inlining is inherently very heuristic and can be a deep rabbit hole to go down, so I’m sure many of our design decisions will be motivated by pragmatism and trying to ship something better than what we have for Go 1.22, even if it’s not perfect, with the goal of further improving it in later releases.

gopherbot commented 1 year ago

Change https://go.dev/cl/511565 mentions this issue: cmd/compile/internal/inline: add call site flag generation

gopherbot commented 1 year ago

Change https://go.dev/cl/511560 mentions this issue: cmd/compile/internal/ir: export 'reassigned', handle OASOP

gopherbot commented 1 year ago

Change https://go.dev/cl/511559 mentions this issue: cmd/compile/internal/inline: analyze function return properties

gopherbot commented 1 year ago

Change https://go.dev/cl/511555 mentions this issue: internal/goexperiment: add "InlinerRevamp" experiment

gopherbot commented 1 year ago

Change https://go.dev/cl/511566 mentions this issue: cmd/compile/internal/inline/inlheur: assign scores to callsites

gopherbot commented 1 year ago

Change https://go.dev/cl/511564 mentions this issue: cmd/compile/internal/inline: build call site table

gopherbot commented 1 year ago

Change https://go.dev/cl/511562 mentions this issue: cmd/compile: write "properties" to export data for inlinable funcs

gopherbot commented 1 year ago

Change https://go.dev/cl/511557 mentions this issue: cmd/compile/internal/inline: add framework to compute func "properties"

gopherbot commented 1 year ago

Change https://go.dev/cl/511561 mentions this issue: cmd/compile/internal/inline: analyze function param properties

gopherbot commented 1 year ago

Change https://go.dev/cl/511558 mentions this issue: cmd/compile/internal/inline: panic/exit flags analysis for inline heuristics

gopherbot commented 1 year ago

Change https://go.dev/cl/511563 mentions this issue: cmd/compile/internal/inline: extend panic/exit flag computation

gopherbot commented 1 year ago

Change https://go.dev/cl/511556 mentions this issue: cmd/compile: function "property" defs for inl heuristics

josharian commented 1 year ago

Christmas in July!

aclements commented 1 year ago

The 1.22 RC is planned for December, so I think it'll be more like Christmas in December. :smile:

CAFxX commented 1 year ago

Cost-based inline scheduling. [...] For example, a common pattern in Go is to split computations into a small fast path function and a large slow path function, where the fast path is intended to be inlined and calls the large function if the fast path conditions aren’t satisfied.

I would argue that pattern is more of a workaround for the current inliner limitations than a deliberate one aiding readability, so if possible it would be great to avoid ossifying it further. Instead, especially since PGO is now available, I would rather suggest we should go in the direction (not necessarily as part of this overhaul) of supporting also a form of function splitting - so that if the compiler detects what is basically a hot fast path and one or more cold paths in the same function, the fast path is inlined in the callers and the cold path(s) are left where they are. This would allow to curb the proliferation of the xxx/xxxSlow functions in the standard library and elsewhere.

cherrymui commented 1 year ago

Function splitting/outlining is tricker, especially for tracebacks and APIs that converts between function and PC (e.g. runtime.FuncForPC). It is probably solvable, but I'm not sure it is within the scope of this issue.

evanphx commented 1 year ago

@aclements So excited for this work! I'm curious if there is a corpus of code that the team is planning to use to guide the changes (a corpus would probably also function as a test bed).

aclements commented 1 year ago

@evanphx, for static metrics, we're doing some experiments over the Go module cache (that is, basically all public Go modules). We aren't really at performance metrics yet, but the basic corpus we'll use is golang.org/x/benchmarks.

gopherbot commented 1 year ago

Change https://go.dev/cl/517595 mentions this issue: dashboard: add linux-amd64-newinliner builder and trybot

gopherbot commented 11 months ago

Change https://go.dev/cl/549395 mentions this issue: doc: add release note fragment on inlining changes

rsc commented 10 months ago

@aclements asked me to leave this example here. Lots of code using utf8.DecodeRune and friends has manual checks for the ASCII case to avoid the function call. It would be nice to rewrite utf8.DecodeRune and friends to be "outlined": do the ASCII case and then call an unexported general-case function. Then this form would get inlined and we could delete all the manual optimizations. This doesn't work today.

For example here is what we'd like to write:

func NumSmileys(s []byte) int {
    n := 0
    for i := 0; i < len(s); {
        r, size := utf8.DecodeRune(s[i:])
        if r == '☺' {
            n++
        }
    }
    return n
}

but instead people write things like this:

func NumSmileysFast(s []byte) int {
    n := 0
    for i := 0; i < len(s); {
        var r rune
        var s size
        if s[i] < utf8.RuneSelf {
            r, size  = rune(s[i]), 1
        } else {
            r, size = utf8.DecodeRune(s[i:])
        }
        if r == '☺' {
            n++
        }
    }
    return n
}

If we rewrote utf8.DecodeRune to be:

func DecodeRune(s []byte) (r rune, size int) {
    if len(s) > 0 && s[0] < RuneSelf {
        return rune(s[0]), 1
    }
    return decodeRune(s)
}

then the hope is that inlining would turn the NumSmileys code into the NumSmileysFast code instead of us having to do it.

This fails today for at least two reasons.

First, the inlining budgets penalize function calls too much. The budget is 80 and the cost of a call is 57 (!?), with the result that the rewritten DecodeRune comes in at 87. I would suggest something more like 40: if you are doing an outline-call pattern you can spend half what you normally would.

Second, with the budgets changed, the inlining is not as efficient as the original code. The problem is that s[i:] is still computed before doing the fast path, so there are a bunch of memory operations to make this slice value. It would be nice if somehow the compiler recognized that in

t := s[i:]
if len(t) > 0 && t[0] < RuneSelf {
    return rune(t[0]), 1
}

i < len(s) implies len(t) > 0 and also t[0] is just s[i], so the code could delay materializing t entirely until the actual slowpath call.

Generally speaking, making it not necessary to manually optimize these kinds of loops is a good goal for the new inliner.

CAFxX commented 10 months ago

Related: https://github.com/golang/go/issues/48195 https://github.com/golang/go/issues/31666 https://github.com/golang/go/issues/48192

overstep123 commented 7 months ago

Just a small idea: about how to identify hot functions in PGO, whether the total cost of the parent function should be used as the standard (that is, cum_rate), rather than the flat_rate of the child function. The reason is that if the cum_rate of a function is large and the flat_rate is small, then the downstream function of this function will be inline. The benefit will be relatively high (because it is highly probable that this function has many downstream calls, so at least it will save a lot of function call overhead).

michael-obermueller commented 4 months ago

I would like to discuss the topic of better supporting automatic instrumentation frameworks, because they will be affected by the planned Go inliner improvements. One example is eBPF which allows to access user code and variables by analyzing the stack and CPU registers via uprobes. Uprobes do not require Go code changes and are usually installed on function-level to read parameters and return values. However, this requires to be able to predict if a function might get inlined. Only functions which never get inlined are potential candidates for installing uprobes. The opentelemetry-go-instrumentation (How it works) framework is one example which uses eBPF to automatically monitor Go applications. At Dynatrace, we also rely on some functions which might be inlined in the future, for providing detailed insight into Go applications (through an approach different from eBPF).

I'm bringing up this topic, because I think it would be a great opportunity to improve automatic monitoring capabilities of Golang.

objectref commented 3 months ago

Hi. Is this new inliner included in Go 1.23?

jub0bs commented 3 months ago

@objectref No mention of the new inliner in the provisional release notes for 1.23, so I would say no.

objectref commented 3 months ago

@jub0bs Pity. But I would love to know the current state of this, it will be such an improvement!

randall77 commented 3 months ago

The new inliner went out in 1.22. Not sure why this issue is still open, maybe because the inliner is never really done.

aclements commented 3 months ago

In addition to the improvements that @randall77 linked to in Go 1.22, Go 1.23 includes improved stack slot allocation that directly came out of the inlining project. That change is really important for more aggressive inlining, though it happens to lift all boats.

Unfortunately, other than these few changes, our broader plans for the inliner have currently been shelved because of personnel shifts. It's still in the backlog, but we're weighing it against other opportunities (again).

The heuristic changes that can be enabled with the GOEXPERIMENT never produced reliable performance improvements, so they remain off by default, and one of the big next steps here would be to really understand why that is and what can be used from those heuristics, or if we just need a totally different approach to the heuristics.