golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.89k stars 17.65k forks source link

cmd/compile: internal compiler error: panic: interface conversion: types.Object is nil, not *ir.Name #67746

Closed Cylkal closed 4 months ago

Cylkal commented 5 months ago

Go version

go version go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOROOT='/root/go'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/root/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/root/code/test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1081386337=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I have a project that uses ClickHouse, and it compiles successfully. However, when the pprof samples collected online service are placed in the main package of default.pgo, the compilation fails.

I apologize for not being able to provide the source code directly, but here are the pprof sample file default.pgo and the code main.go that can reproduce the compilation error.

ls

default.pgo  go.mod  go.sum  main.go
// main.go
// don't need to run
package main

import (
    "context"
    _ "net/http/pprof"

    "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
    cfg, err := clickhouse.ParseDSN("clickhouse://user:password@127.0.0.1:19000/default")
    if err != nil {
        panic(err)
    }
    conn, err := clickhouse.Open(cfg)
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    b, err := conn.PrepareBatch(
        context.Background(),
        "INSERT INTO some_table (x) VALUES (?)",
    )
    if err != nil {
        panic(err)
    }
    b.Append(1)
    b.Send()
}
// go.mod
module test

go 1.22.3

require (
    github.com/ClickHouse/ch-go v0.61.5 // indirect
    github.com/ClickHouse/clickhouse-go/v2 v2.25.0
)

require (
    github.com/andybalholm/brotli v1.1.0 // indirect
    github.com/go-faster/city v1.0.1 // indirect
    github.com/go-faster/errors v0.7.1 // indirect
    github.com/google/uuid v1.6.0 // indirect
    github.com/klauspost/compress v1.17.7 // indirect
    github.com/kr/text v0.2.0 // indirect
    github.com/paulmach/orb v0.11.1 // indirect
    github.com/pierrec/lz4/v4 v4.1.21 // indirect
    github.com/pkg/errors v0.9.1 // indirect
    github.com/segmentio/asm v1.2.0 // indirect
    github.com/shopspring/decimal v1.4.0 // indirect
    go.opentelemetry.io/otel v1.26.0 // indirect
    go.opentelemetry.io/otel/trace v1.26.0 // indirect
    golang.org/x/sys v0.18.0 // indirect
    gopkg.in/yaml.v3 v3.0.1 // indirect
)

What did you see happen?

go build

# github.com/ClickHouse/clickhouse-go/v2/ext
/root/go/pkg/mod/github.com/!click!house/ch-go@v0.61.5/proto/col_nullable.go:20:6: internal compiler error: panic: interface conversion: types.Object is nil, not *ir.Name

Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new

When the PGO file is removed, i.e., when PGO is disabled, it compiles successfully.

The PGO file comes from my online service, please rename the file profile.pb.gz to default.pgo. profile.pb.gz

What did you expect to see?

Build pass when enable PGO

cuonglm commented 5 months ago

Interestingly, this is fixed at tip after https://go-review.googlesource.com/c/go/+/578815

How was your default.pgo generated?

Cylkal commented 5 months ago

@cuonglm It was collected from my online service by using curl -o pprof.profile http://localhost:7999/debug/pprof/profile, and the file has been processed. For example, using '--focus clickhouse' on the original sample file outputs only the samples related to ClickHouse. Of course, the original sample file can also trigger this issue.

I tried writing a separate demo program to perform batch insert operations into ClickHouse and found that this issue does not occur. This problem only appears when collecting samples from the online service.

mknyszek commented 5 months ago

CC @golang/compiler

thanm commented 5 months ago

@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?

cuonglm commented 5 months ago

@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?

No, because go1.22.3 includes this CL but still trigger ICE.

Unless my operation was wrong, my git bisect points to https://go-review.googlesource.com/c/go/+/578815 which does fix this issue.

thanm commented 5 months ago

@cuonglm thanks for that clarification, I hadn't realized that we backported 581436. Interesting.

thanm commented 5 months ago

Stack trace from go build -gcflags=all=-d=panic=1:

goroutine 1 [running]:
runtime/debug.Stack()
    runtime/debug/stack.go:24 +0x5e
cmd/compile/internal/base.FatalfAt({0x54f3e8?, 0xc0?}, {0xdf1b88, 0x9}, {0xc00054f418, 0x1, 0x1})
    cmd/compile/internal/base/print.go:225 +0x1d7
cmd/compile/internal/base.Fatalf(...)
    cmd/compile/internal/base/print.go:194
cmd/compile/internal/gc.handlePanic()
    cmd/compile/internal/gc/main.go:52 +0x90
panic({0xd722c0?, 0xc000c10f30?})
    runtime/panic.go:770 +0x132
cmd/compile/internal/escape.(*batch).initFunc(...)
    cmd/compile/internal/escape/escape.go:199
cmd/compile/internal/escape.Batch({0xc000c0e638, 0x1, 0xc000a339e0?}, 0x0?)
    cmd/compile/internal/escape/escape.go:139 +0x19de
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000a339e0)
    cmd/compile/internal/ir/scc.go:121 +0x2e2
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func1({0xf64760?, 0xc000a339e0?})
    cmd/compile/internal/ir/scc.go:78 +0x3b
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func2({0xf64460, 0xc000be2ba0})
    cmd/compile/internal/ir/scc.go:92 +0xf7
cmd/compile/internal/ir.Visit.func1({0xf64460, 0xc000be2ba0})
    cmd/compile/internal/ir/visit.go:105 +0x2d
cmd/compile/internal/ir.(*CallExpr).doChildren(0xc000bdcfa0, 0xc000c20270)
    cmd/compile/internal/ir/node_gen.go:298 +0x9c
cmd/compile/internal/ir.DoChildren(...)
    cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf646a0, 0xc000bdcfa0})
    cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
    cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*ReturnStmt).doChildren(0xc000bc7cc0, 0xc000c20270)
    cmd/compile/internal/ir/node_gen.go:1231 +0xbc
cmd/compile/internal/ir.DoChildren(...)
    cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf649a0, 0xc000bc7cc0})
    cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
    cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*Func).doChildren(0xf64760?, 0xc000c20270)
    cmd/compile/internal/ir/func.go:177 +0x4e
cmd/compile/internal/ir.DoChildren(...)
    cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf64760, 0xc000be1440})
    cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.Visit({0xf64760, 0xc000be1440}, 0xc000c0f080)
    cmd/compile/internal/ir/visit.go:108 +0xb3
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000be1440)
    cmd/compile/internal/ir/scc.go:84 +0x1bf
cmd/compile/internal/ir.VisitFuncsBottomUp({0xc000a76488, 0x43, 0x2?}, 0xe2af68)
    cmd/compile/internal/ir/scc.go:58 +0xb8
cmd/compile/internal/escape.Funcs(...)
    cmd/compile/internal/escape/escape.go:122
cmd/compile/internal/gc.Main(0xe2adb0)
    cmd/compile/internal/gc/main.go:256 +0x1076
main.main()
    cmd/compile/main.go:57 +0xf9

so this appears to be an escape analysis problem, at least initially.

thanm commented 5 months ago

OK, I spent a little while debugging this.

Escape analysis is bombing in this part of initFunc:

// Initialize resultIndex for result parameters.
for i, f := range fn.Type().Results() {
    e.oldLoc(f.Nname.(*ir.Name)).resultIndex = 1 + i
}

because f.Nname is nil as opposed to pointing to the name for the function, which in this case is proto.Nullable[uint64].IsSet.

What's happening here is that at some point during the compilation of the offending package github.com/ClickHouse/clickhouse-go/v2/ext we encounter a reference to this specific generic function, and the code that reads the export data marks it for inclusion/compilation within the current package. The key bit of code here is this:

https://go.googlesource.com/go/+/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/compile/internal/noder/reader.go#1123

What's supposed to happen at that point is that we'll read in the body for the function, and in fact there is a call to r.addBody later in funcExt, e.g. here.

The problem: at the point of that call, r.curfn is nil, and as a result instead of immediately reading in the body, it gets tacked onto the todoBodies list here.

Ordinarily this would be fine, but: at the point that this happens we're not doing a "regular" read of the export data, we're looking things up as part of PGO devirtualization. Here's the stack trace:

      cmd/compile/internal/noder.(*reader).funcExt /w/go1.22/src/cmd/compile/internal/noder/reader.go:1110
      cmd/compile/internal/noder.(*reader).method /w/go1.22/src/cmd/compile/internal/noder/reader.go:1039
      cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:818
      cmd/compile/internal/noder.(*pkgReader).objIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:672
      cmd/compile/internal/noder.(*pkgReader).objInstIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:664
      cmd/compile/internal/noder.(*reader).obj /w/go1.22/src/cmd/compile/internal/noder/reader.go:636
      ...
      <a couple hundred similar frames elided>
      ...
      cmd/compile/internal/noder.(*reader).doTyp /w/go1.22/src/cmd/compile/internal/noder/reader.go:524
      cmd/compile/internal/noder.(*pkgReader).typIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:431
      cmd/compile/internal/noder.(*reader).typWrapped /w/go1.22/src/cmd/compile/internal/noder/reader.go:390
      cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:807
      cmd/compile/internal/noder.lookupMethod /w/go1.22/src/cmd/compile/internal/noder/unified.go:113
      cmd/compile/internal/noder.LookupFunc /w/go1.22/src/cmd/compile/internal/noder/unified.go:65
      cmd/compile/internal/pgo.addIndirectEdges /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:443
      cmd/compile/internal/pgo.createIRGraph /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:291
      cmd/compile/internal/pgo.New /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:192
      cmd/compile/internal/gc.Main /w/go1.22/src/cmd/compile/internal/gc/main.go:221

So essentially we wind up with some dangling references on todoBodies, and we never actually read the body of the function in question.

Not sure yet what the best overall approach is here. I am also unsure as to why this problem would suddenly go away on tip, since as far as can see there haven't been specific changes there that would ameliorate the problem.

More work needed. CC @prattmic for additional eyeballs relating to PGO stuff.

prattmic commented 5 months ago

@thanm Thanks for taking a look. noder.LookupFunc does not currently handle instantiating generic functions, so that part is unsurprising. However, the expectation is that LookupFunc would simply never find the symbol name that appears in the profile at all because that includes the .go.shape.blah component that doesn't exist in the high-level symbol name.

Since this is a method, we first look up the type, then search for the method. Perhaps one of the other methods on proto.Nullable[uint64] has already been instantiated, so we successfully find the type after all, but IsSet itself isn't instantiated? Just a thought, I'm not too familiar with instantiations.

thanm commented 5 months ago

However, the expectation is that LookupFunc would simply never find the symbol name that appears in the profile at all because that includes the .go.shape.blah component that doesn't exist in the high-level symbol name

Yes, this is certainly true. However what's happening here is that the pgoir stuff is looking for (and finding) the method

github.com/ClickHouse/clickhouse-go/v2/lib/column.(*Map).AppendRow

Nothing wrong there, but the body of AppendRow refers to a struct type that has a data member of generic type (proto.Nullable[uint64]) which has a method proto.Nullable[uint64].IsSet) and it is that generic method that is causing the problem.

The detailed sequence of events is as follows:

  1. cmd/compile/internal/pgo.addIndirectEdges runs and pulls in the IR for column.(*Map).AppendRow since it is a hot indirect target.

  2. the importing process leaves a lot of dangling functions on the todoBodies list (as described above), including 'proto.Nullable[uint64]'. In addition, the imported adds proto.Nullable[uint64] to the needWrapperTypes list.

  3. The inliner runs, but it turns out that the top level package being compiled, github.com/ClickHouse/clickhouse-go/v2/ext is unusual and it has zero inlining opportunities. If the inliner had run, it would have made a call to readBodies, but this never happens.

  4. Wrapper generation runs (after inlining, before escape analysis). Since proto.Nullable[uint64] is on the needWrapperTypes list, we manufacture pointer wrappers for all its methods, including proto.(*Nullable[uint64]).IsSet. This new wrapper has a direct call to proto.Nullable[uint64].IsSet (which still doesn't have a func body).

  5. Now escape analysis runs. It processes the two IsSet methods, at which point it crashes.

So we have to have a lot of things lining up in exactly the right way for this to happen.

I am pretty sure that the right fix here is to add a cleanup call to readBodies from pgo.createIRGraph, but I am having a heck of a time trying to create a minimized test case that will reproduce the problem.

The thing I keep stumbling over in my smaller test case is that the generic type in question gets added to needWrapperTypes (which is what I want) but also at some later point gets added to haveWrapperTypes, which means we don't get the necessary wrapper. Very frustrating. I'll keep hacking away on the minimization.

cuonglm commented 5 months ago

@thanm I double check and the CL that changes to reflect package actually "fix" this issue.

It's very weird.

gopherbot commented 5 months ago

Change https://go.dev/cl/591076 mentions this issue: cmd/go: new test case for pgo-devirt compile problem

gopherbot commented 5 months ago

Change https://go.dev/cl/591075 mentions this issue: cmd/compile/internal: add a PGO devirt post-lookup cleanup hook