bazelbuild / rules_go

Go rules for Bazel
Apache License 2.0
1.36k stars 635 forks source link

cgo: `go link` detects all linker flags as unsupported when using custom cc toolchain #3886

Open mikedanese opened 3 months ago

mikedanese commented 3 months ago

What version of rules_go are you using?

master sync'd to latest

mike.danese@ip-10-110-30-195:~/rules_go$ git show
commit 36e04e9f1e9b56c865332edfb49e1620beaedf2c

What version of Bazel are you using?

mike.danese@ip-10-110-30-195:~/rules_go$ bazel version
Bazelisk version: development
Build label: 7.1.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Mar 11 17:55:31 2024 (1710179731)
Build timestamp: 1710179731
Build timestamp as int: 1710179731
mike.danese@ip-10-110-30-195:~/rules_go$ git show
commit 36e04e9f1e9b56c865332edfb49e1620beaedf2c

Does this issue reproduce with the latest releases of all the above?

What operating system and processor architecture are you using?

Linux

What did you see instead?

I have a custom cc toolchain based on LLVM. This toolchain sets a sysroot link option, e.g. --sysroot=external/sysroot-linux-x86_64-gnu7-llvm-1.

When a cgo library is generated, the cgo tool converts this LD_FLAG value into an absolute path here: https://cs.opensource.google/go/go/+/master:src/cmd/cgo/main.go;l=353-357;drc=bdccd923e914ab61d77a8f23a3329cf1d5aaa7c1. For example, my sysroot ldflag might be converted to /home/mike.danese/.cache/bazel/_bazel_mike.danese/71e455921cc87b7db5489570176e2cff/sandbox/linux-sandbox/400/execroot/universe/external/sysroot-linux-x86_64-gnu7-llvm-16. These LD_FLAGs get backed into the object.

Now when it's time to link, go tool link probes the linker (in linkerFlagSupported) to determine if e.g. -no-pie should be configured: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=1833-1834;drc=2ab9218c86ed625362df5060f64fcd59398a76f3. It does this by compiling a trivial program. To construct the compile command, the ldflags from the cgo object are appended to the extdld flags: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=2079-2080;drc=2ab9218c86ed625362df5060f64fcd59398a76f3. This results in a command like this:

external/llvm-toolchain-16-0-4-linux-x86_64/bin/clang \
  -m64 -fuse-ld=lld -Wl,--build-id=md5 -Wl,--hash-style=gnu -Wl,-z,relro,-z,now -Wl,-z,separate-code \
  --sysroot=external/sysroot-linux-x86_64-gnu7-llvm-16 -pthread -fuse-ld=lld \
  -Wl,--build-id=md5 -Wl,--hash-style=gnu -Wl,-z,relro,-z,now -Wl,-z,separate-code \
  --sysroot=/home/mike.danese/.cache/bazel/_bazel_mike.danese/71e455921cc87b7db5489570176e2cff/sandbox/linux-sandbox/423/execroot/universe/external/sysroot-linux-x86_64-gnu7-llvm-16 \
  -pthread -o /tmp/go-link-606451175/a.out -no-pie /tmp/go-link-606451175/trivial.c

sysroot arg is passed twice, once from --extld flags and once from the go_ldflags attribute in the cgo object so the second --sysroot wins. This is a problem because the second sysroot references a path in the no longer existant execution sandbox where the cgo object was compiled. Thus, the probe fails with a nonsensical error:

ld.lld: error: cannot open crt1.o: No such file or directory
ld.lld: error: cannot open crti.o: No such file or directory
ld.lld: error: cannot open crtbegin.o: No such file or directory
ld.lld: error: unable to find library -lgcc
ld.lld: error: unable to find library -lgcc_s
ld.lld: error: unable to find library -lpthread
ld.lld: error: unable to find library -lc
ld.lld: error: unable to find library -lgcc
ld.lld: error: unable to find library -lgcc_s
ld.lld: error: cannot open crtend.o: No such file or directory
ld.lld: error: cannot open crtn.o: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This error is swallowed silently, and go link assumes that the linker flag is unsupported. In the case of -no-pie, this causes rules_go to attempt to link PIE cgo objects into go binaries without relro (e.g. in normal exe buildmode on linux). This causes hard to debug runtime errors such as:

runtime: pcHeader: magic= 0xfffffff1 pad1= 0 pad2= 0 minLC= 1 ptrSize= 8 pcHeader.textStart= 0x1be100 text= 0x557164240100 pluginpath=
fatal error: invalid function symbol table
runtime: panic before malloc heap initialized

runtime stack:
runtime.throw({0x5571640f265a?, 0x0?})
        GOROOT/src/runtime/panic.go:1077 +0x5c fp=0x7fff932345a8 sp=0x7fff93234578 pc=0x55716427741c
runtime.moduledataverify1(0x8?)
        GOROOT/src/runtime/symtab.go:533 +0x816 fp=0x7fff932346c8 sp=0x7fff932345a8 pc=0x5571642958f6
runtime.moduledataverify(...)
        GOROOT/src/runtime/symtab.go:519
runtime.schedinit()
        GOROOT/src/runtime/proc.go:726 +0x4c fp=0x7fff93234710 sp=0x7fff932346c8 pc=0x55716427af6c
runtime.rt0_go()
        src/runtime/asm_amd64.s:349 +0x11c fp=0x7fff93234718 sp=0x7fff93234710 pc=0x5571642aa17c

There are ~10 other calls to linkerFlagSupported and none of them are correctly detecting flag support.

mikedanese commented 3 months ago

@ianlancetaylor, was the ordering of extldflags before the LDFLAGS from cgo directives in linkerFlagSupported picked intentionally? I can't tell if it's a terrible solution, but swapping the order would give the go link extdflags precedence over cgo directives for mutually exclusive linker options.

func linkerFlagSupported(arch *sys.Arch, linker, altLinker, flag string) bool {
    ...
    moreFlags := trimLinkerArgv(append(flagExtldflags, ldflag...))

From here: https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/lib.go;l=2079;drc=1e433915ce684049a6a44fd506f691f448b56c76

fmeum commented 3 months ago

I can't answer that question, but ideally we wouldn't include any absolute paths into the outputs of compilation. Maybe we could use a symlink instead of converting the sysroot to an absolute path?

That said, if flipping the order of flags is a simple way to fix this and no tests break, I'm happy to accept that change.

ianlancetaylor commented 3 months ago

I don't understand what is creating the absolute path. The code you mention in cmd/cgo only applies to Go file names, not to options like --sysroot=dir.

mikedanese commented 3 months ago

Ah ya, misread that. Thanks.

So this is actually happening in rules_go. The flags are getting absolute here:

https://github.com/bazelbuild/rules_go/blob/aeb83e878033ef357642c87122e193df44da03fe/go/tools/builders/stdlib.go#L161-L165

So we have a --sysroot from a custom cc toolchain. When building the stdlib with go install, we make all CGO_LDFLAGS absolute, presumably because go install changes directories. If we don't, then we get an error like:

_cgo_export.c:3:10: fatal error: 'stdlib.h' file not found
adbmal commented 2 months ago

I encountered the same issue. We use a custom cc toolchain, the binaries built success but cannot run.

runtime: pcHeader: magic= 0xfffffff1 pad1= 0 pad2= 0 minLC= 1 ptrSize= 8 pcHeader.textStart= 0x1be100 text= 0x557164240100 pluginpath=
fatal error: invalid function symbol table
runtime: panic before malloc heap initialized

According @mikedanese , I try to remove the codes of absEnv as mentioned above , but not works, the issue persists. https://github.com/bazelbuild/rules_go/blob/aeb83e878033ef357642c87122e193df44da03fe/go/tools/builders/stdlib.go#L161-L165

I found that when rule_go link the binary, it doesn't fetch clink flags through clinkopts; instead, it reads gc_linkopts and puts the flags after -extldflags into _extract_extldflags. https://github.com/bazelbuild/rules_go/blob/c0ef535977f9fd2d9a67243552cd04da285ab629/go/private/actions/link.bzl#L64-L106

https://github.com/bazelbuild/rules_go/blob/c0ef535977f9fd2d9a67243552cd04da285ab629/go/private/actions/link.bzl#L230-L240

So, after adding the following flags to my go_binary, it started running correctly.

   gc_linkopts = [
        "-extldflags",
        "-nopie",
    ],