golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net: failures with `signal arrived during cgo execution` #60132

Open bcmills opened 1 year ago

bcmills commented 1 year ago
#!watchflakes
post <- goos == "freebsd" && pkg == "net" && `signal arrived during cgo execution`
bcmills commented 1 year ago

One of these was reported in https://github.com/golang/go/issues/27992#issuecomment-1542912326, but doesn't match the prior failure mode for which that issue was created.

I saw another in a TryBot in https://storage.googleapis.com/go-build-log/55480854/freebsd-amd64-12_3_a94bfe2e.log.

mknyszek commented 1 year ago

Yeah, that definitely looks different from #27992, but I'm not sure this is necessarily going to be a C&RT issue? It seems like there's a segfault in C code during a net package test. signal arrived during cgo execution is just indicating that the crash happened in cgo, so the stack is going to be truncated. Tentatively removing the compiler/runtime label, but feel free to add it back if you disagree.

bcmills commented 1 year ago

signal arrived during cgo execution is just indicating that the crash happened in cgo, so the stack is going to be truncated.

Ah, I see. That makes it tricky to track down the actual failure, though — is there some way we can provide the equivalent of a default runtime.SetCgoTraceback for the cgo dependencies in the standard library? (Can we make assumptions about system libraries that allow for something simpler than the full generality of SetCgoTraceback that we need for user C code?)

(CC @ianlancetaylor)

ianlancetaylor commented 1 year ago

If anything, system libraries are harder to get a traceback from, because they are always heavily optimized and because often the debug info is stored somewhere else. (Without the debug info the traceback is close to useless, it's just a list of PC values.)

For debugging purposes a blank import of github.com/ianlancetaylor/cgosymbolizer will often get a C backtrace, but I really can't recommend making that part of the Go standard library. It's 17,000 lines of C code.

gopherbot commented 1 year ago

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net" && `signal arrived during cgo execution`
2023-05-10 22:43 freebsd-amd64-12_3 go@639957eb net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x8009f9f1c m=0 sigcode=1 signal arrived during cgo execution rax 0xc2c3 rbx 0x437e rcx 0xc2c3 rdx 0xffffffffffffffff rdi 0x0 rsi 0x0 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc00032bd98 sp=0xc00032bd38 pc=0x561325 net.(*Resolver).LookupCNAME(0xc000392310?, {0x6dcc10?, 0x879980?}, {0x685212, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc00032bde0 sp=0xc00032bd98 pc=0x55deeb net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc0001df380, {0x682ee0, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc00032bef0 sp=0xc00032bde0 pc=0x5b918e net.TestLookupDotsWithRemoteSource(0xc0001df380) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc00032bf70 sp=0xc00032bef0 pc=0x5b8fb7 testing.tRunner(0xc0001df380, 0x699c48)
2023-05-22 16:48 freebsd-amd64-13_0 go@10fbd925 net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc00023bd98 sp=0xc00023bd38 pc=0x563c85 net.(*Resolver).LookupCNAME(0xc000311cb0?, {0x6e3a90?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc00023bde0 sp=0xc00023bd98 pc=0x56082b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc00020cea0, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc00023bef0 sp=0xc00023bde0 pc=0x5bd08e net.TestLookupDotsWithRemoteSource(0xc00020cea0) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc00023bf70 sp=0xc00023bef0 pc=0x5bceb7 testing.tRunner(0xc00020cea0, 0x69f4f0)
2023-05-22 19:05 freebsd-amd64-13_0 go@6761bff4 net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc0003abd98 sp=0xc0003abd38 pc=0x563c85 net.(*Resolver).LookupCNAME(0xc000358b10?, {0x6e3a90?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc0003abde0 sp=0xc0003abd98 pc=0x56082b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc0003689c0, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc0003abef0 sp=0xc0003abde0 pc=0x5bd08e net.TestLookupDotsWithRemoteSource(0xc0003689c0) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc0003abf70 sp=0xc0003abef0 pc=0x5bceb7 testing.tRunner(0xc0003689c0, 0x69f4f0)
2023-05-22 19:37 freebsd-amd64-13_0 go@8c445b7c net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc00023dd98 sp=0xc00023dd38 pc=0x563c85 net.(*Resolver).LookupCNAME(0xc000580dd0?, {0x6e3a90?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc00023dde0 sp=0xc00023dd98 pc=0x56082b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc0002b4ea0, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc00023def0 sp=0xc00023dde0 pc=0x5bd08e net.TestLookupDotsWithRemoteSource(0xc0002b4ea0) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc00023df70 sp=0xc00023def0 pc=0x5bceb7 testing.tRunner(0xc0002b4ea0, 0x69f4f0)
2023-05-23 11:36 freebsd-amd64-13_0 go@380529d5 net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc0003d5d98 sp=0xc0003d5d38 pc=0x563c85 net.(*Resolver).LookupCNAME(0xc0000c87f0?, {0x6e3a90?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc0003d5de0 sp=0xc0003d5d98 pc=0x56082b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc000133a00, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc0003d5ef0 sp=0xc0003d5de0 pc=0x5bd08e net.TestLookupDotsWithRemoteSource(0xc000133a00) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc0003d5f70 sp=0xc0003d5ef0 pc=0x5bceb7 testing.tRunner(0xc000133a00, 0x69f4f0)
2023-05-23 16:36 freebsd-amd64-13_0 go@d9f7efed net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc0000cfd98 sp=0xc0000cfd38 pc=0x563c85 net.(*Resolver).LookupCNAME(0xc00009c2b0?, {0x6e3a90?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc0000cfde0 sp=0xc0000cfd98 pc=0x56082b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc000596820, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc0000cfef0 sp=0xc0000cfde0 pc=0x5bd08e net.TestLookupDotsWithRemoteSource(0xc000596820) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc0000cff70 sp=0xc0000cfef0 pc=0x5bceb7 testing.tRunner(0xc000596820, 0x69f4f0)
2023-05-23 19:06 freebsd-amd64-13_0 go@ef2bb813 net.TestLookupDotsWithRemoteSource (log) SIGSEGV: segmentation violation PC=0x800a115e1 m=0 sigcode=1 signal arrived during cgo execution rax 0x1ffff8 rbx 0x8008621b0 rcx 0xffffffffffffffff rdx 0x8008620c0 rdi 0xffffffffffffffff rsi 0x800862280 ... /tmp/workdir/go/src/net/lookup_unix.go:96 +0xa5 fp=0xc00022bd98 sp=0xc00022bd38 pc=0x5640c5 net.(*Resolver).LookupCNAME(0xc000388f20?, {0x6e3b70?, 0x883bc0?}, {0x68a838, 0xb}) /tmp/workdir/go/src/net/lookup.go:472 +0x2b fp=0xc00022bde0 sp=0xc00022bd98 pc=0x560c6b net.LookupCNAME(...) /tmp/workdir/go/src/net/lookup.go:455 net.testDots(0xc00039b860, {0x6884f3, 0x3}) /tmp/workdir/go/src/net/lookup_test.go:676 +0x12e fp=0xc00022bef0 sp=0xc00022bde0 pc=0x5bd4ce net.TestLookupDotsWithRemoteSource(0xc00039b860) /tmp/workdir/go/src/net/lookup_test.go:658 +0x157 fp=0xc00022bf70 sp=0xc00022bef0 pc=0x5bd2f7 testing.tRunner(0xc00039b860, 0x69f528)


bcmills commented 1 year ago

Iiiiinteresting, all freebsd-amd64-13_0.

attn @golang/freebsd !

ayang64 commented 1 year ago

Okay, I'll see if I can reproduce this by running the tests with github.com/ianlancetaylor/cgosymbolizer, and if I bump into it, I'll post the trace.

I'm curious: are we running FreeBSD 14 trybots? It might be interesting to know if this was fixed in later releases -- might give me a place to start bisecting.

bcmills commented 1 year ago

are we running FreeBSD 14 trybots?

Appears not. https://cs.opensource.google/go/x/build/+/master:env/freebsd-amd64/make.bash only shows versions up to 13.0-SNAPSHOT. (You're welcome to update the scripts, though — someone on release interrupts should be able to help you deploy the image.)

evanj commented 12 months ago

See issue https://github.com/golang/go/issues/55197 which appears to have the same flakes. Issue https://github.com/golang/go/issues/27992 has flakes for this same test which are different (e.g. no such host, server misbehaving).

gopherbot commented 1 month ago

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "freebsd" && pkg == "net" && `signal arrived during cgo execution`
2024-06-26 22:21 go1.21-freebsd-riscv64 release-branch.go1.21@c9be6ae7 net.TestLookupDotsWithRemoteSource [ABORT] (log) === RUN TestLookupDotsWithRemoteSource SIGSEGV: segmentation violation PC=0x405cacc8 m=4 sigcode=2 signal arrived during cgo execution goroutine 778 [syscall]: runtime.cgocall(0x426df0, 0x8817b790) /usr/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/cgocall.go:157 +0x48 fp=0x8817b768 sp=0x8817b738 pc=0x22b260 net._C2func_res_ninit(0x8ca3d280) _cgo_gotypes.go:222 +0x44 fp=0x8817b788 sp=0x8817b768 pc=0x406e54 ... a3 0xffff a4 0x0 a5 0x409f1180 a6 0x409f1170 a7 0x1 s2 0x8ca3d280 s3 0x409f1160 s4 0xffffffffffffffff s5 0x3ffff s6 0xffffffffc0000000 s7 0x409f1070 s8 0x8817bd08 s9 0x8817bbc0 s10 0x8822fb90 s11 0x409f1040 t3 0x4060efdc t4 0xff00 t5 0xfefefefefefefeff t6 0x409f1020 pc 0x405cacc8
2024-07-02 18:51 go1.21-freebsd-riscv64 release-branch.go1.21@12e9b968 net.TestLookupDotsWithRemoteSource [ABORT] (log) === RUN TestLookupDotsWithRemoteSource SIGSEGV: segmentation violation PC=0x405cacc8 m=3 sigcode=2 signal arrived during cgo execution goroutine 763 [syscall]: runtime.cgocall(0x426df0, 0x882a1790) /usr/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/cgocall.go:157 +0x48 fp=0x882a1768 sp=0x882a1738 pc=0x22b260 net._C2func_res_ninit(0x8c016280) _cgo_gotypes.go:222 +0x44 fp=0x882a1788 sp=0x882a1768 pc=0x406e54 ... a3 0xffff a4 0x0 a5 0x409d9180 a6 0x409d9170 a7 0x1 s2 0x8c016280 s3 0x409d9160 s4 0xffffffffffffffff s5 0x3ffff s6 0xffffffffc0000000 s7 0x409d9070 s8 0x882a1d08 s9 0x882a1b00 s10 0x880de030 s11 0x409d9040 t3 0x4060efdc t4 0xff00 t5 0xfefefefefefefeff t6 0x409d9020 pc 0x405cacc8


enihcam commented 1 month ago

Any progress on this? CentOS is having the same issue.

image

ianlancetaylor commented 1 month ago

@enihcam Please post plain text as plain text, not as an image. Images are much harder to read. Also, please include all the text; your image seems to be missing the first line or two. Thanks.

That said, the issue you are encountering does not seem to be the one that this bug report is about. This issue is about a failure on FreeBSD, and you are using CentOS. The logs in this issue are all about crashes in res_ninit. Yours seems to be a crash in getaddrinfo. So I suggest that you open a new issue.

When you open a new issue: does your problem repeat consistently? Do you have a test case you could share? Thanks.

enihcam commented 1 month ago

Issue resolved. It was due to a glibc incompatibility. I replaced glibc with musl libc.

evanj commented 4 weeks ago

@enihcam Based on your description and the traceback, I suspect you may be running into https://github.com/golang/go/issues/63567. Any chance your program is calling os.Setenv()? That specific crash won't happen with musl, since its DNS resolver does not use environment variables.

enihcam commented 4 weeks ago

Yes, you are right. The program, compiled against an old version of glibc (with the corresponding old version of libnss), crashes when run on an OS with a newer glibc and libnss, because glibc loads libnss dynamically. musl has no such issue because it uses its own built-in resolver for domain names, just like netdns=go.
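For anyone who wants the netdns=go behavior without switching libc, here is a minimal sketch using the PreferGo field of net.Resolver, which asks the net package to use its built-in Go DNS client for that resolver on platforms where it is available. The helper name newGoResolver and the host example.com are placeholders, and the lookup needs network access.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver (hypothetical helper) returns a resolver that prefers Go's
// built-in DNS client over the cgo (libc) lookup path.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	r := newGoResolver()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// example.com is a placeholder host for this sketch.
	cname, err := r.LookupCNAME(ctx, "example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("CNAME:", cname)
}
```

Setting GODEBUG=netdns=go in the environment has a similar effect for the whole process, while PreferGo is scoped to the one resolver.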