golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.43k stars 17.59k forks source link

cmd/link: TestDWARF*/testprog is flaky on darwin/amd64 #32218

Open josharian opened 5 years ago

josharian commented 5 years ago

At b84e0bc61a, on darwin/amd64:

$ go test -run=TestDWARF/testprog -count=10 cmd/link
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (2.03s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.50s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.78s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.52s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.96s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.83s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.53s)
    --- FAIL: TestDWARFiOS/testprog#03 (1.98s)
        dwarf_test.go:96: exit status 255
--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprog#02 (1.99s)
        dwarf_test.go:96: exit status 255
FAIL
FAIL    cmd/link    47.577s
FAIL

I've also seen

--- FAIL: TestDWARF (0.22s)
    --- FAIL: TestDWARF/testprog (3.00s)
        dwarf_test.go:96: exit status 255
FAIL

The failures are intermittent.

cc @dr2chase

cherrymui commented 5 years ago

Happened for me as well, with exactly same error. I commented on CL https://go-review.googlesource.com/c/go/+/170377

odeke-em commented 5 years ago

Running on c11f6c4929

go version devel +c11f6c4929 Fri Jun 21 21:24:47 2019 +0000 darwin/amd64

uname

uname -a
Darwin Emmanuels-MacBook-Pro-2.local 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64

On the first try I get a failure but this time spectacularly with a 1.2MB log whose beginning is something like

--- FAIL: TestDWARFiOS (0.54s)
    --- FAIL: TestDWARFiOS/testprogcgo#01 (2.33s)
        dwarf_test.go:96: symbols testprogcgo.exe: exit status 255: /var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [arm64, 0.004386 seconds]:
                A829102B-EE24-329C-874B-9BAAE2337C79 /private/var/folders/v3/7z434qpx5v3bw0wh8h2myfpw0000gn/T/go-link-TestDWARF414534277/testprogcgo.exe [AOUT, PIE, FaultedFromDisk]
                    0x0000000000000000 (0x100000000) __PAGEZERO SEGMENT
                    0x0000000100000000 ( 0xbc000) __TEXT SEGMENT
                        0x0000000100000000 (  0x4c90) MACH_HEADER
                        0x0000000100004c90 ( 0xb67a4) __TEXT __text
                            0x0000000100004c90 (    0x68) runtime.text [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004cf8 (    0x40) internal/cpu.Initialize [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100004d38 (   0x620) internal/cpu.processOptions [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005358 (    0x50) internal/cpu.indexByte [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x00000001000053a8 (   0x680) internal/cpu.doinit [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a28 (    0x10) type..hash.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a38 (    0x10) type..eq.internal/cpu.CacheLinePad [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a48 (    0x50) type..hash.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005a98 (    0x60) type..eq.internal/cpu.arm64 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005af8 (    0x70) type..hash.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005b68 (    0xa0) type..eq.internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c08 (    0x50) type..hash.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005c58 (    0x50) type..eq.internal/cpu.x86 [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
                            0x0000000100005ca8 (    0x70) type..hash.[24]internal/cpu.option [FUNC, NameNList, MangledNameNList, Merged, NList, FunctionStarts]
....
<FRAMES_INTENTIONALLY_ELIDED_BY_MYSELF_SINCE_IT_IS_A_LARGE_FILE>
                            0x00000001001e3d20 (    0x10) syscall.envOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d30 (    0x10) time.localOnce [NameNList, MangledNameNList, NList]
                            0x00000001001e3d40 (    0x10) runtime.writeBarrier [NameNList, MangledNameNList, NList]
                            0x00000001001e3d50 (    0x20) syscall.ForkLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3d70 (    0x30) syscall.envLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3da0 (    0x20) runtime.hashkey [NameNList, MangledNameNList, NList]
                            0x00000001001e3dc0 (    0x40) runtime.newmHandoff [NameNList, MangledNameNList, NList]
                            0x00000001001e3e00 (    0x40) runtime.sig [NameNList, MangledNameNList, NList]
                            0x00000001001e3e40 (    0x40) runtime.execLock [NameNList, MangledNameNList, NList]
                            0x00000001001e3e80 (    0x40) runtime.debug [NameNList, MangledNameNList, NList]
                            0x00000001001e3ec0 (    0x40) runtime.finptrmask [NameNList, MangledNameNList, NList]
                            0x00000001001e3f00 (    0x80) runtime.aeskeysched [NameNList, MangledNameNList, NList]
                            0x00000001001e3f80 (    0x80) runtime.handlingSig [NameNList, MangledNameNList, NList]
                            0x00000001001e4000 (    0xa0) internal/cpu.X86 [NameNList, MangledNameNList, NList]
                            0x00000001001e40a0 (    0xa0) internal/cpu.ARM64 [NameNList, MangledNameNList, NList]
                            0x00000001001e4140 (    0xa0) runtime.gcController [NameNList, MangledNameNList, NList]
                            0x00000001001e41e0 (   0x100) runtime.fwdSig [NameNList, MangledNameNList, NList]
                            0x00000001001e42e0 (   0x200) runtime.printBacklog [NameNList, MangledNameNList, NList]
                            0x00000001001e44e0 (   0x400) runtime.zeroVal [NameNList, MangledNameNList, NList]
                            0x00000001001e48e0 (  0x1758) runtime.memstats [NameNList, MangledNameNList, NList]
                    0x00000001001e8000 ( 0x7c000) __LINKEDIT SEGMENT
            symbols[65885]: [fatal]  child process status could not be determined; exit code unavailable.
FAIL
exit status 1
FAIL    cmd/link        47.568s

with the output at https://gist.github.com/odeke-em/85bb51ded80d9239a97c435f875b04c5

aclements commented 5 years ago

Some notes:

As far as I can tell, this has never happened on the dashboard based on greplogs -E 'dwarf_test.*(exit status 255|symbols)' -dashboard.

CL 171819 added better logging for this (previously we didn't print the symbols output). That's why the original report just has "exit status 255", but @odeke-em's report has much more information.

Unfortunately, it seems like the symbols tool may just be flaky. Just searching around for that error reveals other people seeing it occasionally in various macOS development tools outside the context of Go.

Could someone with a Mac try just running symbols (or maybe the test) in a loop to see how often it flakes?

dr2chase commented 5 years ago

It may require Darwin 18.5, which is one back from current. I've tried running it in a loop with luck at seeing the bug, will try some more. (Maybe a busy Mac helps?)

dr2chase commented 5 years ago

About 2000 repetitions on Darwin 18.6 (aka Mojave, macOS 10.14.5), with no failure, on a busy-ish machine. Still running.

ianlancetaylor commented 5 years ago

In the output above the symbols program completes, but then reports an error symbols[65885]: [fatal] child process status could not be determined; exit code unavailable. I can see the error string if I run strings on /usr/bin/symbols, so it's definitely coming from there. I can't find any reports of this error on the Interwebs (the top search result for me is in fact this issue). I don't know why the symbols program would start a child process at all. I don't know why it would be unable to determine the status of that child process. I can't find the source code of the symbols program; I don't know if it is available at all.

Moving to 1.14 but if we can't figure out a reproducer, or get the source code, I doubt there is much that we can do.

ianlancetaylor commented 4 years ago

Is this still happening for anyone?

cherrymui commented 4 years ago

I haven't seen this for quite a while, since upgraded to macOS 10.14. I only saw it on macOS 10.13 in the past. My guess is that there may be some problem with the symbols command in earlier version.

9nut commented 4 years ago

FYI, this might be an issue with xcode requiring an update. Here's why I think that:

When I tried to build 1.13.6 this morning I got the following:

--- FAIL: TestDWARF (0.19s)
    --- FAIL: TestDWARF/testprog (1.17s)
        dwarf_test.go:96: symbols testprog.exe: exit status 255: /var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [x86_64, 0.001705 seconds]:
                null-uuid                            /private/var/folders/j0/pr17cml548j19x25h2h0dv9h0000gp/T/go-link-TestDWARF355434644/testprog.exe [AOUT, FaultedFromDisk]  
....

I updated my machine to macOS Catalina (10.15.2) last week, and had not started xcode in a while. I guessed I might need to update xcode. After starting xcode -- which required an update -- Go 1.13.6 was able to build without any issues.

gopherbot commented 3 years ago

Change https://golang.org/cl/297469 mentions this issue: cmd/link: disable flaky Darwin "symbols" test