golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.26k stars 17.57k forks source link

runtime: timeouts in TestSUID #63735

Open gopherbot opened 11 months ago

gopherbot commented 11 months ago
#!watchflakes
post <- pkg == "runtime" && test == "TestSUID" && `panic: test timed out`

Issue created automatically to collect these failures.

Example (log):

panic: test timed out after 3m0s
running tests:
    TestSUID (2m40s)

runtime.gopark(0xc00006fad0?, 0x4740ee?, 0xa0?, 0x9b?, 0xc00006fb60?)
    /tmp/workdir/go/src/runtime/proc.go:400 +0xce fp=0xc00006fa90 sp=0xc00006fa70 pc=0x43f82e
runtime.chanrecv(0xc0009806c0, 0xc00006fb88, 0x1)
    /tmp/workdir/go/src/runtime/chan.go:583 +0x3bf fp=0xc00006fb08 sp=0xc00006fa90 pc=0x4097df
runtime.chanrecv1(0xc00080d520?, 0x409c40?)
    /tmp/workdir/go/src/runtime/chan.go:442 +0x12 fp=0xc00006fb30 sp=0xc00006fb08 pc=0x4093f2
os/exec.(*Cmd).awaitGoroutines(0xc000204840, 0x0)
    /tmp/workdir/go/src/os/exec/exec.go:948 +0x1fa fp=0xc00006fc00 sp=0xc00006fb30 pc=0x53f4da
os/exec.(*Cmd).Wait(0xc000204840)
    /tmp/workdir/go/src/os/exec/exec.go:915 +0x16c fp=0xc00006fc60 sp=0xc00006fc00 pc=0x53f1ac
os/exec.(*Cmd).Run(0xc000204840)
    /tmp/workdir/go/src/os/exec/exec.go:607 +0x2d fp=0xc00006fc78 sp=0xc00006fc60 pc=0x53dced
os/exec.(*Cmd).CombinedOutput(0xc000204840)
    /tmp/workdir/go/src/os/exec/exec.go:1012 +0x85 fp=0xc00006fca0 sp=0xc00006fc78 pc=0x53f945
runtime_test.privesc({0x85bf21, 0x5}, {0xc00006fde8?, 0x2, 0x2})
    /tmp/workdir/go/src/runtime/security_test.go:35 +0x1cb fp=0xc00006fd78 sp=0xc00006fca0 pc=0x7792eb
runtime_test.setSetuid(0xc0009a6000, {0x85c615, 0x6}, {0xc0004c81e0, 0x30})
    /tmp/workdir/go/src/runtime/security_test.go:55 +0x1ac fp=0xc00006fe18 sp=0xc00006fd78 pc=0x77952c
runtime_test.TestSUID(0xc0009a6000)
    /tmp/workdir/go/src/runtime/security_test.go:95 +0x1b3 fp=0xc00006ff70 sp=0xc00006fe18 pc=0x779813
testing.tRunner(0xc0009a6000, 0x888a98)

watchflakes

gopherbot commented 11 months ago

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime" && test == "TestSUID"
2023-10-16 03:29 dragonfly-amd64-622 go@5873bd1d runtime.TestSUID (log) panic: test timed out after 3m0s running tests: TestSUID (2m40s) runtime.gopark(0xc00006fad0?, 0x4740ee?, 0xa0?, 0x9b?, 0xc00006fb60?) /tmp/workdir/go/src/runtime/proc.go:400 +0xce fp=0xc00006fa90 sp=0xc00006fa70 pc=0x43f82e runtime.chanrecv(0xc0009806c0, 0xc00006fb88, 0x1) /tmp/workdir/go/src/runtime/chan.go:583 +0x3bf fp=0xc00006fb08 sp=0xc00006fa90 pc=0x4097df runtime.chanrecv1(0xc00080d520?, 0x409c40?) /tmp/workdir/go/src/runtime/chan.go:442 +0x12 fp=0xc00006fb30 sp=0xc00006fb08 pc=0x4093f2 os/exec.(*Cmd).awaitGoroutines(0xc000204840, 0x0) /tmp/workdir/go/src/os/exec/exec.go:948 +0x1fa fp=0xc00006fc00 sp=0xc00006fb30 pc=0x53f4da os/exec.(*Cmd).Wait(0xc000204840) /tmp/workdir/go/src/os/exec/exec.go:915 +0x16c fp=0xc00006fc60 sp=0xc00006fc00 pc=0x53f1ac os/exec.(*Cmd).Run(0xc000204840) /tmp/workdir/go/src/os/exec/exec.go:607 +0x2d fp=0xc00006fc78 sp=0xc00006fc60 pc=0x53dced os/exec.(*Cmd).CombinedOutput(0xc000204840) /tmp/workdir/go/src/os/exec/exec.go:1012 +0x85 fp=0xc00006fca0 sp=0xc00006fc78 pc=0x53f945 runtime_test.privesc({0x85bf21, 0x5}, {0xc00006fde8?, 0x2, 0x2}) /tmp/workdir/go/src/runtime/security_test.go:35 +0x1cb fp=0xc00006fd78 sp=0xc00006fca0 pc=0x7792eb runtime_test.setSetuid(0xc0009a6000, {0x85c615, 0x6}, {0xc0004c81e0, 0x30}) /tmp/workdir/go/src/runtime/security_test.go:55 +0x1ac fp=0xc00006fe18 sp=0xc00006fd78 pc=0x77952c runtime_test.TestSUID(0xc0009a6000) /tmp/workdir/go/src/runtime/security_test.go:95 +0x1b3 fp=0xc00006ff70 sp=0xc00006fe18 pc=0x779813 testing.tRunner(0xc0009a6000, 0x888a98)
2023-10-23 22:56 ios-arm64-corellium go@f09db2bb runtime.TestSUID (log) panic: test timed out after 3m0s running tests: TestSUID (26s) syscall.syscall6(0x4?, 0x10544f9b0?, 0x102866a40?, 0x10213089c?, 0x10544f9c8?, 0x10001000109fc?, 0x104c25f40?) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/runtime/sys_darwin.go:45 +0x68 fp=0x10544f8e0 sp=0x10544f820 pc=0x102197b48 syscall.wait4(0x10544f978?, 0x1021c9398?, 0x90?, 0x1025d05c0?) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/syscall/zsyscall_darwin_arm64.go:43 +0x4c fp=0x10544f940 sp=0x10544f8e0 pc=0x1021a85dc syscall.Wait4(0x130528060?, 0x10544f9b4, 0x13174a020?, 0x102130544?) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/syscall/syscall_bsd.go:144 +0x28 fp=0x10544f980 sp=0x10544f940 pc=0x1021a6d48 ... /tmp/workdir-host-ios-arm64-corellium-ios/go/src/runtime/crash_test.go:154 +0x2e8 fp=0x10544fc90 sp=0x10544faa0 pc=0x1023e23f8 sync.(*Once).doSlow(0x10256e500?, 0x1300bacc0?) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/sync/once.go:74 +0x100 fp=0x10544fcf0 sp=0x10544fc90 pc=0x1021a3020 sync.(*Once).Do(...) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/sync/once.go:65 runtime_test.buildTestProg(0x10552c000, {0x10247f914, 0x8}, {0x0?, 0x0, 0x0}) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/runtime/crash_test.go:139 +0x48c fp=0x10544fe00 sp=0x10544fcf0 pc=0x1023e20ac runtime_test.TestSUID(0x10552c000) /tmp/workdir-host-ios-arm64-corellium-ios/go/src/runtime/security_test.go:82 +0x9c fp=0x10544ff60 sp=0x10544fe00 pc=0x10245411c testing.tRunner(0x10552c000, 0x1025df7f0)

watchflakes

mknyszek commented 11 months ago

The iOS failure is probably different from the Dragonfly failure. It's timing out on buildTestProg, and with only 26s of run time.

The Dragonfly issue seems possibly real. CC @golang/dragonfly

tuxillo commented 11 months ago

Any local test program I can run to verify the issue?

prattmic commented 9 months ago

The same symptom as the dragonfly failure above appeared on a 1.20 release branch test, on go1.20-openbsd-amd64:

=== RUN   TestSUID
    crash_test.go:138: running go build -o /home/swarming/.swarming/w/ir/x/t/go-build88073405/testsuid.exe 
panic: test timed out after 10m0s
running tests:
    TestSUID (7m13s)

...

goroutine 23727 [chan receive, 7 minutes]:
runtime.gopark(0x0?, 0xc00006c800?, 0x90?, 0xaa?, 0x474eeb?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:381 +0xd6 fp=0xc0002daa60 sp=0xc0002daa40 pc=0x43e416
runtime.chanrecv(0xc000982420, 0xc0002dab80, 0x1)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/chan.go:583 +0x49d fp=0xc0002daaf0 sp=0xc0002daa60 pc=0x40945d
runtime.chanrecv1(0x18?, 0xc001604ea0?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/chan.go:442 +0x18 fp=0xc0002dab18 sp=0xc0002daaf0 pc=0x408f58
os/exec.(*Cmd).awaitGoroutines(0xc000088000, 0x0)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:941 +0x233 fp=0xc0002dabf0 sp=0xc0002dab18 pc=0x539fb3
os/exec.(*Cmd).Wait(0xc000088000)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:908 +0x175 fp=0xc0002dac58 sp=0xc0002dabf0 pc=0x539c35
os/exec.(*Cmd).Run(0x3?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:590 +0x39 fp=0xc0002dac78 sp=0xc0002dac58 pc=0x538619
os/exec.(*Cmd).CombinedOutput(0xc000088000)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:1005 +0xa8 fp=0xc0002daca0 sp=0xc0002dac78 pc=0x53a468
runtime_test.privesc({0x8163bd, 0x5}, {0xc0002dade8, 0x2, 0x2})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:33 +0x1dd fp=0xc0002dad78 sp=0xc0002daca0 pc=0x74501d
runtime_test.setSetuid(0xc000fc2000, {0x8169a5, 0x6}, {0xc0000282d0, 0x41})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:49 +0xbc fp=0xc0002dae18 sp=0xc0002dad78 pc=0x74517c
runtime_test.TestSUID(0xc000fc2000)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:93 +0x1c5 fp=0xc0002daf70 sp=0xc0002dae18 pc=0x745585
testing.tRunner(0xc000fc2000, 0x840b30)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1576 +0x10b fp=0xc0002dafc0 sp=0xc0002daf70 pc=0x4ffeeb
testing.(*T).Run.func1()
    /home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1629 +0x2a fp=0xc0002dafe0 sp=0xc0002dafc0 pc=0x500f2a
runtime.goexit()
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0002dafe8 sp=0xc0002dafe0 pc=0x478321
created by testing.(*T).Run
    /home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1629 +0x3ea

cc @golang/openbsd

prattmic commented 9 months ago

@tuxillo In theory go test -run=TestSUID runtime, but the failure seems quite rare, so I'm not sure it will reproduce.

bcmills commented 9 months ago

@prattmic, that failure mode looks disturbingly similar to the darwin hangs tracked in #63937. Maybe we're looking at a general kqueue poller bug?

dragonfly, openbsd, and darwin all use the same implementation: https://cs.opensource.google/go/go/+/master:src/runtime/netpoll_kqueue.go;l=5;drc=1f3f851a6e965a867979a74f7ebefd03381505c0

(CC @panjf2000)

panjf2000 commented 9 months ago

Unfortunately, I'm afraid that this issue and #63937 share the same root cause: uncanny blocking pipes in kqueue, TestSUID and test flakes in #63937 were suspiciously blocking at: https://github.com/golang/go/blob/8db131082d08e497fd8e9383d0ff7715e1bef478/src/os/exec/exec.go#L577, that would be executed in a goroutine which might be goparked, and was expected to be waked by kevent in runtime.netpoll later, but it never did, I can now assume that in hindsight.