Open gopherbot opened 11 months ago
Found new dashboard test flakes for:
#!watchflakes
default <- pkg == "runtime" && test == "TestSUID"
The iOS failure is probably different from the Dragonfly failure. It's timing out on buildTestProg, and with only 26s of run time.
The Dragonfly issue seems possibly real. CC @golang/dragonfly
Any local test program I can run to verify the issue?
The same symptom as the dragonfly failure above appeared on a 1.20 release branch test, on go1.20-openbsd-amd64:
=== RUN TestSUID
crash_test.go:138: running go build -o /home/swarming/.swarming/w/ir/x/t/go-build88073405/testsuid.exe
panic: test timed out after 10m0s
running tests:
TestSUID (7m13s)
...
goroutine 23727 [chan receive, 7 minutes]:
runtime.gopark(0x0?, 0xc00006c800?, 0x90?, 0xaa?, 0x474eeb?)
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:381 +0xd6 fp=0xc0002daa60 sp=0xc0002daa40 pc=0x43e416
runtime.chanrecv(0xc000982420, 0xc0002dab80, 0x1)
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/chan.go:583 +0x49d fp=0xc0002daaf0 sp=0xc0002daa60 pc=0x40945d
runtime.chanrecv1(0x18?, 0xc001604ea0?)
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/chan.go:442 +0x18 fp=0xc0002dab18 sp=0xc0002daaf0 pc=0x408f58
os/exec.(*Cmd).awaitGoroutines(0xc000088000, 0x0)
/home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:941 +0x233 fp=0xc0002dabf0 sp=0xc0002dab18 pc=0x539fb3
os/exec.(*Cmd).Wait(0xc000088000)
/home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:908 +0x175 fp=0xc0002dac58 sp=0xc0002dabf0 pc=0x539c35
os/exec.(*Cmd).Run(0x3?)
/home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:590 +0x39 fp=0xc0002dac78 sp=0xc0002dac58 pc=0x538619
os/exec.(*Cmd).CombinedOutput(0xc000088000)
/home/swarming/.swarming/w/ir/x/w/goroot/src/os/exec/exec.go:1005 +0xa8 fp=0xc0002daca0 sp=0xc0002dac78 pc=0x53a468
runtime_test.privesc({0x8163bd, 0x5}, {0xc0002dade8, 0x2, 0x2})
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:33 +0x1dd fp=0xc0002dad78 sp=0xc0002daca0 pc=0x74501d
runtime_test.setSetuid(0xc000fc2000, {0x8169a5, 0x6}, {0xc0000282d0, 0x41})
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:49 +0xbc fp=0xc0002dae18 sp=0xc0002dad78 pc=0x74517c
runtime_test.TestSUID(0xc000fc2000)
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/security_test.go:93 +0x1c5 fp=0xc0002daf70 sp=0xc0002dae18 pc=0x745585
testing.tRunner(0xc000fc2000, 0x840b30)
/home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1576 +0x10b fp=0xc0002dafc0 sp=0xc0002daf70 pc=0x4ffeeb
testing.(*T).Run.func1()
/home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1629 +0x2a fp=0xc0002dafe0 sp=0xc0002dafc0 pc=0x500f2a
runtime.goexit()
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0002dafe8 sp=0xc0002dafe0 pc=0x478321
created by testing.(*T).Run
/home/swarming/.swarming/w/ir/x/w/goroot/src/testing/testing.go:1629 +0x3ea
cc @golang/openbsd
@tuxillo In theory go test -run=TestSUID runtime, but the failure seems quite rare, so I'm not sure it will reproduce.
@prattmic, that failure mode looks disturbingly similar to the darwin hangs tracked in #63937. Maybe we're looking at a general kqueue poller bug?
dragonfly, openbsd, and darwin all use the same implementation:
https://cs.opensource.google/go/go/+/master:src/runtime/netpoll_kqueue.go;l=5;drc=1f3f851a6e965a867979a74f7ebefd03381505c0
(CC @panjf2000)
Unfortunately, I'm afraid that this issue and #63937 share the same root cause: pipes that block unexpectedly under kqueue. Both TestSUID and the test flakes in #63937 were suspiciously blocked at:
https://github.com/golang/go/blob/8db131082d08e497fd8e9383d0ff7715e1bef478/src/os/exec/exec.go#L577. That code runs in a goroutine that may be goparked and is expected to be woken later by kevent in runtime.netpoll, but the wakeup never came. At least, that is what I can assume in hindsight.
Issue created automatically to collect these failures.
Example (log):
— watchflakes