Open bcmills opened 2 years ago
Change https://golang.org/cl/380495 mentions this issue: cmd/fiximports: skip TestDryRun on plan9-arm
This test runs in about 500ms on my Linux workstation, and it's timing out after 10 minutes on
plan9-arm
I've been running TestDryRun in a loop on the least powerful of the current plan9-arm builders (Pi 3B+ with 1GB RAM). It takes about 21 seconds each time. Running the whole fiximports.test is about 145 seconds. So whatever's causing the timeout is not very repeatable. Needs investigation.
I can't replicate this by running the fiximports test on its own - it never takes more than 25 seconds. The problem must be an interaction with other tests being run in parallel.
I've just tried running go test x/tools/cmd/...
and found that fiximports.test in that context took 425 seconds. Most of the time it was running in parallel with guru.test (487420K virtual memory size) and stringer.test (a more normal 276804K virtual memory). Physical memory was tight but not swapping. The finish message for guru.test never appeared, and after a while it became apparent that all subcommands had terminated but the go test
command was deadlocked. I sent it a kill signal to get a stack trace, which looks like this:
goroutine 0 [idle]:
runtime.plan9_semacquire()
/sys/lib/go1.17/src/runtime/sys_plan9_arm.s:120 +0x8
runtime.semasleep(0xffffffffffffffff)
/sys/lib/go1.17/src/runtime/os_plan9.go:484 +0x68
runtime.notesleep(0x112ac0b4)
/sys/lib/go1.17/src/runtime/lock_sema.go:182 +0x88
runtime.mPark()
/sys/lib/go1.17/src/runtime/proc.go:1441 +0x20
runtime.stopm()
/sys/lib/go1.17/src/runtime/proc.go:2408 +0x78
runtime.findrunnable()
/sys/lib/go1.17/src/runtime/proc.go:2984 +0x9e4
runtime.schedule()
/sys/lib/go1.17/src/runtime/proc.go:3367 +0x2ac
runtime.park_m(0x10c00780)
/sys/lib/go1.17/src/runtime/proc.go:3516 +0x18c
runtime.mcall()
/sys/lib/go1.17/src/runtime/asm_arm.s:266 +0x48
goroutine 1 [semacquire, 15 minutes]:
sync.runtime_Semacquire(0x111a8f48)
/sys/lib/go1.17/src/runtime/sema.go:56 +0x34
sync.(*WaitGroup).Wait(0x111a8f40)
/sys/lib/go1.17/src/sync/waitgroup.go:130 +0x94
cmd/go/internal/work.(*Builder).Do(0x10dacae0, {0x5dd1bc, 0x10c28394}, 0x111a2630)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:208 +0x3c4
cmd/go/internal/test.runTest({0x5dd1bc, 0x10c28394}, 0x829068, {0x10c20070, 0x1, 0x1})
/sys/lib/go1.17/src/cmd/go/internal/test/test.go:828 +0x2384
main.invoke(0x829068, {0x10c20068, 0x2, 0x2})
/sys/lib/go1.17/src/cmd/go/main.go:216 +0x2f0
main.main()
/sys/lib/go1.17/src/cmd/go/main.go:173 +0x914
goroutine 2894 [select, 7 minutes]:
cmd/go/internal/work.(*Builder).Do.func3(0x110f4a48, 0x111a8f40, 0x10dacae0, 0x111f86d0)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:189 +0xf4
created by cmd/go/internal/work.(*Builder).Do
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:185 +0x3a4
goroutine 3281 [chan receive, 15 minutes]:
cmd/go/internal/base.processSignals.func1(0x1130ea80)
/sys/lib/go1.17/src/cmd/go/internal/base/signal.go:21 +0x24
created by cmd/go/internal/base.processSignals
/sys/lib/go1.17/src/cmd/go/internal/base/signal.go:20 +0x74
goroutine 3264 [syscall, 15 minutes]:
os/signal.signal_recv()
/sys/lib/go1.17/src/runtime/sigqueue_plan9.go:116 +0x74
os/signal.loop()
/sys/lib/go1.17/src/os/signal/signal_plan9.go:27 +0x14
created by os/signal.Notify.func1.1
/sys/lib/go1.17/src/os/signal/signal.go:151 +0x30
goroutine 2892 [runnable]:
syscall.Syscall(0x2a, 0x10ecb080, 0x13a48000, 0x73)
/sys/lib/go1.17/src/syscall/asm_plan9_arm.s:16 +0x8
syscall.stat({0x10ecb020, 0x21}, {0x13a48000, 0x73, 0x73})
/sys/lib/go1.17/src/syscall/zsyscall_plan9_arm.go:115 +0xfc
syscall.Stat({0x10ecb020, 0x21}, {0x13a48000, 0x73, 0x73})
/sys/lib/go1.17/src/syscall/syscall_plan9.go:400 +0x74
os.dirstat({0x4b4ef8, 0x10ded804})
/sys/lib/go1.17/src/os/stat_plan9.go:62 +0x140
os.statNolog({0x10ecb020, 0x21})
/sys/lib/go1.17/src/os/stat_plan9.go:95 +0x34
os.lstatNolog(...)
/sys/lib/go1.17/src/os/stat_plan9.go:104
os.Lstat({0x10ecb020, 0x21})
/sys/lib/go1.17/src/os/stat.go:22 +0x38
path/filepath.walkSymlinks({0x10c30380, 0x30})
/sys/lib/go1.17/src/path/filepath/symlink.go:84 +0x168
path/filepath.evalSymlinks(...)
/sys/lib/go1.17/src/path/filepath/symlink_unix.go:7
path/filepath.EvalSymlinks({0x10c30380, 0x30})
/sys/lib/go1.17/src/path/filepath/path.go:235 +0x24
cmd/go/internal/search.InDir({0x11096feb, 0x4}, {0x10c30380, 0x30})
/sys/lib/go1.17/src/cmd/go/internal/search/search.go:587 +0x120
cmd/go/internal/test.computeTestInputsID(0x10c938c0, {0x11a68000, 0xd71589, 0xd7158a})
/sys/lib/go1.17/src/cmd/go/internal/test/test.go:1519 +0x630
cmd/go/internal/test.(*runCache).saveOutput(0x111c8730, 0x10c938c0)
/sys/lib/go1.17/src/cmd/go/internal/test/test.go:1642 +0x2f0
cmd/go/internal/test.(*runCache).builderRunTest(0x111c8730, 0x10dacae0, {0x5dd1bc, 0x10c28394}, 0x10c938c0)
/sys/lib/go1.17/src/cmd/go/internal/test/test.go:1282 +0x1334
cmd/go/internal/work.(*Builder).Do.func2({0x5dd1bc, 0x10c28394}, 0x10c938c0)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:137 +0x4fc
cmd/go/internal/work.(*Builder).Do.func3(0x110f4a48, 0x111a8f40, 0x10dacae0, 0x111f86d0)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:199 +0x9c
created by cmd/go/internal/work.(*Builder).Do
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:185 +0x3a4
goroutine 2893 [select, 7 minutes]:
cmd/go/internal/work.(*Builder).Do.func3(0x110f4a48, 0x111a8f40, 0x10dacae0, 0x111f86d0)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:189 +0xf4
created by cmd/go/internal/work.(*Builder).Do
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:185 +0x3a4
goroutine 2974 [select, 7 minutes]:
cmd/go/internal/work.(*Builder).Do.func3(0x110f4a48, 0x111a8f40, 0x10dacae0, 0x111f86d0)
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:189 +0xf4
created by cmd/go/internal/work.(*Builder).Do
/sys/lib/go1.17/src/cmd/go/internal/work/exec.go:185 +0x3a4
The symptom may be a timeout in fiximports, but I think the underlying problem is a deadlock somewhere else.
A bit more information.
In the above test, the reason guru.test was taking so long was that I hadn't used the -short
flag. And I think go test
wasn't deadlocked: it was just busy attempting to cache the test inputs and results. Computing the TestInputsID
takes a stupendously long time because it's going through a testlog.txt
which lists 357000 filenames, each requiring a stat
call after being expanded by filepath.EvalSymlinks
(which itself requires a stat
call per filename component). In fact there are only 5841 unique lines in the testlog.txt
, so sorting the list and eliminating duplicates would speed that up quite a bit. Also on Plan 9 (are there any other OSs without symlinks?) implementing EvalSymlinks
as a simple call to filepath.Clean
instead of pointlessly calling stat
on each component would also be helpful.
Unfortunately that doesn't get any closer to explaining why fiximports.test is timing out on the builder.
In fact there are only 5841 unique lines in the testlog.txt, so sorting the list and eliminating duplicates would speed that up quite a bit.
Ooh, that's a very useful observation for #26562!
Also on Plan 9 (are there any other OSs without symlinks?) implementing
EvalSymlinks
as a simple call tofilepath.Clean
instead of pointlessly callingstat
on each component would also be helpful.
To my knowledge there are no other OSs supported by Go that lack symlinks entirely. (Windows has supported them since Vista, and in Windows 10 they don't even require elevated privileges as of build 14972.)
If you'd like to send a CL for plan9, I'd be happy to review it when the tree opens for Go 1.19. I think it should be as simple as adding src/path/filepath/symlink_plan9.go
and updating the build constraint for symlink_unix.go
:
https://cs.opensource.google/go/go/+/master:src/path/filepath/symlink_unix.go;l=1;drc=f229e7031a6efb2f23241b5da000c3b3203081d6
With TestDryRun
skipped, we're still hitting frequent timeouts for this package:
greplogs --dashboard -md -l -e 'panic: test timed out.*\n(?:.*\n)*golang\.org/x/tools/cmd/fiximports\.Test' --since=2022-01-25
By eyeball, ~all of the recent timeouts appear to be in TestFixImports
.
Change https://go.dev/cl/398817 mentions this issue: cmd/fiximports: skip TestFixImports on plan9-arm
Change https://go.dev/cl/404954 mentions this issue: path/filepath: simplify EvalSymlinks for plan9
Also various occurrences on dragonfly/amd64:
Edit: These are likely the same as https://github.com/golang/go/issues/54278#issuecomment-1206562901 and not relevant to this issue.
greplogs --dashboard -md -l -e 'panic: test timed out.*\n(?:.*\n)*golang\.org/x/tools/cmd/fiximports\.TestDryRun' --since=2021-01-01
I assume that this has something to do with filesystem performance, but the difference is several orders of magnitude. This test runs in about 500ms on my Linux workstation, and it's timing out after 10 minutes on
plan9-arm
. 😅(CC @millerresearch)