Closed Whissi closed 4 years ago
CC @mknyszek
Interestingly enough, this isn't a failure in the function being tested, but in generating the data structure for the test. As a result, I suspect this is benign (i.e. there likely isn't an actual non-test bug here), but I'm surprised that it isn't showing up on the trybots and in CI. I tried to reproduce it locally and was unable to, both at tip and at 1.15.3. That said, I don't have access to 386 hardware, just amd64 with GOARCH=386, and I'm pretty sure the trybots and CI do the same. Are you running this on real 386 hardware?
Failing everywhere: On virtualized x86 systems (Host is x86_64) and real x86 hardware (just finished testing on Pentium 4 Prescott which is 32bit only and Atom N550 which supports 64bit but is running x86 Linux).
There are only three tests where the layout of the bitmap under test is not constant, and they're the hugepage tests.
Can you please confirm that it's the hugepage tests that are failing by running the following command and sharing the output? (where "go" is in your development GOROOT at bin/go)
go test -v -run="TestPallocDataFindScavengeCandidate" runtime
Also, what's the hugepage size used for THP on these platforms?
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
That should help me track down what exactly the issue is and I can fix it.
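For anyone who wants to check the same thing programmatically, here is a minimal Go sketch that reads the THP size from the sysfs path above and reports whether a huge page is at least as large as a 4 MiB chunk. The helper names `parseHugePageSize` and `coversChunk` and the constant `chunkBytes` are made up for this illustration; the 4 MiB figure is taken from the discussion later in this thread, not read from the runtime.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Path quoted in this thread for the transparent huge page size.
const thpSizePath = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"

// chunkBytes is an assumption for illustration: the 4 MiB chunk size
// mentioned later in the thread.
const chunkBytes = 4 << 20

// parseHugePageSize (hypothetical helper) parses the file contents,
// which are a decimal byte count followed by a newline.
func parseHugePageSize(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// coversChunk (hypothetical helper) reports whether one huge page is
// at least as large as a whole chunk.
func coversChunk(size uint64) bool {
	return size >= chunkBytes
}

func main() {
	raw, err := os.ReadFile(thpSizePath)
	if err != nil {
		fmt.Println("could not read THP size:", err)
		return
	}
	size, err := parseHugePageSize(string(raw))
	if err != nil {
		fmt.Println("could not parse THP size:", err)
		return
	}
	fmt.Printf("huge page size: %d bytes, covers a whole chunk: %v\n", size, coversChunk(size))
}
```

On a machine without THP support the file may be missing, in which case the program only prints the read error.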
/tmp/goroot/src # ./make.bash
Building Go cmd/dist using /usr/lib/go. (go1.14.7 linux/386)
Building Go toolchain1 using /usr/lib/go.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/386.
---
Installed Go for linux/386 in /tmp/goroot
Installed commands in /tmp/goroot/bin
vm-gentoo-x86 /tmp/goroot/src # ./all.bash
Building Go cmd/dist using /usr/lib/go. (go1.14.7 linux/386)
Building Go toolchain1 using /usr/lib/go.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for linux/386.
##### Testing packages.
^C
gentoo-x86 /tmp/goroot/src # cd ..
gentoo-x86 /tmp/goroot # find . -name 'go'
./misc/cgo/testcshared/testdata/go2c2go/go
./pkg/linux_386/cmd/go
./pkg/linux_386/cmd/vendor/golang.org/x/tools/go
./pkg/linux_386/go
./bin/go
./src/cmd/go
./src/cmd/vendor/golang.org/x/tools/go
./src/go
gentoo-x86 /tmp/goroot # ./bin/go version
go version devel +5647d01ab7 Mon Oct 19 19:51:19 2020 +0000 linux/386
gentoo-x86 /tmp/goroot # ./bin/go test -v -run="TestPallocDataFindScavengeCandidate" runtime
=== RUN TestPallocDataFindScavengeCandidate
=== RUN TestPallocDataFindScavengeCandidate/NoneFreeMin1
=== RUN TestPallocDataFindScavengeCandidate/BottomEdge64WithFullMin8
=== RUN TestPallocDataFindScavengeCandidate/NoneFreeMin16
=== RUN TestPallocDataFindScavengeCandidate/AllFreeMin32
=== RUN TestPallocDataFindScavengeCandidate/StartFreeMin4
=== RUN TestPallocDataFindScavengeCandidate/Straddle64Min8
=== RUN TestPallocDataFindScavengeCandidate/PreserveHugePageBottom
fatal error: index out of range
goroutine 24 [running]:
runtime.throw(0x825f734, 0x12)
/tmp/goroot/src/runtime/panic.go:1112 +0x6a fp=0x943a6e0 sp=0x943a6cc pc=0x807c6ca
runtime.panicCheck1(0x80725ee, 0x825f734, 0x12)
/tmp/goroot/src/runtime/panic.go:34 +0xb7 fp=0x943a6f4 sp=0x943a6e0 pc=0x8079be7
runtime.goPanicIndexU(0x8, 0x8)
/tmp/goroot/src/runtime/panic.go:91 +0x37 fp=0x943a718 sp=0x943a6f4 pc=0x8079d47
runtime.(*pageBits).setRange(0x950c380, 0x202, 0xfffffffe)
/tmp/goroot/src/runtime/mpallocbits.go:31 +0x2ce fp=0x943a74c sp=0x943a718 pc=0x80725ee
runtime.(*pallocBits).allocRange(...)
/tmp/goroot/src/runtime/mpallocbits.go:340
runtime.(*pallocData).allocRange(0x950c380, 0x202, 0xfffffffe)
/tmp/goroot/src/runtime/mpallocbits.go:418 +0x33 fp=0x943a75c sp=0x943a74c pc=0x8074143
runtime.(*PallocData).AllocRange(...)
/tmp/goroot/src/runtime/export_test.go:688
runtime_test.makePallocData(0x950e858, 0x1, 0x1, 0x0, 0x0, 0x0, 0x83bd4a0)
/tmp/goroot/src/runtime/mgcscavenge_test.go:24 +0x61 fp=0x943a774 sp=0x943a75c pc=0x81b4121
runtime_test.TestPallocDataFindScavengeCandidate.func1(0x95027e0)
/tmp/goroot/src/runtime/mgcscavenge_test.go:263 +0x4b fp=0x943a7a8 sp=0x943a774 pc=0x81ee9db
testing.tRunner(0x95027e0, 0x9530120)
/tmp/goroot/src/testing/testing.go:1173 +0xb7 fp=0x943a7e8 sp=0x943a7a8 pc=0x811c647
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x943a7ec sp=0x943a7e8 pc=0x80b1651
created by testing.(*T).Run
/tmp/goroot/src/testing/testing.go:1218 +0x217
goroutine 1 [chan receive, locked to thread]:
runtime.gopark(0x826fc74, 0x9420330, 0x170e, 0x2)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x944bd48 sp=0x944bd34 pc=0x807ef34
runtime.chanrecv(0x9420300, 0x944bdc7, 0x9400001, 0x811c8e7)
/tmp/goroot/src/runtime/chan.go:581 +0x28a fp=0x944bd90 sp=0x944bd48 pc=0x804e83a
runtime.chanrecv1(0x9420300, 0x944bdc7)
/tmp/goroot/src/runtime/chan.go:443 +0x1c fp=0x944bda4 sp=0x944bd90 pc=0x804e57c
testing.(*T).Run(0x9401500, 0x8268417, 0x23, 0x8270934, 0x401)
/tmp/goroot/src/testing/testing.go:1219 +0x236 fp=0x944bdec sp=0x944bda4 pc=0x811c906
testing.runTests.func1(0x9401420)
/tmp/goroot/src/testing/testing.go:1491 +0x5a fp=0x944be10 sp=0x944bdec pc=0x812071a
testing.tRunner(0x9401420, 0x944bea0)
/tmp/goroot/src/testing/testing.go:1173 +0xb7 fp=0x944be50 sp=0x944be10 pc=0x811c647
testing.runTests(0x940e070, 0x83bb940, 0x157, 0x157, 0xc66c808d, 0xbfdb9c2c, 0xb2d0cce9, 0x8b, 0x83bd4a0, 0x0)
/tmp/goroot/src/testing/testing.go:1489 +0x25a fp=0x944beb4 sp=0x944be50 pc=0x811dc0a
testing.(*M).Run(0x949a000, 0x0)
/tmp/goroot/src/testing/testing.go:1397 +0x178 fp=0x944bf4c sp=0x944beb4 pc=0x811cdc8
runtime_test.TestMain(0x949a000)
/tmp/goroot/src/runtime/crash_test.go:28 +0x21 fp=0x944bf70 sp=0x944bf4c pc=0x818b161
main.main()
_testmain.go:1221 +0x110 fp=0x944bfc8 sp=0x944bf70 pc=0x81ff4f0
runtime.main()
/tmp/goroot/src/runtime/proc.go:220 +0x232 fp=0x944bff0 sp=0x944bfc8 pc=0x807eb72
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x944bff4 sp=0x944bff0 pc=0x80b1651
goroutine 2 [force gc (idle)]:
runtime.gopark(0x826fde8, 0x83bd020, 0x1411, 0x1)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x9438fdc sp=0x9438fc8 pc=0x807ef34
runtime.goparkunlock(...)
/tmp/goroot/src/runtime/proc.go:337
runtime.forcegchelper()
/tmp/goroot/src/runtime/proc.go:271 +0xbf fp=0x9438ff0 sp=0x9438fdc pc=0x807eddf
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x9438ff4 sp=0x9438ff0 pc=0x80b1651
created by runtime.init.5
/tmp/goroot/src/runtime/proc.go:259 +0x2b
goroutine 3 [GC sweep wait]:
runtime.gopark(0x826fde8, 0x83bd340, 0x140c, 0x1)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x94397d4 sp=0x94397c0 pc=0x807ef34
runtime.goparkunlock(...)
/tmp/goroot/src/runtime/proc.go:337
runtime.bgsweep(0x9420040)
/tmp/goroot/src/runtime/mgcsweep.go:161 +0x8f fp=0x94397e8 sp=0x94397d4 pc=0x806ab2f
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x94397ec sp=0x94397e8 pc=0x80b1651
created by runtime.gcenable
/tmp/goroot/src/runtime/mgc.go:217 +0x4d
goroutine 4 [GC scavenge wait]:
runtime.gopark(0x826fde8, 0x83bd2c0, 0x140d, 0x1)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x9439fac sp=0x9439f98 pc=0x807ef34
runtime.goparkunlock(...)
/tmp/goroot/src/runtime/proc.go:337
runtime.bgscavenge(0x9420040)
/tmp/goroot/src/runtime/mgcscavenge.go:265 +0xbe fp=0x9439fe8 sp=0x9439fac pc=0x8068d7e
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x9439fec sp=0x9439fe8 pc=0x80b1651
created by runtime.gcenable
/tmp/goroot/src/runtime/mgc.go:218 +0x6b
goroutine 5 [finalizer wait]:
runtime.gopark(0x826fde8, 0x83cf260, 0x8041410, 0x1)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x94387a8 sp=0x9438794 pc=0x807ef34
runtime.goparkunlock(...)
/tmp/goroot/src/runtime/proc.go:337
runtime.runfinq()
/tmp/goroot/src/runtime/mfinal.go:175 +0x94 fp=0x94387f0 sp=0x94387a8 pc=0x805fab4
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x94387f4 sp=0x94387f0 pc=0x80b1651
created by runtime.createfing
/tmp/goroot/src/runtime/mfinal.go:156 +0x60
goroutine 6 [chan receive]:
runtime.gopark(0x826fc74, 0x95484b0, 0x170e, 0x2)
/tmp/goroot/src/runtime/proc.go:331 +0xe4 fp=0x9446c58 sp=0x9446c44 pc=0x807ef34
runtime.chanrecv(0x9548480, 0x9446cd7, 0x9401501, 0x811c8e7)
/tmp/goroot/src/runtime/chan.go:581 +0x28a fp=0x9446ca0 sp=0x9446c58 pc=0x804e83a
runtime.chanrecv1(0x9548480, 0x9446cd7)
/tmp/goroot/src/runtime/chan.go:443 +0x1c fp=0x9446cb4 sp=0x9446ca0 pc=0x804e57c
testing.(*T).Run(0x95027e0, 0x8261b5e, 0x16, 0x9530120, 0x9524601)
/tmp/goroot/src/testing/testing.go:1219 +0x236 fp=0x9446cfc sp=0x9446cb4 pc=0x811c906
runtime_test.TestPallocDataFindScavengeCandidate(0x9401500)
/tmp/goroot/src/runtime/mgcscavenge_test.go:262 +0x144b fp=0x9446fa8 sp=0x9446cfc pc=0x81b59db
testing.tRunner(0x9401500, 0x8270934)
/tmp/goroot/src/testing/testing.go:1173 +0xb7 fp=0x9446fe8 sp=0x9446fa8 pc=0x811c647
runtime.goexit()
/tmp/goroot/src/runtime/asm_386.s:1333 +0x1 fp=0x9446fec sp=0x9446fe8 pc=0x80b1651
created by testing.(*T).Run
/tmp/goroot/src/testing/testing.go:1218 +0x217
FAIL runtime 0.011s
FAIL
gentoo-x86 /tmp/goroot # cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
4194304
Great, thank you so much for the quick turnaround. That'll help a lot.
Change https://golang.org/cl/263837 mentions this issue: runtime: fix scavenging tests for pallocChunkBytes huge pages and larger
Sorry for the delayed response; the problem was fairly easy to find. The tests assumed that a huge page would never be exactly pallocChunkPages in size, but that is precisely the situation in your case (4 MiB huge pages). The code already deals with this case, so I just added a test case for it.
Would you mind trying out https://golang.org/cl/263837 to ensure that the new test passes? I can stub the value in and show that it works, but I'd like to make sure the problem is solved on your end.
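To see why the test setup blew up, here is a small Go sketch of the arithmetic. It is a reconstruction inferred from the stack trace above (the arguments `0x202` and `0xfffffffe` passed to `setRange`), not a copy of the test source. It assumes an 8 KiB runtime page and 512 pages per chunk, which is consistent with a 4 MiB chunk; the function name `hugePageRange` is hypothetical, and `uint32` stands in for the 32-bit `uint` on 386.

```go
package main

import "fmt"

const (
	// Assumptions for illustration: 8 KiB runtime pages and 512 pages
	// per chunk, i.e. a 4 MiB chunk.
	pageSize         = 8192
	pallocChunkPages = 512
)

// hugePageRange (hypothetical name) mimics a test setup that allocates
// everything in the chunk above "one huge page plus two pages".
// uint32 stands in for uint on a 32-bit platform.
func hugePageRange(physHugePageSize uint32) (start, count uint32) {
	bits := physHugePageSize / pageSize // pages per huge page
	start = bits + 2
	// If start exceeds pallocChunkPages, this unsigned subtraction wraps.
	count = pallocChunkPages - start
	return start, count
}

func main() {
	// A 2 MiB huge page leaves room in the chunk: count stays small.
	s, c := hugePageRange(2 << 20)
	fmt.Printf("2 MiB: start=%#x count=%#x\n", s, c)

	// A 4 MiB huge page fills the whole chunk, so count wraps around.
	s, c = hugePageRange(4194304)
	fmt.Printf("4 MiB: start=%#x count=%#x\n", s, c)
}
```

With 4 MiB huge pages the start index (514) is already past the end of a 512-page chunk and the count wraps to a huge unsigned value, which matches the out-of-range index reported in the trace.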
ALL TESTS PASSED
/tmp/goroot/bin/go version
go version devel +f015b07c24 Tue Oct 20 13:09:22 2020 +0000 linux/386
Thank you very much!
Hi, should this test failure be considered serious enough to prevent Go 1.14.x and 1.15.x from being sent to Gentoo's stable users on x86? If so, is there any chance of backporting the fix to these versions?
Thanks much,
William
It only affects folks working on Go itself (or a fork, I suppose) on one of these platforms. The actual code that a user would run works fine as there's no real bug here outside of tests. On the other hand, it's a very safe change because it only modifies test code so I'm certainly not opposed to backporting personally. I'm honestly not sure how the cost/benefit situation of another change in the point releases works out.
Let's ask @dmitshur maybe?
Thank you for the clarification. For Gentoo Linux, we will move on now that we know that this is just a test failure.
@mknyszek You're right that test-only fixes are generally safer to backport, which makes it an easier decision. That said, while okay in small numbers, we don't want to do it too frequently. We do it occasionally when there is a specific need, usually in order to fix false-positive failures during pre-release testing and on trybots that run on cherry-pick CLs. If there's a good reason to backport a test-only fix, we will certainly consider it. You can do a search for test-only backports we've approved with this query.
In this case, it seems like the most important aspect was to understand that this issue was in the test itself, and not a problem in the real code. So, as I understand it, there isn't a strong need to backport the test fix anymore, especially since it doesn't affect testing done by our current builders. But if I'm missing something, please request a backport and provide a rationale.
Also CC @golang/release FYI.
I was trying to build go-1.14.9 and go-1.14.10 on Gentoo, but tests failed. So I tried 1.15.3 and git master -- the same. See the failure below.
This is an x86 system.
Does this issue reproduce with the latest release?
Yes, I tried building from git master (
What operating system and processor architecture are you using (go env)?
What did you do?
I am trying to run the test suite.
What did you expect to see?
No test failures.
What did you see instead?
Test failures