gopherbot opened 2 years ago
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "runtime/trace" && test == "TestTraceCPUProfile"
As a first pass:

The failure on linux-arm-aws is a timeout, and looks like a conflict between these two goroutines:
goroutine 32 [waiting]:
runtime.systemstack_switch()
/workdir/go/src/runtime/asm_arm.s:274 +0x4 fp=0x183acdc sp=0x183acd8 pc=0x84088
runtime.stopTheWorld({0x1c6ecd, 0xc})
/workdir/go/src/runtime/proc.go:1090 +0x98 fp=0x183acfc sp=0x183acdc pc=0x53e88
runtime.stopTheWorldGC({0x1c6ecd, 0xc})
/workdir/go/src/runtime/proc.go:1137 +0x48 fp=0x183ad10 sp=0x183acfc pc=0x54054
runtime.StopTrace()
/workdir/go/src/runtime/trace.go:320 +0x24 fp=0x183ad28 sp=0x183ad10 pc=0x74ab8
runtime/trace.Stop()
/workdir/go/src/runtime/trace/trace.go:148 +0x78 fp=0x183ad44 sp=0x183ad28 pc=0xe697c
runtime/trace_test.TestTraceCPUProfile(0x1c04e10)
/workdir/go/src/runtime/trace/trace_test.go:621 +0x1a4 fp=0x183af98 sp=0x183ad44 pc=0x1885ec
testing.tRunner(0x1c04e10, 0x1d3994)
/workdir/go/src/testing/testing.go:1446 +0x118 fp=0x183afe0 sp=0x183af98 pc=0xf3654
testing.(*T).Run.func1()
/workdir/go/src/testing/testing.go:1493 +0x30 fp=0x183afec sp=0x183afe0 pc=0xf44ac
runtime.goexit()
/workdir/go/src/runtime/asm_arm.s:824 +0x4 fp=0x183afec sp=0x183afec pc=0x85d14
created by testing.(*T).Run
/workdir/go/src/testing/testing.go:1493 +0x3a0
goroutine 82 [running]:
goroutine running on other thread; stack unavailable
created by runtime/trace.Start
/workdir/go/src/runtime/trace/trace.go:128 +0xd4
The goroutine created on src/runtime/trace/trace.go:128 is the one that calls runtime.ReadTrace: https://github.com/golang/go/blob/38edd9bd8da9d7fc7beeba5fd4fd9d605457b04e/src/runtime/trace/trace.go#L128-L136
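For reference, that goroutine has roughly the following shape (a paraphrase of the linked lines, not an exact copy; the wrapper function and package name are only for the sketch):

```go
package tracesketch

import (
	"io"
	"runtime"
)

// startReader sketches the goroutine created at runtime/trace/trace.go:128:
// it drains the runtime's trace buffers into the io.Writer passed to
// trace.Start until StopTrace makes runtime.ReadTrace return nil.
func startReader(w io.Writer) {
	go func() {
		for {
			data := runtime.ReadTrace()
			if data == nil {
				break
			}
			w.Write(data)
		}
	}()
}
```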
The failure on windows-arm64-10 is a crash (fatal error: unexpected signal during runtime execution, [signal 0xc0000005 code=0x0 addr=0x0 pc=0x7ff6f49eb8e4]), coming from runtime.gentraceback as called by runtime.sigprof: https://github.com/golang/go/blob/55ca6a20afb718972f0625ce971081f78efe10bf/src/runtime/traceback.go#L162
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "runtime/trace" && test == "TestTraceCPUProfile"
Sorry, but there were parse errors in the watch flakes script. The script I found was:
#!watchflakes
post <- pkg == ("runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`)
And the problems were:
script:2.16: == requires quoted string
See https://go.dev/wiki/Watchflakes for details.
Sorry, but there were parse errors in the watch flakes script. The script I found was:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`)
And the problems were:
script:2.107: unexpected )
See https://go.dev/wiki/Watchflakes for details.
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
This latest set of failures seems related to the v2 tracer; those on arm64 are in the internal/trace/v2 package (right after the ok internal/trace line).
The linux-386-longtest failure looks like a logical race in the v2 tracer's wakeableSleep mechanism, near https://github.com/golang/go/blob/e14b96cb511ab3437dedf2be2fa49d4516189bbc/src/runtime/trace2.go#L908
CC @mknyszek for those.
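For readers unfamiliar with that mechanism: a wakeableSleep is, conceptually, a timed sleep that can also be cut short by an explicit wake. The user-space analog below is illustrative only (the runtime's version in trace2.go is built on runtime timers and notes, not channels), but it shows the kind of sleep/wake interleaving where a logical race can hide.

```go
package main

import "time"

// wakeableSleep is a user-space analog of the runtime mechanism discussed
// above, for illustration only.
type wakeableSleep struct {
	wakeup chan struct{}
}

func newWakeableSleep() *wakeableSleep {
	return &wakeableSleep{wakeup: make(chan struct{}, 1)}
}

// sleep blocks for d, or until wake is called, whichever comes first.
func (s *wakeableSleep) sleep(d time.Duration) {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
	case <-s.wakeup:
	}
}

// wake ends an in-progress sleep, or makes the next sleep return
// immediately if no one is sleeping right now.
func (s *wakeableSleep) wake() {
	select {
	case s.wakeup <- struct{}{}:
	default:
	}
}

func main() {
	s := newWakeableSleep()
	go func() {
		time.Sleep(10 * time.Millisecond)
		s.wake()
	}()
	s.sleep(time.Second) // returns after ~10ms thanks to the wake
}
```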
The openbsd-386-72 failure is for the v1 runtime/trace.TestTraceCPUProfile test, running out of memory while parsing the trace. https://github.com/golang/go/blob/d573a8a8e86157d1d9ea43e6bcb03ce5b9baa9fb/src/internal/trace/parser.go#L269 is events = append(events, ev), where ev is a type rawEvent struct {int; byte; []uint64; []string}, which on GOARCH=386 would be 8 words of 4 bytes each, 32 bytes in total. The failed allocation is runtime.growslice(0xac500000, 0x2c301, 0x2c300, 0x1, 0x81e4340), growing that slice beyond 180992 entries. That's 5.8 MB (growing to 0x375 pages of 8 KiB each, 7.3 MB), which doesn't seem large by itself, but maybe 180k+ entries in the test trace is unusually large? I don't have a baseline for that. There are nice log lines in mheap.grow when the call to mheap.sysAlloc fails, but the crash here is in sysMap (which throws directly in sysMapOS), so the log doesn't include context on the total heap size.
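As a quick check of that arithmetic (the mirror struct below is hypothetical; only the field types and order matter for sizing):

```go
package main

import (
	"fmt"
	"unsafe"
)

// rawEventMirror reproduces the field layout described above; the field
// names are made up for this example.
type rawEventMirror struct {
	off   int
	typ   byte
	args  []uint64
	sargs []string
}

func main() {
	// GOARCH=386: 4 (int) + 1 (byte) + 3 (padding) + 12 + 12 (slice headers) = 32 bytes.
	fmt.Println(unsafe.Sizeof(rawEventMirror{})) // 32 on 386, 64 on amd64

	// 180992 events * 32 bytes = 5791744 bytes ≈ 5.8 MB, and
	// 0x375 (885) pages * 8 KiB = 7249920 bytes ≈ 7.3 MB.
	fmt.Println(180992*32, 885*8*1024)
}
```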
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
Sorry for not replying sooner. The linux-386-longtest failure was fixed by https://go.dev/cl/542335.
The new failure is fun -- that may be an actual failure at runtime from the new tracer. I'll try to reproduce.
As a side-note, the runtime/trace tests barely use the new tracer at all (TestTraceCPUProfile doesn't). If there's a failure there, it's not related to that specifically.
Oh! No, that new failure is not real actually. I put in a low-quality, optimistic "deadlock detector", but it turns out there was no deadlock in this case at all. The test just didn't expect the additional output.
I'll put the "deadlock detector" behind a constant. It's for debugging anyway.
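Gating a debug-only check behind a constant typically looks like the sketch below (hypothetical names, not the actual change from the CL mentioned next); with the constant false, the compiler drops the branch entirely:

```go
package tracesketch

// debugDeadlock gates the optimistic deadlock detector described above.
const debugDeadlock = false

func checkForDeadlock() {
	if !debugDeadlock {
		return
	}
	// Best-effort, possibly noisy deadlock detection for debugging only.
}
```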
Change https://go.dev/cl/544235 mentions this issue: runtime: disable trace v2 optimistic deadlock debugger
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
Note that https://github.com/golang/go/issues/55317#issuecomment-1847818692 seems to have dropped CPU samples.
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
@prattmic, @mknyszek: this still has a fairly high failure rate. Is it still being investigated? (Would it make sense to add a testenv.SkipFlaky for this failure mode for the 1.22 release?)
Apologies, I got deeply side-tracked.
I think SkipFlaky on some platforms is correct here. It occurs to me that this test is inherently flaky. There's nothing guaranteeing that the goroutine that transfers CPU samples from the profile buffer to the trace will run before tracing ends. It can definitely happen that the goroutine just gets delayed for some reason and samples are dropped. More specifically, the goroutine that does the transfer awakens for the first time to see tracing is off, so it ignores the rest of the profile buffer's contents and drops them on the floor. It can't really do anything else because tracing is off; there's nowhere to put the information! This is fine -- samples are always allowed to drop.
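A rough model of that failure mode, with every name invented for the example (this is not the runtime's code):

```go
package main

import (
	"sync/atomic"
	"time"
)

var tracingOn atomic.Bool // stands in for "is tracing currently enabled?"

// transferLoop models the goroutine that moves CPU samples from the profile
// buffer into the trace. If it only wakes up after tracing has stopped, the
// samples still sitting in the buffer have nowhere to go and are dropped.
func transferLoop(flush func()) {
	for {
		time.Sleep(10 * time.Millisecond) // the wakeup may be delayed arbitrarily
		if !tracingOn.Load() {
			return // tracing is off: drop whatever remains in the buffer
		}
		flush() // write buffered samples into the trace
	}
}

func main() {
	tracingOn.Store(true)
	go transferLoop(func() { /* write samples into the trace */ })
	// If tracing stops before the goroutine's next wakeup, the final batch
	// of samples is lost, which is what the flaky test observes.
	tracingOn.Store(false)
	time.Sleep(20 * time.Millisecond)
}
```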
I'll think about how to make this test less flaky, but given that it's not failing with any regularity on first-class platforms, I think doing platform-specific skips is fine. There's nothing really platform-specific about this code.
If there is nothing platform-specific about the code, why would a platform-specific skip be even remotely appropriate‽
One of the interesting properties of the reverse builders for secondary ports is that they run on more diverse hardware — with more diverse scheduling — than our GCE-based builders. It seems gravely mistaken to assume that external users on first-class ports are using hardware and kernels that match our GCE builders, so why should we assume that they will not also run into this flakiness when running go test all or similar on their own CI systems?
Yeah, you're right. I frequently forget that other users might run go test all; I was mainly just thinking about test coverage loss on our side.
I think I can just make the test more resilient to lost samples. It seems like in all these failures at least some of the samples show up in the trace. I figure we can just make the test pass if at least one of these samples shows up in the trace.
This only verifies that functionality works at a baseline, not how well it works, but given that the functionality is best-effort anyway, I think that's about as good as we can make it for now.
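The relaxed check might look roughly like this sketch (not the actual test code; the map-based stack representation is a simplification):

```go
package trace_test

import "testing"

// checkSomeSamplesPresent requires only that at least one profiled stack
// appears in the trace, rather than all of them, since samples may
// legitimately be dropped by this best-effort mechanism.
func checkSomeSamplesPresent(t *testing.T, pprofStacks, traceStacks map[string]int) {
	t.Helper()
	for stack := range pprofStacks {
		if traceStacks[stack] > 0 {
			return // at least one sample made it into the trace
		}
	}
	t.Errorf("no CPU profile samples appeared in the execution trace")
}
```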
Taking a step back, I think this actually represents a regression in behavior vs. the old tracer, but it was really hard to tell because of how best-effort this functionality is.
Specifically, the old tracer was subtly ensuring that there was a CPU profile buffer flush sometime close to when a trace would end, because the trace reader would get woken up a whole bunch. In the new tracer the CPU profile buffer is translated into the execution trace via a separate goroutine. This has in general been positive for understanding the best-effort-ness of this behavior, but what we lost was this property that a flush was likely to happen close in time to when the trace ended.
My proposal to restore this behavior is just to wake the trace CPU profile reader goroutine before we're about to change generations. It does not ensure that a flush happens, but makes it far more likely. It's still best-effort, but a bit more predictable and will provide more understandable results.
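In sketch form, the proposed ordering is simply the following (hypothetical names; not the actual traceAdvance code):

```go
package tracesketch

// advanceSketch wakes the CPU profile reader goroutine just before switching
// trace generations, so a flush is likely, though still not guaranteed, to
// happen while the old generation is still open.
func advanceSketch(wakeCPUProfileReader, changeGeneration func()) {
	wakeCPUProfileReader() // best-effort nudge; does not block for a flush
	changeGeneration()
}
```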
Overall, I'm not really that happy with the fact that this is so best-effort still. Early on in the trace design I tried to write this trace CPU profile reader goroutine to actually block on the CPU profile buffer getting filled, making it much more precise. But because of the problems on Darwin with CPU profiles we encountered earlier this year, I had to work around that.
In the future it would be good to consider blocking in traceAdvance until we can ensure a recent flush of the CPU profile buffer happened. I'm reluctant to introduce that now just because that's how you introduce deadlocks in a new release, but I'll file an issue to track that.
I'll send a CL with the better-effort wakeup, which should hopefully resolve most, if not all, of the failures seen in this issue.
Change https://go.dev/cl/555495 mentions this issue: runtime: make a much better effort to emit CPU profile in a trace
@prattmic convinced me to try to do the right thing here, and eliminate all parts of the tracer making this CPU profile -> trace process best-effort. https://go.dev/cl/555495 implements this. I was originally skeptical given that we're pretty late in the release cycle, but I figure at the very least we can have this on hand and decide whether it's worth submitting now. If it's not, I can land it for next cycle and I'll send the wakeup patch (which is like, 5 lines of code) which will certainly not make the situation any worse.
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
Change https://go.dev/cl/557838 mentions this issue: [release-branch.go1.22] runtime: make a much better effort to emit CPU profile in a trace
Found new dashboard test flakes for:
#!watchflakes
post <- (pkg == "runtime/trace" && test == "TestTraceCPUProfile") || log ~ `--- FAIL: TestTraceCPUProfile`
Issue created automatically to collect these failures.
Example (log):
— watchflakes