Closed (gopherbot closed this issue 2 months ago)
Found new dashboard test flakes for:
#!watchflakes
default <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
Change https://go.dev/cl/544375 mentions this issue: runtime: test for contention in both semaphore paths
Most of the failures are from before https://go.dev/cl/544195. Following that, I've seen four failures (via `fetchlogs`/`greplogs`):

- https://build.golang.org/log/c733e91b4693774eb42501bbe93a8ee071ef312e: addressed by https://go.dev/cl/544375
- https://build.golang.org/log/79b92d5304bd79fbdfac90c953d53b172f2d8250: addressed by https://go.dev/cl/544375
- https://build.golang.org/log/2e5e4f5629a179365a02d2bb9548ed5b94597c54: during a slow runtime test, which I expect adds a lot of non-contention work, making it harder to encounter the desired contention ... logging to confirm in https://go.dev/cl/544375
- https://build.golang.org/log/6ee5577c3e0a08531b8d7f009631323b4aab7c23: on linux-ppc64le, showing pprof with the amount of contention I expect but a metrics view with much more weight. One of the earlier linux-s390x failures shows similar behavior (though it also failed due to the big-endian bug I fixed in CL 544195).
That's two more ppc64 failures from a clock mismatch (not sure what's up with ppc64's clocks), and one more of the type that https://go.dev/cl/544375 addresses.
Note that there are some additional timeouts at https://github.com/golang/go/issues/55308#issuecomment-1819326331.
I'm not sure how to update the watchflakes rule to match these timeouts.
From that issue comment, `2023-11-19 22:05 darwin-amd64-longtest go@63828938 runtime` and `2023-11-20 02:02 darwin-amd64-longtest go@237715cf runtime` are timeouts during the `maymorestack=mayMoreStackMove` re-run of the runtime tests. Those predate the roll-forward of https://go.dev/cl/544195, which includes a related update to the semrelease test to address those exact failures. (I'd found them via `fetchlogs`/`greplogs`.)
Previously, the test involved three threads and required one of them to hold the semaphore's lock, one to contend for it, and the third to notice that contention and mark the test as complete. My interpretation of the `mayMoreStackMove` failures is that they added a lot of non-contention work, which made it hard for the test's three threads to each be in exactly the right spot at a single instant.

In https://go.dev/cl/544195, I relaxed that requirement: now the test can finish early if one of the threads notices that the other two have created the necessary contention; otherwise it gives up after 10,000,000 iterations. (My darwin/arm64 system typically needed 500,000–4,000,000 iterations during the `mayMoreStackPreempt` re-test.) In either case, the only verification it does of the result is that there's "more than zero" contention and that the call stack appears as expected.
I continue to struggle with the balance between wanting thorough tests for this part of the runtime (involving concurrency and clocks) and the need for 0% flakes. Maybe the performance dashboard sets an example there, including an opportunity for more output than `ok  runtime  690.399s`? I'd appreciate any advice you can share. Thanks!
This failure appeared on a first-class port on https://go.dev/cl/546635. @bcmills, was that CL based on a commit from before https://go.dev/cl/544375 landed, by any chance?

EDIT: Yes, it was. Here's the failure: https://ci.chromium.org/ui/p/golang/builders/try-workers/gotip-linux-arm64-test_only/b8762881407384401921/overview
From https://ci.chromium.org/ui/p/golang/builders/try-workers/gotip-linux-arm64-test_only/b8762881407384401921/overview, I get https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8762881407384401921/+/u/step/11/log/2, which shows two `metrics_test.go:1103: want stack` lines, indicating that it includes https://go.dev/cl/544375.
CL 546635 at PS 4 is https://go.googlesource.com/go/+/30e6fc629529abf8da4528f4fdbb5a78363624fb, with parent https://go.googlesource.com/go/+/2e6387cbec924dbd01007421d7442125037c66b2 .
I don't see a line in the failing build log matching https://go.googlesource.com/go/+/2e6387cbec924dbd01007421d7442125037c66b2/src/runtime/metrics_test.go#1271 , which means the test ran for 10,000,000 iterations without the test code noticing contention on the semaphore lock. The mutex profile result shows that the runtime itself didn't encounter contention either. (In #55160, the contention was present but the test had been unable to notice it and so would run forever.)
The `TestRuntimeLockMetricsAndProfile/runtime.lock` test verifies that runtime-internal lock contention can be reported with the correct count, magnitude, and call stack. The role of the `TestRuntimeLockMetricsAndProfile/runtime.semrelease` test is to check that the call stack ends at a particular depth, mostly so we can notice when that changes (so we can update the skip parameter, for example). It's proven trickier to test, since the lock itself isn't under the test's control. Do you have advice on how to test semacquire/semrelease, or is the best option to `t.Skip` that for now?
Updated the `watchflakes` pattern to use `post` instead of `default`, since there were a lot of these posted on #58901.
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "runtime" && test == "TestRuntimeLockMetricsAndProfile"
@rhysh, note that the failure in https://github.com/golang/go/issues/64253#issuecomment-1847818829 is on `linux/386`, which is a first-class port.
https://ci.chromium.org/ui/p/golang/builders/try-workers/gotip-linux-arm64-test_only/b8761883502571661809/test-results?sortby=&groupby= on a `linux/arm64` LUCI TryBot (also a first-class port).
Alrighty, I think we need to do something about this test.
@rhysh, from your last comment I take it your conclusion is that this is a test-only issue, right? For the moment, let's skip the flakiest of the tests, which seems to be the semrelease one.
> Do you have advice on how to test semacquire/semrelease, or is the best option to `t.Skip` that for now?
Looking at the test, it seems like it just has 3 workers acquiring/releasing the same semaphore. Could you force them to block on the semaphore and operate in lockstep? Is the problem that it never takes the slow path, or that nothing ever blocks on the lock in the slow path? If it's the latter, then I guess operating in lockstep doesn't really help.
Would it be possible to manufacture something that also expects similar call stack pruning but is basically just a `runtime.mutex` under the hood, so we can be more certain about what's going to happen?
Hm, it seems like the `runtime.lock` test is also failing a fair amount, more often on the secondary port machines, but still. Is that a concern?
For `semrelease`:

> or that nothing ever blocks on the lock in the slow path?

Right, the critical section for manipulating semaphore sleeps is pretty small and not under the control of the (runtime-internal) user of the semaphore. There's also the problem that the runtime has 251 locks, with the hash of the semaphore address determining which one to use. So observing contention on one of those shared locks (as the current test does) isn't proof that the test's own semaphore use is the one causing or experiencing the contention.
The failure in 2023-11-29T19:13:04-636c6e3/freebsd-arm64-dmgk saw contention (its output includes `finished test early (9999534 tries remaining)`), but the only new contention is on `runtime.unlock runtime.parkunlock_c runtime.park_m runtime.mcall` (benign, 12384 ns) and `runtime.unlock runtime.semrelease1 sync.runtime_Semrelease sync.(*WaitGroup).Add sync.(*WaitGroup).Done runtime_test.TestRuntimeLockMetricsAndProfile.func4.1.1.1 runtime_test.(*contentionWorker).run` (a semrelease, but of a different semaphore, `started.Done()`; 408 ns).
Maybe the semrelease test could look for a particular kind of failure rather than requiring success: if the profile includes the target function (`runtime_test.TestRuntimeLockMetricsAndProfile.func6.1`), then make sure it has one of the two allowable call stacks, but don't require that the target function appears in the profile at all.
Or, outright skip the test for Go 1.22. It's unlikely we'll land a change that would accidentally break it further, and the level of detail this is trying to test isn't visible without an opt-in GODEBUG flag anyway.
In triage, we decided to just skip these tests for now. That'll let us mark this okay-after-rc1, but with a goal of further investigation.
More of an explanation: these tests are checking behavior that's behind a `GODEBUG` anyway. It's fine if we just disable the tests, because even if something breaks between now and the next dev period (unlikely), it won't affect the vast majority of users.
We won't delete the tests because that functionality is going to be revisited for Go 1.23 anyway (by @rhysh I believe).
Change https://go.dev/cl/550775 mentions this issue: runtime: skip TestRuntimeLockMetricsAndProfile for flakiness
I think this can still be worked on in Go 1.22 as a test-only fix, but there's really no reason this has to remain a release blocker, even okay-after-rc1. Moving to Go 1.23.
Found new dashboard test flakes for:
#!watchflakes
post <- pkg == "runtime" && test ~ `TestRuntimeLockMetricsAndProfile`
Change https://go.dev/cl/581296 mentions this issue: runtime: test mutex contention stacks and counts
Change https://go.dev/cl/586237 mentions this issue: runtime: improve runtime-internal mutex profile tests
Change https://go.dev/cl/587515 mentions this issue: runtime: lower mutex contention test expectations
Issue created automatically to collect these failures.