Open Haroon-Khel opened 2 years ago
Deep history shows this test passes occasionally. Can we verify the status of the jdk_jfr tests from the October release on this platform? If we hit the same issues with last release (and reran to see it passing), we will do the same with this release. Try grinding on the machine its shown to pass in the deep history view (test-docker-ubuntu2010-x64-3). If passes, what is special about that machine in relation to what this test is doing (look at test code). If fails, run with ITERATIONS=10 to see what is the pass rate.
We need to take extra action to try and figure out root cause (test / infra / product) and raise an issue in the appropriate location.
If test issue: We should determine if this is a 'flaky' test (JBS report for test material issue)
If infra issue: one that needs a particular machine setup and raise/fix infra requirements (adoptium/infrastructure issue).
If intermittent product issue: JBS report for JFR issue
Rerunning on the machine it last passed on, test-docker-ubuntu2010-x64-3, https://ci.adoptopenjdk.net/job/Grinder/3263/console
@Haroon-Khel @smlambert
The deep history log provides a dump when the timeout occurs: https://trss.adoptium.net/output/test?id=61e848cba7df1a2924028e9a
TestCompilerCompiles has:
[2022-01-19T14:46:27.450Z] "MainThread" #9 prio=5 os_prio=0 tid=0x00007f0e2c12a800 nid=0x158efa waiting on condition [0x00007f0df70ee000]
[2022-01-19T14:46:27.450Z] java.lang.Thread.State: TIMED_WAITING (sleeping)
[2022-01-19T14:46:27.450Z] at java.lang.Thread.sleep(Native Method)
[2022-01-19T14:46:27.450Z] at jdk.test.lib.Utils.waitForCondition(Utils.java:530)
[2022-01-19T14:46:27.450Z] at jdk.test.lib.Utils.waitForCondition(Utils.java:499)
[2022-01-19T14:46:27.450Z] at jdk.jfr.event.compiler.TestCompilerCompile.doTest(TestCompilerCompile.java:85)
[...]
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTNewArena(J)J
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTFreeArena(J)V
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTArenaMalloc(JJ)V
which is the test waiting for the request to compile the method via WhiteBox.
TestShutdownEvent has:
[2022-01-19T14:54:31.488Z] "MainThread" #9 prio=5 os_prio=0 tid=0x00007fd7dc12a000 nid=0x159ece in Object.wait() [0x00007fd787db3000]
[2022-01-19T14:54:31.488Z] java.lang.Thread.State: WAITING (on object monitor)
[2022-01-19T14:54:31.488Z] at java.lang.Object.wait(Native Method)
[2022-01-19T14:54:31.488Z] - waiting on <0x00007fd7c309e3a0> (a java.lang.UNIXProcess)
[2022-01-19T14:54:31.488Z] at java.lang.Object.wait(Object.java:502)
[2022-01-19T14:54:31.488Z] at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
[2022-01-19T14:54:31.488Z] - locked <0x00007fd7c309e3a0> (a java.lang.UNIXProcess)
[2022-01-19T14:54:31.488Z] at jdk.test.lib.process.ProcessTools.getOutput(ProcessTools.java:98)
[2022-01-19T14:54:31.488Z] at jdk.test.lib.process.OutputAnalyzer.<init>(OutputAnalyzer.java:48)
[2022-01-19T14:54:31.488Z] at jdk.test.lib.process.ProcessTools.executeProcess(ProcessTools.java:412)
[2022-01-19T14:54:31.488Z] at jdk.jfr.event.runtime.TestShutdownEvent.runSubtest(TestShutdownEvent.java:82)
where it's waiting on the code to actually start up the subprocess. However the stdout has:
[2022-01-19T14:54:31.493Z] ...
[2022-01-19T14:54:31.493Z] Output overflow:
[2022-01-19T14:54:31.493Z] JT Harness has limited the test output to the text
[2022-01-19T14:54:31.493Z] at the beginning and the end, so that you can see how the
[2022-01-19T14:54:31.493Z] test began, and how it completed.
[2022-01-19T14:54:31.493Z]
[2022-01-19T14:54:31.493Z] If you need to see more of the output from the test,
[2022-01-19T14:54:31.493Z] set the system property javatest.maxOutputSize to a higher
[2022-01-19T14:54:31.493Z] value. The current value is 100000
[2022-01-19T14:54:31.493Z] ...
So we can't see exactly which subprocess was stuck.
I'd consider them both test/infra issues rather than product issues.
Thanks for this assessment @jiekang - based on it and the fact that this test passes intermittently on some machines (test-docker-ubuntu2010-x64-3 has both passed and failed before which also potentially indicates infra/test issues), we will not block the release on these failures, but indeed we should figure out which subprocess is stuck and why...
@Haroon-Khel - could it be the case that there are some processes laying around that then block some of this tests subprocesses from running? and when we do a cleanup of the machines (static docker containers) we see 1 passing run before the machine is back to a state where this test fails again.
could it be the case that there are some processes laying around that then block some of this tests subprocesses from running? and when we do a cleanup of the machines (static docker containers) we see 1 passing run before the machine is back to a state where this test fails again.
I'll take a look
On test-docker-ubuntu2010-x64-3 there was a rogue Test_openjdk17_hs_extended.openjdk_x86-64_linux_testList_1 process from jan 18 still running.
Also, I noticed that test-docker-ubuntu2010-x64-3 is actually test-docker-fedora35-x64-1; I replaced test-docker-ubuntu2010-x64-3 with test-docker-ubuntu2110-x64-1 a few weeks back, and I must have forgotten to remove the jenkins node for test-docker-ubuntu2010-x64-3, and then when setting up test-docker-fedora35-x64-1 I gave it the same port on the dockerhost. That may explain why the test no longer passes on test-docker-ubuntu2010-x64-3, and so does hint towards this issue being a machine specific problem
Anyway, rerunning on test-docker-fedora35-x64-1 https://ci.adoptopenjdk.net/job/Grinder/3291/console
The following tests from jdk_jfr in the extended openjdk test suite fail on xLinux
https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/30/testReport/
jdk/jfr/event/compiler/TestCompilerCompile.java.TestCompilerCompile
jdk/jfr/event/runtime/TestShutdownEvent.java.TestShutdownEvent
jdk/jfr/event/runtime/TestShutdownEvent.java.TestShutdownEvent has been excluded on aarch64 linux but not yet for xLinux https://github.com/adoptium/aqa-tests/pull/3007