eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

OpenJDK util/stream/CountLargeTest.java timeout on alinux new OSU machine #15198

Closed pshipton closed 2 years ago

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/204 - ub20-aarch64-osu-1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/204/openjdk_test_output.tar.gz cores in the following locations, no javacore which would be useful. aqa-tests\TKG\output_16541328857881\jdk_util_0\work\java\util\stream\test\org\openjdk\tests\java\util\stream\CountLargeTest aqa-tests\TKG\output_16541328857881\jdk_util_1\work\java\util\stream\test\org\openjdk\tests\java\util\stream\CountLargeTest

jdk_util_0 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:+UseCompressedOops java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest.java

23:04:36  ACTION: testng -- Error. Program `/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 969088ms).
23:04:36  REASON: Assumed action based on file name: run testng org.openjdk.tests.java.util.stream.CountLargeTest 
23:04:36  TIME:   969.094 seconds
23:04:36  messages:
23:04:36  command: testng org.openjdk.tests.java.util.stream.CountLargeTest
23:04:36  reason: Assumed action based on file name: run testng org.openjdk.tests.java.util.stream.CountLargeTest 
23:04:36  Mode: othervm [test requires non-default system module, test or library overrides a system module]
23:04:36  Additional options from @modules: --add-modules jdk.incubator.foreign
23:04:36  Timeout information:
23:04:36  Running jcmd on process 1556916
23:04:36  Dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_16541328857881/jdk_util_0/work/java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest/core.20220602.030415.1556916.0001.dmp
23:04:36  Running jstack on process 1556916
23:04:36  2022-06-02T03:04:22.508508419
23:04:36  Virtual machine: 1556916 JVM information:
23:04:36  JRE 17 Linux aarch64-64-Bit Compressed References 20220601_215 (JIT enabled, AOT enabled)
23:04:36  OpenJ9   - 792a28f15e3
23:04:36  OMR      - f6cc0533cf7
23:04:36  JCL      - e6d04ddfc94 based on jdk-17.0.4+3
23:04:36  
23:04:36  "main" prio=5 Id=1 WAITING
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:04:36    at java.base@17.0.4-internal/java.lang.Thread.join(Thread.java:780)
23:04:36    at java.base@17.0.4-internal/java.lang.Thread.join(Thread.java:735)
23:04:36    at app//com.sun.javatest.regtest.agent.MainWrapper.main(MainWrapper.java:74)
23:04:36  
23:04:36  "JIT Compilation Thread-000" prio=10 Id=3 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-001 Suspended" prio=10 Id=4 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-002 Suspended" prio=10 Id=5 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-003 Suspended" prio=10 Id=6 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-004 Suspended" prio=10 Id=7 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-005 Suspended" prio=10 Id=8 RUNNABLE
23:04:36  
23:04:36  "JIT Compilation Thread-006 Suspended" prio=10 Id=9 RUNNABLE
23:04:36  
23:04:36  "JIT Diagnostic Compilation Thread-007 Suspended" prio=10 Id=10 RUNNABLE
23:04:36  
23:04:36  "JIT-SamplerThread" prio=10 Id=11 TIMED_WAITING
23:04:36  
23:04:36  "IProfiler" prio=5 Id=12 RUNNABLE
23:04:36  
23:04:36  "Common-Cleaner" prio=8 Id=2 TIMED_WAITING
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:04:36    at java.base@17.0.4-internal/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:138)
23:04:36    at java.base@17.0.4-internal/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
23:04:36    at java.base@17.0.4-internal/java.lang.Thread.run(Thread.java:858)
23:04:36    at java.base@17.0.4-internal/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:162)
23:04:36  
23:04:36  "Finalizer thread" prio=5 Id=13 RUNNABLE
23:04:36  
23:04:36  "Concurrent Mark Helper" prio=1 Id=14 RUNNABLE
23:04:36  
23:04:36  "GC Worker" prio=5 Id=15 RUNNABLE
23:04:36  
23:04:36  "GC Worker" prio=5 Id=16 RUNNABLE
23:04:36  
23:04:36  "GC Worker" prio=5 Id=17 RUNNABLE
23:04:36  
23:04:36  "Attach API wait loop" prio=10 Id=20 RUNNABLE
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.IPC.waitSemaphore(Native Method)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.CommonDirectory.waitSemaphore(CommonDirectory.java:264)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.WaitLoop.waitForNotification(WaitLoop.java:66)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.WaitLoop.run(WaitLoop.java:157)
23:04:36  
23:04:36  "MainThread" prio=5 Id=22 RUNNABLE
23:04:36    at java.base@17.0.4-internal/java.util.stream.Streams$RangeLongSpliterator.forEachRemaining(Streams.java:233)
23:04:36    at java.base@17.0.4-internal/java.util.Spliterator$OfLong.forEachRemaining(Spliterator.java:775)
23:04:36    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:522)
23:04:36    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:512)
23:04:36    at java.base@17.0.4-internal/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
23:04:36    at java.base@17.0.4-internal/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:258)
23:04:36    at java.base@17.0.4-internal/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:248)
23:04:36    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239)
23:04:36    at java.base@17.0.4-internal/java.util.stream.ReferencePipeline.count(ReferencePipeline.java:709)
23:04:36    at app//org.openjdk.tests.java.util.stream.CountLargeTest.testRefLarge(CountLargeTest.java:53)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
23:04:36    at java.base@17.0.4-internal/java.lang.reflect.Method.invoke(Method.java:568)
23:04:36    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
23:04:36    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
23:04:36    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
23:04:36    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
23:04:36    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
23:04:36    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
23:04:36    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
23:04:36    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
23:04:36    at java.base@17.0.4-internal/java.util.ArrayList.forEach(ArrayList.java:1511)
23:04:36    at org.testng.TestRunner.privateRun(TestRunner.java:764)
23:04:36    at org.testng.TestRunner.run(TestRunner.java:585)
23:04:36    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
23:04:36    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
23:04:36    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
23:04:36    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
23:04:36    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
23:04:36    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
23:04:36    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
23:04:36    at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
23:04:36    at org.testng.TestNG.runSuites(TestNG.java:1069)
23:04:36    at org.testng.TestNG.run(TestNG.java:1037)
23:04:36    at app//com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94)
23:04:36    at app//com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
23:04:36    at java.base@17.0.4-internal/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
23:04:36    at java.base@17.0.4-internal/java.lang.reflect.Method.invoke(Method.java:568)
23:04:36    at app//com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
23:04:36    at java.base@17.0.4-internal/java.lang.Thread.run(Thread.java:858)
23:04:36  
23:04:36  "file lock watchdog" prio=10 Id=24 TIMED_WAITING
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:04:36    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:04:36    at java.base@17.0.4-internal/java.util.TimerThread.mainLoop(Timer.java:563)
23:04:36    at java.base@17.0.4-internal/java.util.TimerThread.run(Timer.java:516)
23:04:36  
23:04:36  "Attachment portNumber: 46289" prio=10 Id=25 RUNNABLE
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.dumpAllThreadsImpl(Native Method)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.getThreadInfo(DiagnosticUtils.java:245)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.executeDiagnosticCommand(DiagnosticUtils.java:181)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.Attachment.doCommand(Attachment.java:248)
23:04:36    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.Attachment.run(Attachment.java:159)
23:04:36  
23:04:36  
23:04:36  --- Timeout information end.
23:04:36  elapsed time (seconds): 969.094
23:04:36  configuration:
23:04:36  Boot Layer
23:04:36    add modules: jdk.incubator.foreign 
23:04:36    add reads:   java.base ALL-UNNAMED
23:04:36    patch:       java.base /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_16541328857881/jdk_util_0/work/classes/0/lib/testlibrary/bootlib/java.base
23:04:36  
23:04:36  STDOUT:
23:04:36  test org.openjdk.tests.java.util.stream.CountLargeTest.testDoubleLarge(): success
23:04:36  test org.openjdk.tests.java.util.stream.CountLargeTest.testIntLarge(): success
23:04:36  test org.openjdk.tests.java.util.stream.CountLargeTest.testLongLarge(): success
23:04:36  Timeout refired 960 times

jdk_util_1 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-UseCompressedOops java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest.java

23:51:09  ACTION: testng -- Error. Program `/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 973041ms).
23:51:09  REASON: Assumed action based on file name: run testng org.openjdk.tests.java.util.stream.CountLargeTest 
23:51:09  TIME:   973.06 seconds
23:51:09  messages:
23:51:09  command: testng org.openjdk.tests.java.util.stream.CountLargeTest
23:51:09  reason: Assumed action based on file name: run testng org.openjdk.tests.java.util.stream.CountLargeTest 
23:51:09  Mode: othervm [test requires non-default system module, test or library overrides a system module]
23:51:09  Additional options from @modules: --add-modules jdk.incubator.foreign
23:51:09  Timeout information:
23:51:09  Running jcmd on process 1651888
23:51:09  Dump written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_16541328857881/jdk_util_1/work/java/util/stream/test/org/openjdk/tests/java/util/stream/CountLargeTest/core.20220602.035054.1651888.0001.dmp
23:51:09  Running jstack on process 1651888
23:51:09  2022-06-02T03:51:03.238337941
23:51:09  Virtual machine: 1651888 JVM information:
23:51:09  JRE 17 Linux aarch64-64-Bit 20220601_215 (JIT enabled, AOT enabled)
23:51:09  OpenJ9   - 792a28f15e3
23:51:09  OMR      - f6cc0533cf7
23:51:09  JCL      - e6d04ddfc94 based on jdk-17.0.4+3
23:51:09  
23:51:09  "main" prio=5 Id=1 WAITING
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:51:09    at java.base@17.0.4-internal/java.lang.Thread.join(Thread.java:780)
23:51:09    at java.base@17.0.4-internal/java.lang.Thread.join(Thread.java:735)
23:51:09    at app//com.sun.javatest.regtest.agent.MainWrapper.main(MainWrapper.java:74)
23:51:09  
23:51:09  "JIT Compilation Thread-000" prio=10 Id=3 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-001 Suspended" prio=10 Id=4 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-002 Suspended" prio=10 Id=5 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-003 Suspended" prio=10 Id=6 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-004 Suspended" prio=10 Id=7 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-005 Suspended" prio=10 Id=8 RUNNABLE
23:51:09  
23:51:09  "JIT Compilation Thread-006 Suspended" prio=10 Id=9 RUNNABLE
23:51:09  
23:51:09  "JIT Diagnostic Compilation Thread-007 Suspended" prio=10 Id=10 RUNNABLE
23:51:09  
23:51:09  "JIT-SamplerThread" prio=10 Id=11 TIMED_WAITING
23:51:09  
23:51:09  "IProfiler" prio=5 Id=12 RUNNABLE
23:51:09  
23:51:09  "Common-Cleaner" prio=8 Id=2 TIMED_WAITING
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:51:09    at java.base@17.0.4-internal/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:138)
23:51:09    at java.base@17.0.4-internal/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
23:51:09    at java.base@17.0.4-internal/java.lang.Thread.run(Thread.java:858)
23:51:09    at java.base@17.0.4-internal/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:162)
23:51:09  
23:51:09  "Concurrent Mark Helper" prio=1 Id=13 RUNNABLE
23:51:09  
23:51:09  "Finalizer thread" prio=5 Id=14 RUNNABLE
23:51:09  
23:51:09  "GC Worker" prio=5 Id=15 RUNNABLE
23:51:09  
23:51:09  "GC Worker" prio=5 Id=16 RUNNABLE
23:51:09  
23:51:09  "GC Worker" prio=5 Id=17 RUNNABLE
23:51:09  
23:51:09  "Attach API wait loop" prio=10 Id=20 RUNNABLE
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.IPC.waitSemaphore(Native Method)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.CommonDirectory.waitSemaphore(CommonDirectory.java:264)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.WaitLoop.waitForNotification(WaitLoop.java:66)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.WaitLoop.run(WaitLoop.java:157)
23:51:09  
23:51:09  "MainThread" prio=5 Id=22 RUNNABLE
23:51:09    at java.base@17.0.4-internal/java.util.stream.Streams$RangeLongSpliterator.forEachRemaining(Streams.java:233)
23:51:09    at java.base@17.0.4-internal/java.util.Spliterator$OfLong.forEachRemaining(Spliterator.java:775)
23:51:09    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:522)
23:51:09    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:512)
23:51:09    at java.base@17.0.4-internal/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
23:51:09    at java.base@17.0.4-internal/java.util.stream.ReduceOps$13.evaluateSequential(ReduceOps.java:610)
23:51:09    at java.base@17.0.4-internal/java.util.stream.ReduceOps$13.evaluateSequential(ReduceOps.java:600)
23:51:09    at java.base@17.0.4-internal/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239)
23:51:09    at java.base@17.0.4-internal/java.util.stream.LongPipeline.count(LongPipeline.java:487)
23:51:09    at app//org.openjdk.tests.java.util.stream.CountLargeTest.testLongLarge(CountLargeTest.java:83)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
23:51:09    at java.base@17.0.4-internal/java.lang.reflect.Method.invoke(Method.java:568)
23:51:09    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
23:51:09    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599)
23:51:09    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174)
23:51:09    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
23:51:09    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822)
23:51:09    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147)
23:51:09    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
23:51:09    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
23:51:09    at java.base@17.0.4-internal/java.util.ArrayList.forEach(ArrayList.java:1511)
23:51:09    at org.testng.TestRunner.privateRun(TestRunner.java:764)
23:51:09    at org.testng.TestRunner.run(TestRunner.java:585)
23:51:09    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
23:51:09    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
23:51:09    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
23:51:09    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
23:51:09    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
23:51:09    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
23:51:09    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218)
23:51:09    at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
23:51:09    at org.testng.TestNG.runSuites(TestNG.java:1069)
23:51:09    at org.testng.TestNG.run(TestNG.java:1037)
23:51:09    at app//com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:94)
23:51:09    at app//com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:54)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
23:51:09    at java.base@17.0.4-internal/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
23:51:09    at java.base@17.0.4-internal/java.lang.reflect.Method.invoke(Method.java:568)
23:51:09    at app//com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
23:51:09    at java.base@17.0.4-internal/java.lang.Thread.run(Thread.java:858)
23:51:09  
23:51:09  "file lock watchdog" prio=10 Id=24 TIMED_WAITING
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Native Method)
23:51:09    at java.base@17.0.4-internal/java.lang.Object.wait(Object.java:219)
23:51:09    at java.base@17.0.4-internal/java.util.TimerThread.mainLoop(Timer.java:563)
23:51:09    at java.base@17.0.4-internal/java.util.TimerThread.run(Timer.java:516)
23:51:09  
23:51:09  "Attachment portNumber: 36899" prio=10 Id=26 RUNNABLE
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.dumpAllThreadsImpl(Native Method)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.getThreadInfo(DiagnosticUtils.java:245)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.DiagnosticUtils.executeDiagnosticCommand(DiagnosticUtils.java:181)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.Attachment.doCommand(Attachment.java:248)
23:51:09    at java.base@17.0.4-internal/openj9.internal.tools.attach.target.Attachment.run(Attachment.java:159)
23:51:09  
23:51:09  
23:51:09  --- Timeout information end.
23:51:09  elapsed time (seconds): 973.06
23:51:09  configuration:
23:51:09  Boot Layer
23:51:09    add modules: jdk.incubator.foreign 
23:51:09    add reads:   java.base ALL-UNNAMED
23:51:09    patch:       java.base /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_16541328857881/jdk_util_1/work/classes/1/lib/testlibrary/bootlib/java.base
23:51:09  
23:51:09  STDOUT:
23:51:09  test org.openjdk.tests.java.util.stream.CountLargeTest.testDoubleLarge(): success
23:51:09  test org.openjdk.tests.java.util.stream.CountLargeTest.testIntLarge(): success
23:51:09  Timeout refired 960 times
pshipton commented 2 years ago

Looking at jdk_util_0

!nativememinfo
JRE: 2,492,863,856 bytes / 6029 allocations
|  
+--VM: 2,199,765,216 bytes / 5582 allocations
|  |  
|  +--Classes: 331,699,056 bytes / 598 allocations
|  |  |  
|  |  +--Shared Class Cache: 314,572,896 bytes / 2 allocations
|  |  |  
|  |  +--Other: 17,126,160 bytes / 596 allocations
|  |  
|  +--Modules: 537,968 bytes / 2167 allocations
|  |  
|  +--Memory Manager (GC): 1,647,571,392 bytes / 495 allocations
|  |  |  
|  |  +--Java Heap: 1,614,868,480 bytes / 1 allocation

!dumpallregions
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
|    region      |     start      |      end       |    subspace    | flags  |      size      |      region type     
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
 0000ffffb40a7910 000000009fc00000 00000000a0200000 0000ffffb4095430 00000009           600000 ADDRESS_ORDERED
 0000ffffb40a7580 00000000ffda0000 00000000fff50000 0000ffffb40a1490 0000000a           1b0000 ADDRESS_ORDERED
 0000ffffb40a71f0 00000000fff50000 0000000100000000 0000ffffb409ba20 0000000a            b0000 ADDRESS_ORDERED
+----------------+----------------+----------------+----------------+--------+----------------+----------------------

Doesn't seem like excessive GCing, the heap hasn't been expanded much:


03:04:15.210075000  0x00000000001bdb00 j9mm.715            Event       Thread:  1 -> timeToStartCollection:  2314  scanStall:     0  syncStall:   444  notifyStall:    51
03:04:15.210077000  0x00000000001bdb00 j9mm.545            Event       Scav    1: stall_work=   0ms stall_complete=   0ms stall_sync=   0ms stall_count=0/0/3 free_lists=0/0 scan_lists=2/0
03:04:15.210079000 *0x00000000001c0000 j9mm.715            Event       Thread:  2 -> timeToStartCollection:   435  scanStall:   401  syncStall:   374  notifyStall:     0
03:04:15.210080000  0x00000000001c0000 j9mm.545            Event       Scav    2: stall_work=   0ms stall_complete=   0ms stall_sync=   0ms stall_count=0/2/3 free_lists=5/3 scan_lists=3/2
03:04:15.210244000 *0x0000000000000000 j9mm.140            Event       Tilt ratio: 65
03:04:15.210415000  0x0000000000000000 j9mm.560            Event       LocalGC end: rememberedsetoverflow=0 causedrememberedsetoverflow=0 scancacheoverflow=0 failedflipcount=0 failedflipbytes=0 failedtenurecount=0 failedtenurebytes=0 flipcount=3183 flipbytes=125600 newspace=1459200/2490368 oldspace=4279896/6291456 loa=315392/315392 tenureage=0
03:04:15.210883000  0x0000000000000000 j9mm.468            Event       Cycle End: type 2 approximateFreeMemorySize 5739096
03:04:15.210902000  0x0000000000000000 j9mm.470            Event       Allocation failure cycle end: newspace=1459200/2490368 oldspace=4279896/6291456 loa=315392/315392
03:04:15.210921000  0x0000000000000000 j9mm.134            Event       Allocation failure end: newspace=1457136/2490368 oldspace=4279896/6291456 loa=3```
pshipton commented 2 years ago

The new OSU machines are much slower than the old ones, Adam pointed out the compile time went from 18m to 54m.

pshipton commented 2 years ago

Fails on jdk18 as well. We aren't testing this platform ATM with jdk19. https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/73/ - ub20-aarch64-osu-2

pshipton commented 2 years ago

Created https://github.com/ibmruntimes/openj9-openjdk-jdk/pull/450 to capture a javacore in addition to the core we already get on timeout.

pshipton commented 2 years ago

In case this isn't machine related, the changes from the previously working build. https://github.com/eclipse-openj9/openj9/compare/eeedfa4d2c4...792a28f15e3 - nothing relevant https://github.com/eclipse-openj9/openj9-omr/compare/e542d083538...f6cc0533cf7 - aarch vector changes here, not sure they are relevent

@knn-k

pshipton commented 2 years ago

There are also some timeouts in internal testing that pre-date the code changes in the previous comment. https://github.com/eclipse-openj9/openj9/issues/15164

pshipton commented 2 years ago

I wonder if this is a real problem that only shows up on slower machines. The internal builds were broken for over a week, so we may not have seen when the problem was first introduced. The test isn't new and I don't see it recently un-excluded, it should have been running and passing for some time.

pshipton commented 2 years ago

that only shows up on slower machines

On slower machines or different processors.

knn-k commented 2 years ago

This testcase was excluded in April 2020 because of timeouts. I enabled the testcase again August last year. https://github.com/eclipse-openj9/openj9/issues/9040

I am not surprised if this testcase fails with a timeout on a slower machine.

pshipton commented 2 years ago

While the switch to slower machines can explain why it started failing last night externally, that doesn't explain why we just started seeing it internally (https://github.com/eclipse-openj9/openj9/issues/15164) when it's been working for many months.

pshipton commented 2 years ago

With the updated timeout handler we get a javacore. I won't keep reporting the external machine failures, but the javacore for the timeout tonight does show a lot of time spent in the "MainThread" (and nowhere else) as could be expected.

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/205/

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/205/openjdk_test_output.tar.gz

3XMTHREADINFO      "MainThread" J9VMThread:0x00000000001CCA00, omrthread_t:0x0000FFFF94355D80, java/lang/Thread:0x000000009FC8E980, state:R, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0x16, isDaemon:false)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x000000009FC51250)
3XMTHREADINFO1            (native thread ID:0x244EEC, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
3XMTHREADINFO2            (native stack address range from:0x0000FFFF7859E000, to:0x0000FFFF785DE000, size:0x40000)
3XMCPUTIME               CPU usage total: 952.504055050 secs, current category="Application"
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/stream/Streams$RangeLongSpliterator.forEachRemaining(Streams.java:233(Compiled Code))
4XESTACKTRACE                at java/util/Spliterator$OfLong.forEachRemaining(Spliterator.java:775)
4XESTACKTRACE                at java/util/stream/AbstractPipeline.copyInto(AbstractPipeline.java:522)
4XESTACKTRACE                at java/util/stream/AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:512)
4XESTACKTRACE                at java/util/stream/ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
4XESTACKTRACE                at java/util/stream/ReduceOps$5.evaluateSequential(ReduceOps.java:258)
4XESTACKTRACE                at java/util/stream/ReduceOps$5.evaluateSequential(ReduceOps.java:248)
4XESTACKTRACE                at java/util/stream/AbstractPipeline.evaluate(AbstractPipeline.java:239(Compiled Code))
4XESTACKTRACE                at java/util/stream/ReferencePipeline.count(ReferencePipeline.java:709)
4XESTACKTRACE                at org/openjdk/tests/java/util/stream/CountLargeTest.testRefLarge(CountLargeTest.java:53)
1XMTHDSUMMARY  Threads CPU Usage Summary
NULL           =========================
NULL
1XMTHDCATINFO  Warning: to get more accurate CPU times for the GC, the option -XX:-ReduceCPUMonitorOverhead can be used. See the user guide for more information.
NULL
1XMTHDCATEGORY All JVM attached threads: 959.993005000 secs
1XMTHDCATEGORY |
2XMTHDCATEGORY +--System-JVM: 7.032051000 secs
2XMTHDCATEGORY |  |
3XMTHDCATEGORY |  +--GC: 0.163603000 secs
2XMTHDCATEGORY |  |
3XMTHDCATEGORY |  +--JIT: 6.241224000 secs
1XMTHDCATEGORY |
2XMTHDCATEGORY +--Application: 952.960954000 secs
pshipton commented 2 years ago

For comparison I ran the test on xlinux (internal fyre machine), and it took about 2m to run.

11.0.15

  jdk_custom_0 Start Time: Thu Jun  2 20:45:18 2022 Epoch Time (ms): 1654227918061
jdk_custom_0 Finish Time: Thu Jun  2 20:47:07 2022 Epoch Time (ms): 1654228027144

Last nightly build

  jdk_custom_0 Start Time: Thu Jun  2 20:51:47 2022 Epoch Time (ms): 1654228307810
jdk_custom_0 Finish Time: Thu Jun  2 20:53:47 2022 Epoch Time (ms): 1654228427413
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 94
model name  : Intel Core Processor (Skylake, IBRS)
stepping    : 3
microcode   : 0x1
cpu MHz     : 2199.996
cache size  : 4096 KB
physical id : 0
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm cr
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips    : 4399.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
pshipton commented 2 years ago

All the internal aarch64 machines are busy ATM. If I find one idle later I can try a similar manual test on this platform.

knn-k commented 2 years ago

CountLargeTest took 20 minutes to finish in the previous run on cent8-aarch64-3: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/203/testReport/

The number has been similar in older jobs -- 19 minutes on cent8-aarch64-2 in April: https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/175/testReport/

knn-k commented 2 years ago

aarch vector changes here, not sure they are relevent

Those PRs added AArch64 vector instructions, but they are not used anywhere yet.

knn-k commented 2 years ago

I ran the testcase in my local environment using three builds, and I see no particular performance degradation. See the results below.

May 31 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK17_aarch64_linux_Nightly/214/) 3 min 40 sec

jdk_custom_0 Start Time: Fri Jun  3 16:19:03 2022 Epoch Time (ms): 1654240743393
jdk_custom_0 Finish Time: Fri Jun  3 16:22:43 2022 Epoch Time (ms): 1654240963064

June 1 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK17_aarch64_linux_Nightly/215/) 3 min 41 sec

jdk_custom_0 Start Time: Fri Jun  3 16:24:35 2022 Epoch Time (ms): 1654241075833
jdk_custom_0 Finish Time: Fri Jun  3 16:28:16 2022 Epoch Time (ms): 1654241296616

JDK 17 v0.32.0 (https://github.com/ibmruntimes/semeru17-binaries/releases/tag/jdk-17.0.3%2B7_openj9-0.32.0) 3 min 46 sec

jdk_custom_0 Start Time: Fri Jun  3 16:29:39 2022 Epoch Time (ms): 1654241379494
jdk_custom_0 Finish Time: Fri Jun  3 16:33:25 2022 Epoch Time (ms): 1654241605395
pshipton commented 2 years ago

Thanks. The 20m is for both variants of the test, which explains why it didn't timeout if each run is 10m each. I thought our regular machines were fast, but I guess your local test environment is much faster.

I still don't know why we got the timeouts internally. Is it possible that sometimes the JIT just doesn't optimize it as much?

the jdk19 build that timed out

job/Test_openjdknext_j9_sanity.openjdk_aarch64_linux_Personal/2 machine time
same ub18-aarch64-4 16 and 11 (27 - 16)

the 0.23 M1 build that timed out

job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/199 machine time
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/126 cent7-aarch64-4 16
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/126 ub18-aarch64-9 16

There was a grinder on the M1 build that didn't reproduce it (20 iterations each) ** same machine that timed out earlier

job machine time avg
job/Grinder_iteration_0/1229 ub18-aarch64-8 3h 29m 10.45
job/Grinder_iteration_1/1116 ub18-aarch64-2 3h 25m 10.25
job/Grinder_iteration_2/1038 cent7-aarch64-4** 3h 18m 9.9
job/Grinder_iteration_3/946 ub18-aarch64-5 3h 19m 9.95
job/Grinder_iteration_4/903 cent7-aarch64-6 3h 27h 10.35

Recent runs

job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/198 machine time
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/125 ub18-aarch64-9 13
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/125 ub18-aarch64-7 12
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/200 machine time
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/127 cent7-aarch64-5 10
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/127 ub18-aarch64-2 12
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux/201 machine time
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_0/128 cent7-aarch64-2 13
job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_testList_1/128 cent7-aarch64-3 13
pshipton commented 2 years ago

Another interesting question is why is this test only slow on jdk17+? Optimization problems with OpenJDK MH? It's the same test, running and passing on jdk11 in about 10m, while timing out on jdk17+ at 16m.

knn-k commented 2 years ago

Yes, it is much faster with JDK 11 than with JDK 17 in my local environment.

June 1 JDK 11 nightly build (https://openj9-jenkins.osuosl.org/job/Build_JDK11_aarch64_linux_Nightly/182/) 1 min 42 sec

jdk_custom_0 Start Time: Mon Jun  6 09:36:01 2022 Epoch Time (ms): 1654475761216
jdk_custom_0 Finish Time: Mon Jun  6 09:37:43 2022 Epoch Time (ms): 1654475863715

It was 3 min 41 sec using the JDK 17 nightly build of the same day, as I reported above.

Akira1Saitoh commented 2 years ago

I took some perf data on an internal machine. It turned out that ~40% of cpu time was spent on patching the callsite of interface dispatch on recompilation. We cache receiver class and method entry address for interface calls. When the cache hits, the entry address is loaded from the cache and used for dispatch. When recompilation happens, the callsite is updated so that it branches to the updated (recompiled) method, but it seems that it is not updated when the interface call is dispatched via cached address.

I opened a draft PR to resolve it. https://github.com/eclipse-openj9/openj9/pull/15242

knn-k commented 2 years ago

Significant improvement with PR #15242 for this testcase on AArch64 macOS as shown below. The gap is smaller, but JDK 11 is still faster than JDK 17:

Build JDK 11 JDK 17 Gap (11 vs 17)
June 8 Nightly 91 sec. 157 sec. x1.7
With PR #15242 36 sec. 44 sec. x1.2
Speedup by the PR x2.5 x3.6

Good job, @Akira1Saitoh !

knn-k commented 2 years ago

AArch64 Linux results have also improved between June 8 and 9. They are all on ub20-aarch64-osu-x. I think we can close this issue.

Build JDK 11 JDK 17
June 8 Nightly 10 min. 16 min. (timeout)
June 9 Nightly 3 min. 51 sec. 4 min. 3 sec.
pshipton commented 2 years ago

Is it understood why jdk17 is slower than jdk11? Do we have an issue to address that?

pshipton commented 2 years ago

Is the change low risk enough to put into the 0.33 release?

knn-k commented 2 years ago

I haven't looked into the difference between JDK 11 and 17. I opened Issue #15301 for tracking the performance gap.

Is the change low risk enough to put into the 0.33 release?

It is four days since PR #15242 was merged, and there are no failures caused by that change AFAIK. I guess we can put it in 0.33.

Akira1Saitoh commented 2 years ago

Opened https://github.com/eclipse-openj9/openj9/pull/15314 for 0.33.

knn-k commented 2 years ago

I found similar performance gaps with CountLargeTest between JDK 11 and 17 on other platforms. See #15301.