eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

jdk19 OpenJDK java/lang/Thread/virtual/stress/Skynet.java hang / timeout #16728

Closed pshipton closed 9 months ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/111/ jdk_lang_0 java/lang/Thread/virtual/stress/Skynet.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/111/openjdk_test_output.tar.gz

21:55:35  ACTION: main -- Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 2404113ms).
21:55:35  REASON: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
21:55:35  TIME:   2404.116 seconds
21:55:35  messages:
21:55:35  command: main --enable-preview Skynet
21:55:35  reason: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
21:55:35  Mode: othervm [/othervm specified]
21:55:35  Timeout information:
21:55:35  Running jcmd on process 2529589
21:55:35  Dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_1676426825596/jdk_lang_0/work/scratch/2/core.20230214.215530.2529589.0001.dmp
21:55:35  Dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/aqa-tests/TKG/output_1676426825596/jdk_lang_0/work/scratch/2/javacore.20230214.215531.2529589.0002.txt
21:55:35  Running jstack on process 2529589
21:55:35  2023-02-14T21:55:32.813679073
21:55:35  Virtual machine: 2529589 JVM information:
21:55:35  JRE 19 Linux aarch64-64-Bit Compressed References 20230215_180 (JIT enabled, AOT enabled)
21:55:35  OpenJ9   - 711ca6c2adb
21:55:35  OMR      - 6adea0343aa
21:55:35  JCL      - 47925e78c4e based on jdk-19.0.2+7
21:55:35  
21:55:35  "main" prio=5 Id=2 WAITING
21:55:35    at java.base@19.0.2-internal/java.lang.Object.waitImpl(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:255)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:221)
21:55:35    at java.base@19.0.2-internal/java.lang.Thread.join(Thread.java:2151)
21:55:35    at java.base@19.0.2-internal/java.lang.Thread.join(Thread.java:2227)
21:55:35    at app//com.sun.javatest.regtest.agent.MainWrapper.main(MainWrapper.java:74)
21:55:35  
21:55:35  "JIT Compilation Thread-000" prio=10 Id=4 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-001 Suspended" prio=10 Id=5 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-002 Suspended" prio=10 Id=6 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-003 Suspended" prio=10 Id=7 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-004 Suspended" prio=10 Id=8 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-005 Suspended" prio=10 Id=9 RUNNABLE
21:55:35  
21:55:35  "JIT Compilation Thread-006 Suspended" prio=10 Id=10 RUNNABLE
21:55:35  
21:55:35  "JIT Diagnostic Compilation Thread-007 Suspended" prio=10 Id=11 RUNNABLE
21:55:35  
21:55:35  "JIT-SamplerThread" prio=10 Id=12 TIMED_WAITING
21:55:35  
21:55:35  "IProfiler" prio=5 Id=13 RUNNABLE
21:55:35  
21:55:35  "Common-Cleaner" prio=8 Id=3 TIMED_WAITING
21:55:35    at java.base@19.0.2-internal/java.lang.Object.waitImpl(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:255)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:221)
21:55:35    at java.base@19.0.2-internal/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:137)
21:55:35    at java.base@19.0.2-internal/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:140)
21:55:35    at java.base@19.0.2-internal/java.lang.Thread.run(Thread.java:1573)
21:55:35    at java.base@19.0.2-internal/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:186)
21:55:35  
21:55:35  "Concurrent Mark Helper" prio=1 Id=14 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=15 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=16 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=17 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=18 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=19 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=20 RUNNABLE
21:55:35  
21:55:35  "GC Worker" prio=5 Id=21 RUNNABLE
21:55:35  
21:55:35  "Attach API wait loop" prio=10 Id=24 RUNNABLE
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.IPC.waitSemaphore(Native Method)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.CommonDirectory.waitSemaphore(CommonDirectory.java:264)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.WaitLoop.waitForNotification(WaitLoop.java:66)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.WaitLoop.run(WaitLoop.java:157)
21:55:35  
21:55:35  "Finalizer thread" prio=5 Id=25 RUNNABLE
21:55:35  
21:55:35  "MainThread" prio=5 Id=27 WAITING
21:55:35    at java.base@19.0.2-internal/jdk.internal.misc.Unsafe.park(Native Method)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.SynchronousQueue$TransferStack$SNode.block(SynchronousQueue.java:288)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3745)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3690)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:397)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:886)
21:55:35    at app//Skynet$Channel.receive(Skynet.java:110)
21:55:35    at app//Skynet.skynet(Skynet.java:60)
21:55:35    at app//Skynet.main(Skynet.java:50)
21:55:35    at java.base@19.0.2-internal/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
21:55:35    at java.base@19.0.2-internal/java.lang.reflect.Method.invoke(Method.java:578)
21:55:35    at app//com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
21:55:35    at java.base@19.0.2-internal/java.lang.Thread.run(Thread.java:1573)
21:55:35  
21:55:35  "ForkJoinPool-1-worker-2" prio=5 Id=31 RUNNABLE
21:55:35    at java.base@19.0.2-internal/jdk.internal.vm.Continuation.enterImpl(Native Method)
21:55:35    at java.base@19.0.2-internal/jdk.internal.vm.Continuation.run(Continuation.java:198)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.runContinuation(VirtualThread.java:218)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
21:55:35  
21:55:35  "ForkJoinPool-1-worker-3" prio=5 Id=41 RUNNABLE
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.notifyJvmtiUnmountEnd(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.afterYield(VirtualThread.java:396)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.runContinuation(VirtualThread.java:223)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1403)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
21:55:35  
21:55:35  "ForkJoinPool-1-worker-6" prio=5 Id=47 RUNNABLE
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.notifyJvmtiUnmountEnd(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.afterTerminate(VirtualThread.java:424)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.runContinuation(VirtualThread.java:221)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
21:55:35  
21:55:35  "ForkJoinPool-1-worker-7" prio=5 Id=48 RUNNABLE
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.notifyJvmtiMountBegin(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.VirtualThread.runContinuation(VirtualThread.java:215)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
21:55:35    at java.base@19.0.2-internal/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)
21:55:35  
21:55:35  "file lock watchdog" prio=10 Id=10700077 TIMED_WAITING
21:55:35    at java.base@19.0.2-internal/java.lang.Object.waitImpl(Native Method)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:255)
21:55:35    at java.base@19.0.2-internal/java.lang.Object.wait(Object.java:221)
21:55:35    at java.base@19.0.2-internal/java.util.TimerThread.mainLoop(Timer.java:569)
21:55:35    at java.base@19.0.2-internal/java.util.TimerThread.run(Timer.java:522)
21:55:35  
21:55:35  "Attachment portNumber: 40951" prio=10 Id=10708739 RUNNABLE
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.DiagnosticUtils.dumpAllThreadsImpl(Native Method)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.DiagnosticUtils.getThreadInfo(DiagnosticUtils.java:245)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.DiagnosticUtils.executeDiagnosticCommand(DiagnosticUtils.java:181)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.Attachment.doCommand(Attachment.java:248)
21:55:35    at java.base@19.0.2-internal/openj9.internal.tools.attach.target.Attachment.run(Attachment.java:159)
21:55:35  
21:55:35  
21:55:35  --- Timeout information end.
21:55:35  elapsed time (seconds): 2404.116
21:55:35  configuration:
21:55:35  STDOUT:
21:55:35  Result: 499999500000 in 396129 ms
21:55:35  Result: 499999500000 in 141690 ms
21:55:35  Result: 499999500000 in 344331 ms
21:55:35  Result: 499999500000 in 326083 ms
21:55:35  Result: 499999500000 in 300962 ms
21:55:35  Result: 499999500000 in 379486 ms
21:55:35  Result: 499999500000 in 222160 ms
21:55:35  Result: 499999500000 in 194157 ms
21:55:35  Result: 499999500000 in 50598 ms
21:55:35  Timeout refired 2400 times
pshipton commented 1 year ago

@fengxue-IS @babsingh @tajila fyi

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/110/

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/110/openjdk_test_output.tar.gz

00:01:07  ACTION: main -- Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 3146907ms).
00:01:07  REASON: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
00:01:07  TIME:   3146.924 seconds
00:01:07  messages:
00:01:07  command: main --enable-preview Skynet
00:01:07  reason: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
00:01:07  Mode: othervm [/othervm specified]
00:01:07  Timeout information:
00:01:07  Running jcmd on process 13500450
00:01:07  Timeout refired 300 times
00:01:07  Error getting data from 13500450: Exception connecting to 13500450
00:01:07  Usage : jcmd <vmid> <arguments>
00:01:07  
00:01:07     -J : supply arguments to the Java VM running jcmd
00:01:07     -l : list JVM processes on the local machine
00:01:07     -h : print this help message
00:01:07  
00:01:07     <vmid> : Attach API VM ID as shown in jps or other Attach API-based tools
00:01:07  
00:01:07     arguments:
00:01:07        help : print the list of diagnostic commands
00:01:07        help <command> : print help for the specific command
00:01:07        <command> [command arguments] : command from the list returned by "help"
00:01:07  
00:01:07  list JVM processes on the local machine. Default behavior when no options are specified.
00:01:07  
00:01:07  NOTE: this utility might significantly affect the performance of the target JVM.
00:01:07      The available diagnostic commands are determined by
00:01:07      the target VM and may vary between VMs.
00:01:07  Timeout handler interrupted: 
00:01:07  java.lang.InterruptedException
00:01:07    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1611)
00:01:07    at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:447)
00:01:07    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmdCommand(CoreDumpTimeoutHandler.java:117)
00:01:07    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmd(CoreDumpTimeoutHandler.java:137)
00:01:07    at jtreg.openj9.CoreDumpTimeoutHandler.runActions(CoreDumpTimeoutHandler.java:55)
00:01:07    at com.sun.javatest.regtest.TimeoutHandler.handleTimeout(TimeoutHandler.java:120)
00:01:07    at com.sun.javatest.regtest.exec.ProcessCommand$2.run(ProcessCommand.java:317)
00:01:07  --- Timeout information end.
00:01:07  elapsed time (seconds): 3146.924
00:01:07  configuration:
00:01:07  STDOUT:
00:01:07  Timeout refired 2400 times
00:01:07  STDERR:
00:01:07  Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" Exception in thread "" java.lang.StackOverflowError
00:01:07    at java.base/java.util.concurrent.atomic.Striped64$Cell.cas(Striped64.java:128)
00:01:07    at java.base/java.util.concurrent.atomic.LongAdder.add(LongAdder.java:92)
00:01:07    at java.base/jdk.internal.vm.ThreadContainers$RootContainer$CountingRootContainer.onExit(ThreadContainers.java:265)
00:01:07    at java.base/java.lang.VirtualThread.start(VirtualThread.java:472)
00:01:07    at java.base/java.lang.VirtualThread.start(VirtualThread.java:480)
00:01:07    at java.base/java.lang.Thread.startVirtualThread(Thread.java:1484)
00:01:07    at Skynet.skynet(Skynet.java:74)
00:01:07    at Skynet.lambda$skynet$1(Skynet.java:74)
00:01:07    at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
00:01:07    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
00:01:07    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:170)
00:01:07  Exception in thread "" Exception in thread "" java.lang.StackOverflowError
00:01:07    at java.base/java.util.concurrent.atomic.Striped64$Cell.cas(Striped64.java:128)
00:01:07    at java.base/java.util.concurrent.atomic.LongAdder.add(LongAdder.java:92)
00:01:07    at java.base/jdk.internal.vm.ThreadContainers$RootContainer$CountingRootContainer.onExit(ThreadContainers.java:265)
00:01:07    at java.base/java.lang.VirtualThread.start(VirtualThread.java:472)
00:01:07    at java.base/java.lang.VirtualThread.start(VirtualThread.java:480)
00:01:07    at java.base/java.lang.Thread.startVirtualThread(Thread.java:1484)
00:01:07    at Skynet.skynet(Skynet.java:74)
00:01:07    at Skynet.lambda$skynet$1(Skynet.java:74)
00:01:07    at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
00:01:07    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
00:01:07    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:170)
...
StackOverflowError's repeat
pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/111

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/111/openjdk_test_output.tar.gz

01:53:30  ACTION: main -- Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 3889808ms).
01:53:30  REASON: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
01:53:30  TIME:   3889.813 seconds
01:53:30  messages:
01:53:30  command: main --enable-preview Skynet
01:53:30  reason: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
01:53:30  Mode: othervm [/othervm specified]
01:53:30  Timeout information:
01:53:30  Running jcmd on process 11917
01:53:30  Timeout refired 300 times
01:53:30  Error getting data from 11917: Exception connecting to 11917
01:53:30  Usage : jcmd <vmid> <arguments>
01:53:30  
01:53:30     -J : supply arguments to the Java VM running jcmd
01:53:30     -l : list JVM processes on the local machine
01:53:30     -h : print this help message
01:53:30  
01:53:30     <vmid> : Attach API VM ID as shown in jps or other Attach API-based tools
01:53:30  
01:53:30     arguments:
01:53:30        help : print the list of diagnostic commands
01:53:30        help <command> : print help for the specific command
01:53:30        <command> [command arguments] : command from the list returned by "help"
01:53:30  
01:53:30  list JVM processes on the local machine. Default behavior when no options are specified.
01:53:30  
01:53:30  NOTE: this utility might significantly affect the performance of the target JVM.
01:53:30      The available diagnostic commands are determined by
01:53:30      the target VM and may vary between VMs.
01:53:30  Error getting data from 11917: Exception connecting to 11917
01:53:30  Usage : jcmd <vmid> <arguments>
01:53:30  
01:53:30     -J : supply arguments to the Java VM running jcmd
01:53:30     -l : list JVM processes on the local machine
01:53:30     -h : print this help message
01:53:30  
01:53:30     <vmid> : Attach API VM ID as shown in jps or other Attach API-based tools
01:53:30  
01:53:30     arguments:
01:53:30        help : print the list of diagnostic commands
01:53:30        help <command> : print help for the specific command
01:53:30        <command> [command arguments] : command from the list returned by "help"
01:53:30  
01:53:30  list JVM processes on the local machine. Default behavior when no options are specified.
01:53:30  
01:53:30  NOTE: this utility might significantly affect the performance of the target JVM.
01:53:30      The available diagnostic commands are determined by
01:53:30      the target VM and may vary between VMs.
01:53:30  Timeout handler interrupted: 
01:53:30  java.lang.InterruptedException
01:53:30    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1611)
01:53:30    at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:447)
01:53:30    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmdCommand(CoreDumpTimeoutHandler.java:117)
01:53:30    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmd(CoreDumpTimeoutHandler.java:138)
01:53:30    at jtreg.openj9.CoreDumpTimeoutHandler.runActions(CoreDumpTimeoutHandler.java:55)
01:53:30    at com.sun.javatest.regtest.TimeoutHandler.handleTimeout(TimeoutHandler.java:120)
01:53:30    at com.sun.javatest.regtest.exec.ProcessCommand$2.run(ProcessCommand.java:317)
01:53:30  --- Timeout information end.
01:53:30  elapsed time (seconds): 3889.813
01:53:30  configuration:
01:53:30  STDOUT:
01:53:30  Timeout refired 2400 times
01:53:30  STDERR:
01:53:30  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/15 01:34:18 - please wait.
01:53:30  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013418.11917.0001.txt' in response to an event
01:53:30  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013418.11917.0001.txt
01:53:30  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013418.11917.0002.trc' in response to an event
01:53:30  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/15 01:34:24 - please wait.
01:53:30  JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013418.11917.0002.trc
01:53:30  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
01:53:30  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013424.11917.0003.txt' in response to an event
01:53:30  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013424.11917.0003.txt
01:53:30  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/15 01:34:26 - please wait.
01:53:30  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013424.11917.0004.trc' in response to an event
01:53:30  JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013424.11917.0004.trc
01:53:30  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
01:53:30  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/15 01:34:30 - please wait.
01:53:30  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/15 01:34:30 - please wait.
01:53:30  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013426.11917.0005.txt' in response to an event
01:53:30  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013426.11917.0005.txt
01:53:30  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013430.11917.0006.trc' in response to an event
01:53:30  JVMDUMP010I Snap dump written to {nothing to snap}
01:53:30  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
01:53:30  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/Snap.20230215.013426.11917.0008.trc' in response to an event
01:53:30  JVMDUMP010I Snap dump written to {nothing to snap}
01:53:30  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
01:53:30  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013430.11917.0007.txt' in response to an event
01:53:30  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/aqa-tests/TKG/output_16764348034303/jdk_lang_1/work/scratch/1/javacore.20230215.013430.11917.0007.txt
01:53:30  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
01:53:30  Exception in thread "Attach API wait loop" java/lang/OutOfMemoryError: Java heap space
01:53:30    at java/lang/Throwable.printStackTrace (java.base@19.0.2-internal/Throwable.java:554)
01:53:30    at java/lang/Throwable.printStackTraceHelper (java.base@19.0.2-internal/Throwable.java:338)
01:53:30    at java/lang/Throwable.printStackTrace (java.base@19.0.2-internal/Throwable.java:307)
01:53:30    at java/lang/Throwable.printStackTrace (java.base@19.0.2-internal/Throwable.java:250)
pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/111

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/111/openjdk_test_output.tar.gz

23:53:48  ACTION: main -- Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 3144194ms).
23:53:48  REASON: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
23:53:48  TIME:   3144.2 seconds
23:53:48  messages:
23:53:48  command: main --enable-preview Skynet
23:53:48  reason: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
23:53:48  Mode: othervm [/othervm specified]
23:53:48  Timeout information:
23:53:48  Running jcmd on process 6013
23:53:48  Timeout refired 300 times
23:53:48  Error getting data from 6013: Exception connecting to 6013
23:53:48  Usage : jcmd <vmid> <arguments>
23:53:48  
23:53:48     -J : supply arguments to the Java VM running jcmd
23:53:48     -l : list JVM processes on the local machine
23:53:48     -h : print this help message
23:53:48  
23:53:48     <vmid> : Attach API VM ID as shown in jps or other Attach API-based tools
23:53:48  
23:53:48     arguments:
23:53:48        help : print the list of diagnostic commands
23:53:48        help <command> : print help for the specific command
23:53:48        <command> [command arguments] : command from the list returned by "help"
23:53:48  
23:53:48  list JVM processes on the local machine. Default behavior when no options are specified.
23:53:48  
23:53:48  NOTE: this utility might significantly affect the performance of the target JVM.
23:53:48      The available diagnostic commands are determined by
23:53:48      the target VM and may vary between VMs.
23:53:48  Timeout handler interrupted: 
23:53:48  java.lang.InterruptedException
23:53:48    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1611)
23:53:48    at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:447)
23:53:48    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmdCommand(CoreDumpTimeoutHandler.java:117)
23:53:48    at jtreg.openj9.CoreDumpTimeoutHandler.runJcmd(CoreDumpTimeoutHandler.java:137)
23:53:48    at jtreg.openj9.CoreDumpTimeoutHandler.runActions(CoreDumpTimeoutHandler.java:55)
23:53:48    at com.sun.javatest.regtest.TimeoutHandler.handleTimeout(TimeoutHandler.java:120)
23:53:48    at com.sun.javatest.regtest.exec.ProcessCommand$2.run(ProcessCommand.java:317)
23:53:48  --- Timeout information end.
23:53:48  elapsed time (seconds): 3144.2
23:53:48  configuration:
23:53:48  STDOUT:
23:53:48  Result: 499999500000 in 18817 ms
23:53:48  Result: 499999500000 in 13686 ms
23:53:48  Result: 499999500000 in 10776 ms
23:53:48  Result: 499999500000 in 13522 ms
23:53:48  Result: 499999500000 in 9920 ms
23:53:48  Result: 499999500000 in 9370 ms
23:53:48  Result: 499999500000 in 10632 ms
23:53:48  Result: 499999500000 in 9725 ms
23:53:48  Result: 499999500000 in 11140 ms
23:53:48  Timeout refired 2400 times
23:53:48  STDERR:
23:53:48  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/14 23:03:40 - please wait.
23:53:48  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230340.6013.0001.txt' in response to an event
23:53:48  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230340.6013.0001.txt
23:53:48  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/Snap.20230214.230340.6013.0002.trc' in response to an event
23:53:48  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/14 23:03:41 - please wait.
23:53:48  JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/14 23:03:41 - please wait.
23:53:48  JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/Snap.20230214.230340.6013.0002.trc
23:53:48  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
23:53:48  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
23:53:48    at java.base/java.util.concurrent.SynchronousQueue$TransferStack.snode(SynchronousQueue.java:328)
23:53:48    at java.base/java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:371)
23:53:48    at java.base/java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:886)
23:53:48    at Skynet$Channel.receive(Skynet.java:110)
23:53:48    at Skynet.skynet(Skynet.java:78)
23:53:48    at Skynet.lambda$skynet$1(Skynet.java:74)
23:53:48    at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
23:53:48    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
23:53:48    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:170)
23:53:48  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230341.6013.0004.txt' in response to an event
23:53:48  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230341.6013.0004.txt
23:53:48  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/Snap.20230214.230341.6013.0005.trc' in response to an event
23:53:48  JVMDUMP010I Snap dump written to {nothing to snap}
23:53:48  JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
23:53:48  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
23:53:48  JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230341.6013.0003.txt' in response to an event
23:53:48  JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/javacore.20230214.230341.6013.0003.txt
23:53:48  JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/Snap.20230214.230341.6013.0006.trc' in response to an event
23:53:48    at java.base/java.lang.invoke.DirectMethodHandleJVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/aqa-tests/TKG/output_16764301149136/jdk_lang_1/work/scratch/1/Snap.20230214.230341.6013.0006.trc
23:53:48  .JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
23:53:48  allocateInstance(DirectMethodHandle.java:501)
23:53:48    at java.base/java.lang.VirtualThread$VThreadContinuation.<init>(VirtualThread.java:179)
23:53:48    at java.base/java.lang.VirtualThread.<init>(VirtualThread.java:170)
23:53:48    at java.base/java.lang.ThreadBuilders.newVirtualThread(ThreadBuilders.java:384)
23:53:48    at java.base/java.lang.Thread.startVirtualThread(Thread.java:1483)
23:53:48    at Skynet.skynet(Skynet.java:74)
23:53:48    at Skynet.lambda$skynet$1(Skynet.java:74)
23:53:48    at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
23:53:48    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
23:53:48    at java.baseJVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/14 23:03:50 - please wait.
pshipton commented 1 year ago

Changes https://github.com/eclipse-openj9/openj9/compare/91601086afe...711ca6c2adb

@babsingh is this caused by https://github.com/eclipse-openj9/openj9/pull/16678 ?

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/110 jdk_lang_1 java/lang/Thread/virtual/stress/Skynet.java

00:33:06  ACTION: main -- Failed. Unexpected exit from test [exit code: 255]
00:33:06  REASON: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
00:33:06  TIME:   148.241 seconds
00:33:06  messages:
00:33:06  command: main --enable-preview Skynet
00:33:06  reason: User specified action: run main/othervm/timeout=300 --enable-preview Skynet 
00:33:06  Mode: othervm [/othervm specified]
00:33:06  elapsed time (seconds): 148.241
00:33:06  configuration:
00:33:06  STDOUT:
00:33:06  STDERR:
00:33:06  Unhandled exception
00:33:06  Type=Segmentation error vmState=0x00000000
00:33:06  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
00:33:06  Handler1=0000000011437C60 Handler2=00000000117369F0 InaccessibleAddress=0000000000000000
00:33:06  RDI=0000000000000000 RSI=00007FDBB8C0C210 RAX=0000000000000400 RBX=000000003BA34B00
00:33:06  RCX=00007FDBB89F5E00 RDX=0000000000000113 R8=000070000C43CB08 R9=000070000C43CAE8
00:33:06  R10=00000000555BD4B1 R11=00007FDBB20C7900 R12=000070000C43CAF0 R13=FFFFFFFFFFFFFF00
00:33:06  R14=000070000C43CAF8 R15=000070000C43CAE8
00:33:06  RIP=00000000114E9182 GS=0000 FS=0000 RSP=000070000C43C830
00:33:06  RFlags=0000000000010206 CS=002B RBP=000070000C43CAD0 ERR=0000000000000004
00:33:06  TRAPNO=000000040000000E CPU=0000000000040000 FAULTVADDR=0000000000000000
00:33:06  XMM0 00007fdbb20c7900 (f: 2987161856.000000, d: 6.945652e-310)
00:33:06  XMM1 00000000555bd4b1 (f: 1432081536.000000, d: 7.075423e-315)
00:33:06  XMM2 00000000df000000 (f: 3741319168.000000, d: 1.848457e-314)
00:33:06  XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM6 3fca23bc1fe2b563 (f: 534951264.000000, d: 2.042155e-01)
00:33:06  XMM7 4028f40b5ed9812d (f: 1591312640.000000, d: 1.247665e+01)
00:33:06  XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:33:06  Module=/Users/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/openjdkbinary/j2sdk-image/lib/default/libj9vm29.dylib
00:33:06  Module_base_address=0000000011400000 Symbol=_ZN26VM_BytecodeInterpreterFull3runEP10J9VMThread
00:33:06  Symbol_address=00000000114C0510
00:33:06  Target=2_90_20230214_179 (Mac OS X 10.15.7)
00:33:06  CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
00:33:06  ----------- Stack Backtrace -----------
00:33:06  _ZN26VM_BytecodeInterpreterFull3runEP10J9VMThread+0x28c73 (0x00000000114E9183 [libj9vm29.dylib+0xe9183])
00:33:06  bytecodeLoopFull+0xb4 (0x00000000114C0504 [libj9vm29.dylib+0xc0504])
00:33:06  ---------------------------------------
fengxue-IS commented 1 year ago

Likely caused by https://github.com/adoptium/aqa-tests/pull/4337. The timeout issue with Skynet is due to current OpenJ9 perf. I suggest we revert the PR to continue exclude the test or increase the timeout to allow test to complete.

@LinHu2016 can you confirm if the segfault is related to https://github.com/eclipse-openj9/openj9/pull/16713

babsingh commented 1 year ago

@babsingh is this caused by https://github.com/eclipse-openj9/openj9/pull/16678 ?

JDK19 impl stays the same in #16678. #16678 only renames a J9VMThread field in JDK19 to make clean up easier once JDK19 goes out of service. So, it should not cause this failure.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/112/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_s390x_linux_Nightly/113/

JasonFengJ9 commented 1 year ago

https://github.com/eclipse-openj9/openj9/issues/16728#issuecomment-1431963007 Similar Segmentation error vmState=0x00000000 observed at an internal build

openjdk version "19.0.2-beta" 2023-01-17
IBM Semeru Runtime Open Edition 19.0.2+7-202302150051 (build 19.0.2-beta+7-202302150051)
Eclipse OpenJ9 VM 19.0.2+7-202302150051 (build master-711ca6c2a, JRE 19 AIX ppc64-64-Bit Compressed References 20230214_43 (JIT enabled, AOT enabled)
OpenJ9   - 711ca6c2a
OMR      - 6adea0343
JCL      - 47925e78c4 based on jdk-19.0.2+7)

[2023-02-15T06:12:06.967Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
[2023-02-15T06:12:06.967Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops 

[2023-02-15T06:51:30.849Z] TEST: java/lang/Thread/virtual/stress/Skynet.java#id0

[2023-02-15T06:51:30.850Z] STDERR:
[2023-02-15T06:51:30.850Z] Unhandled exception
[2023-02-15T06:51:30.850Z] Type=Segmentation error vmState=0x00000000
[2023-02-15T06:51:30.850Z] J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000032
[2023-02-15T06:51:30.850Z] Handler1=09001000A0C9AEF0 Handler2=09001000A0ACC278
[2023-02-15T06:51:30.850Z] R0=00000000FFDFCE44 R1=00000100237CB170 R2=09001000A0C9EC70 R3=00000000301F4A00
[2023-02-15T06:51:30.850Z] R4=0000000030223910 R5=0000000000000113 R6=0000000000000001 R7=30247A1000000008
[2023-02-15T06:51:30.850Z] R8=0000000000000400 R9=0000000000000003 R10=000000003007D418 R11=09001000A0C9EA78
[2023-02-15T06:51:30.850Z] R12=0000000084000242 R13=00000100237D6800 R14=0000000030BF1330 R15=000000003028EC00
[2023-02-15T06:51:30.850Z] R16=0000010021AA7318 R17=FFFFFFFFFFFFFFFF R18=0000000000000000 R19=09001000A07A7FB0
[2023-02-15T06:51:30.850Z] R20=0000000000000000 R21=000000003028F740 R22=00000100237CB900 R23=00000000FFDFCE00
[2023-02-15T06:51:30.850Z] R24=0000000000000000 R25=0000000000000001 R26=0000000000000000 R27=0000000000000001
[2023-02-15T06:51:30.850Z] R28=0A0001000939D0D9 R29=0000000030BF1320 R30=0000000000000000 R31=CFD9759DEA4DB69E
[2023-02-15T06:51:30.850Z] IAR=09000000041C5B58 LR=09000000041C78CC MSR=A00000000200D032 CTR=09000000041C5B0C
[2023-02-15T06:51:30.850Z] CR=4800024260040000 FPSCR=8202000000000000 XER=6004000082020000
[2023-02-15T06:51:30.851Z] FPR0 0000000000000043 (f: 67.000000, d: 3.310240e-322)
[2023-02-15T06:51:30.851Z] FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
[2023-02-15T06:51:30.851Z] FPR2 3fc6b8549f66f926 (f: 2674325760.000000, d: 1.775003e-01)
[2023-02-15T06:51:30.851Z] FPR3 3fb1784400000000 (f: 0.000000, d: 6.824136e-02)
[2023-02-15T06:51:30.851Z] FPR4 bf90ecf58ddc7f4a (f: 2380037888.000000, d: -1.652893e-02)
[2023-02-15T06:51:30.851Z] FPR5 40a6fb0000000000 (f: 0.000000, d: 2.941500e+03)
[2023-02-15T06:51:30.851Z] FPR6 4070b222714147a7 (f: 1900103552.000000, d: 2.671334e+02)
[2023-02-15T06:51:30.851Z] FPR7 bfb62e42fefa4000 (f: 4277813248.000000, d: -8.664340e-02)
[2023-02-15T06:51:30.851Z] FPR8 3f6069c0ff9e20ca (f: 4288553216.000000, d: 2.003552e-03)
[2023-02-15T06:51:30.851Z] FPR9 40288b308ea81820 (f: 2393380864.000000, d: 1.227185e+01)
[2023-02-15T06:51:30.851Z] FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
[2023-02-15T06:51:30.851Z] FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
[2023-02-15T06:51:30.851Z] FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
[2023-02-15T06:51:30.851Z] FPR13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
[2023-02-15T06:51:30.851Z] Module=/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_1/openjdkbinary/j2sdk-image/lib/default/libj9vm29.so
[2023-02-15T06:51:30.851Z] Module_base_address=0900000003F55000
[2023-02-15T06:51:30.851Z] Target=2_90_20230214_43 (AIX 7.2)
[2023-02-15T06:51:30.851Z] CPU=ppc64 (16 logical CPUs) (0x200000000 RAM)
[2023-02-15T06:51:30.851Z] ----------- Stack Backtrace -----------
[2023-02-15T06:51:30.851Z] (0x0900000003FDFAE8 [libj9vm29.so+0x8aae8])
[2023-02-15T06:51:30.851Z] runJavaThread+0x268 (0x0900000003FC7D4C [libj9vm29.so+0x72d4c])
[2023-02-15T06:51:30.852Z] _ZL23javaProtectedThreadProcP13J9PortLibraryPv+0x11c (0x0900000003F57F00 [libj9vm29.so+0x2f00])
[2023-02-15T06:51:30.852Z] omrsig_protect+0x4cc (0x09000000043405F0 [libj9prt29.so+0x5c5f0])
[2023-02-15T06:51:30.852Z] javaThreadProc+0x68 (0x0900000003F57D6C [libj9vm29.so+0x2d6c])
[2023-02-15T06:51:30.852Z] thread_wrapper+0x558 (0x09000000043A98BC [libj9thr29.so+0x58bc])
[2023-02-15T06:51:30.852Z] _pthread_body+0xe8 (0x09000000005B404C [libpthreads.a+0x404c])
[2023-02-15T06:51:30.852Z] ---------------------------------------
[2023-02-15T06:51:30.852Z] JVMDUMP039I Processing dump event "gpf", detail "" at 2023/02/15 01:51:10 - please wait.

[2023-02-15T06:51:30.852Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 255]
[2023-02-15T06:51:30.852Z] --------------------------------------------------
[2023-02-15T07:03:13.617Z] Test results: passed: 853; failed: 1
[2023-02-15T07:03:13.617Z] Report written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_1/aqa-tests/TKG/output_16764415259215/jdk_lang_0/report/html/report.html
[2023-02-15T07:03:13.617Z] Results written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_1/aqa-tests/TKG/output_16764415259215/jdk_lang_0/work
[2023-02-15T07:03:13.617Z] Error: Some tests failed or other problems occurred.
[2023-02-15T07:03:13.617Z] -----------------------------------
[2023-02-15T07:03:13.617Z] jdk_lang_0_FAILED

Also seen at x86-64_mac

JasonFengJ9 commented 1 year ago

The time out occurred at an internal build(sles15le-rt1-9)

openjdk version "19.0.2-beta" 2023-01-17
IBM Semeru Runtime Open Edition 19.0.2+7-202302150049 (build 19.0.2-beta+7-202302150049)
Eclipse OpenJ9 VM 19.0.2+7-202302150049 (build master-711ca6c2a, JRE 19 Linux ppc64le-64-Bit Compressed References 20230214_43 (JIT enabled, AOT enabled)
OpenJ9   - 711ca6c2a
OMR      - 6adea0343
JCL      - 47925e78c4 based on jdk-19.0.2+7)

[2023-02-15T01:50:40.075Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
[2023-02-15T01:50:40.075Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops 

[2023-02-15T03:14:08.552Z] TEST: java/lang/Thread/virtual/stress/Skynet.java#id0

[2023-02-15T03:14:08.557Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_testList_1/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 2408903ms).
[2023-02-15T03:14:08.557Z] --------------------------------------------------
[2023-02-15T03:19:38.804Z] Test results: passed: 861; error: 1
[2023-02-15T03:20:05.153Z] Report written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_testList_1/aqa-tests/TKG/output_16764258388739/jdk_lang_0/report/html/report.html
[2023-02-15T03:20:05.153Z] Results written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_testList_1/aqa-tests/TKG/output_16764258388739/jdk_lang_0/work
[2023-02-15T03:20:05.153Z] Error: Some tests failed or other problems occurred.
[2023-02-15T03:20:05.153Z] -----------------------------------
[2023-02-15T03:20:05.153Z] jdk_lang_0_FAILED
JasonFengJ9 commented 1 year ago

OutOfMemoryError occurred at an internal build

[2023-02-15T01:09:45.019Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650
[2023-02-15T01:09:45.019Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops 

openjdk version "19.0.2-beta" 2023-01-17
IBM Semeru Runtime Open Edition 19.0.2+7-202302150036 (build 19.0.2-beta+7-202302150036)
Eclipse OpenJ9 VM 19.0.2+7-202302150036 (build master-711ca6c2a, JRE 19 Linux amd64-64-Bit Compressed References 20230215_43 (JIT enabled, AOT enabled)
OpenJ9   - 711ca6c2a
OMR      - 6adea0343
JCL      - 47925e78c4 based on jdk-19.0.2+7)

[2023-02-15T02:28:35.829Z] TEST: java/lang/Thread/virtual/stress/Skynet.java#id0

[2023-02-15T02:28:35.830Z] STDERR:
[2023-02-15T02:28:35.830Z] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/02/14 20:37:55 - please wait.
[2023-02-15T02:28:35.830Z] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_1/aqa-tests/TKG/output_16764233839230/jdk_lang_1/work/scratch/javacore.20230214.203755.23458.0001.txt' in response to an event

[2023-02-15T02:28:35.831Z] JVMDUMP010I Snap dump written to {nothing to snap}
[2023-02-15T02:28:35.831Z] JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
[2023-02-15T02:28:35.831Z] Exception in thread "" Exception in thread "" java.lang.OutOfMemoryError: Java heap space
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.Thread.<init>(Thread.java:320)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.BaseVirtualThread.<init>(BaseVirtualThread.java:41)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread.<init>(VirtualThread.java:156)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.ThreadBuilders.newVirtualThread(ThreadBuilders.java:384)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.Thread.startVirtualThread(Thread.java:1483)
[2023-02-15T02:28:35.831Z]  at Skynet.skynet(Skynet.java:74)
[2023-02-15T02:28:35.831Z]  at Skynet.lambda$skynet$1(Skynet.java:74)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
[2023-02-15T02:28:35.831Z]  at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:170)
[2023-02-15T02:28:35.831Z] java.lang.OutOfMemoryError: Java heap space
[2023-02-15T02:28:35.831Z]  at java.base/jdk.internal.misc.Unsafe.allocateInstance(Native Method)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:501)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread$VThreadContinuation.<init>(VirtualThread.java:179)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread.<init>(VirtualThread.java:170)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.ThreadBuilders.newVirtualThread(ThreadBuilders.java:384)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.Thread.startVirtualThread(Thread.java:1483)
[2023-02-15T02:28:35.831Z]  at Skynet.skynet(Skynet.java:74)
[2023-02-15T02:28:35.831Z]  at Skynet.lambda$skynet$1(Skynet.java:74)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread.run(VirtualThread.java:292)
[2023-02-15T02:28:35.831Z]  at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
[2023-02-15T02:28:35.831Z]  at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:170)

[2023-02-15T02:28:35.832Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_1/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 3142098ms).
[2023-02-15T02:28:35.832Z] --------------------------------------------------
[2023-02-15T02:34:26.204Z] Test results: passed: 862; error: 1
[2023-02-15T02:34:48.112Z] Report written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_1/aqa-tests/TKG/output_16764233839230/jdk_lang_1/report/html/report.html
[2023-02-15T02:34:48.112Z] Results written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_testList_1/aqa-tests/TKG/output_16764233839230/jdk_lang_1/work
[2023-02-15T02:34:48.112Z] Error: Some tests failed or other problems occurred.
[2023-02-15T02:34:48.112Z] -----------------------------------
[2023-02-15T02:34:48.112Z] jdk_lang_1_FAILED
LinHu2016 commented 1 year ago

Likely caused by adoptium/aqa-tests#4337. The timeout issue with Skynet is due to current OpenJ9 perf. I suggest we revert the PR to continue exclude the test or increase the timeout to allow test to complete.

@LinHu2016 can you confirm if the segfault is related to #16713

it is possible same issue with https://github.com/eclipse-openj9/openj9/issues/16652, the issue(https://github.com/eclipse-openj9/openj9/issues/16652) is not directly related with large cpus, but we have not seen the problem with OS native threads smaller than 64.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/112/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/113/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/112


https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/113/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Nightly/114/


https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/115/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/114/


https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/116/ https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/115 https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/115

pshipton commented 1 year ago

Reopen until the backport to 0.37 is completed.

pshipton commented 1 year ago

The test needs to be unexcluded https://github.com/adoptium/aqa-tests/pull/4354

pshipton commented 1 year ago

Skynet is unexcluded via https://github.com/adoptium/aqa-tests/pull/4491

pshipton commented 1 year ago

After https://github.com/eclipse-openj9/openj9/pull/17094 is merged, there is still a timeout and an OOM. These are on the 0.37 branch. https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Release/7 - timeout https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Release/7 - OOM

pshipton commented 1 year ago

Also failures in the head stream.

jdk19 https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/142/ - timeout https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/148/ - OOM https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/145/ (1 timeout, 1 Invalid JIT return address)

01:38:20  *** Invalid JIT return address 000070A6CC096C00 in 000070A75442AB58
01:38:20  
01:38:20  05:36:59.614 0x70a6b8021f00    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK19_ppc64le_linux_Nightly/openj9/runtime/vm/swalk.c:1632: ((0 ))

jdk20 https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/27 - timeouts java/lang/Thread/virtual/ThreadAPI.java java/lang/Thread/virtual/stress/PinALot.java#id0

fengxue-IS commented 1 year ago

jdk20 https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/27 - timeouts java/lang/Thread/virtual/ThreadAPI.java java/lang/Thread/virtual/stress/PinALot.java#id0

The ThreadAPI.java failure is not a new issue, previously reported https://github.com/eclipse-openj9/openj9/issues/16978. looking at the stacktrace of the exception thrown:

[2023-04-04T03:50:25.527Z] test ThreadAPI.testJoin8(): failure
[2023-04-04T03:50:25.527Z] java.lang.IllegalMonitorStateException
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:175)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1059)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:494)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.remove(ScheduledThreadPoolExecutor.java:1063)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.ThreadPoolExecutor.remove(ThreadPoolExecutor.java:1777)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.cancel(ScheduledThreadPoolExecutor.java:293)
[2023-04-04T03:50:25.527Z]  at java.base/java.lang.VirtualThread.cancel(VirtualThread.java:677)
[2023-04-04T03:50:25.527Z]  at java.base/java.lang.VirtualThread.parkNanos(VirtualThread.java:609)
[2023-04-04T03:50:25.527Z]  at java.base/java.lang.Access.parkVirtualThread(Access.java:519)
[2023-04-04T03:50:25.527Z]  at java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:67)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:267)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:756)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1126)
[2023-04-04T03:50:25.527Z]  at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:276)
[2023-04-04T03:50:25.527Z]  at java.base/java.lang.VirtualThread.joinNanos(VirtualThread.java:816)
[2023-04-04T03:50:25.527Z]  at java.base/java.lang.Thread.join(Thread.java:2130)
[2023-04-04T03:50:25.528Z]  at ThreadAPI.testJoin7(ThreadAPI.java:428)
[2023-04-04T03:50:25.528Z]  at jdk.test.lib.thread.VThreadRunner.lambda$run$0(VThreadRunner.java:77)
[2023-04-04T03:50:25.528Z]  at java.base/java.lang.VirtualThread.run(VirtualThread.java:311)
[2023-04-04T03:50:25.528Z]  at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:182)
[2023-04-04T03:50:25.528Z]  at java.base/jdk.internal.vm.Continuation.enter(Continuation.java:171)

With PinALot.java#id0 output doesn't have anything that looks unexpected other than being slow. I will run a grinder to verify if this is intermittent (slow machine) or perf regression due to #16855. I did notice that the java 20 test is slightly different from 19 where the test iteration is reduced. RI's commit message noted that reason for the reduction is due to test taking too long to complete. I suppose this partly is due the new addition of https://github.com/eclipse-openj9/openj9/issues/16012. As OpenJ9 currently dispatch the JVMTI calls via JNI regardless of agent, this may impact perf.

fengxue-IS commented 1 year ago

I locally tested with JDK20 nightly build on PinALot.java test from before/after #16855 is merged.

time jdk/bin/java --enable-preview -ea -esa -Xmx512m -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+
UseCompressedOops PinALot 
2023-04-04T19:55:34.318526843Z => 452658
2023-04-04T19:55:35.436631973Z => 959520
2023-04-04T19:55:35.519122247Z => 1000000

real    0m2.527s
user    0m5.390s
sys 0m1.174s

time old-jdk/bin/java --enable-preview -ea -esa -Xmx512m -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -
XX:+UseCompressedOops PinALot 
2023-04-04T19:56:05.163645473Z => 124076
2023-04-04T19:56:06.304766406Z => 562295
2023-04-04T19:56:07.301219915Z => 1000000

real    0m3.722s
user    0m8.662s
sys 0m1.574s

increased the iterations by 10x to 5000000:

time jdk/bin/java --enable-preview -ea -esa -Xmx512m -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+
UseCompressedOops PinALot 5000000
2023-04-04T19:56:58.226428550Z => 130626
2023-04-04T19:56:59.359789775Z => 659455
2023-04-04T19:57:00.360968951Z => 1133303
2023-04-04T19:57:01.361756931Z => 1638383
2023-04-04T19:57:02.362573121Z => 2142014
2023-04-04T19:57:03.363637458Z => 2636865
2023-04-04T19:57:04.364122487Z => 3117207
2023-04-04T19:57:05.365094147Z => 3613678
2023-04-04T19:57:06.366028424Z => 4114113
2023-04-04T19:57:07.366813543Z => 4644251
2023-04-04T19:57:08.039425542Z => 5000000

real    0m11.416s
user    0m18.114s
sys 0m5.317s

time old-jdk/bin/java --enable-preview -ea -esa -Xmx512m -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -
XX:+UseCompressedOops PinALot 5000000
2023-04-04T19:56:32.590545312Z => 312846
2023-04-04T19:56:33.652704658Z => 762978
2023-04-04T19:56:34.653076165Z => 1219315
2023-04-04T19:56:35.653478672Z => 1650160
2023-04-04T19:56:36.653990761Z => 2107737
2023-04-04T19:56:37.654840099Z => 2547555
2023-04-04T19:56:38.655251749Z => 3002821
2023-04-04T19:56:39.655966419Z => 3421881
2023-04-04T19:56:40.656820902Z => 3860420
2023-04-04T19:56:41.657195121Z => 4251890
2023-04-04T19:56:42.657974834Z => 4668724
2023-04-04T19:56:43.504010600Z => 5000000

real    0m12.232s
user    0m19.102s
sys 0m5.383s

In both cases, new impl is slightly faster than old code. so I suggest we re-test on the same machine to see if this is a machine issue.

babsingh commented 1 year ago

There is lock contention for sub-4G memory allocation. This can cause higher runtimes. Running the perf test on a highly virtualized machine can further increase the runtimes.

For all timeouts, I have started grinders with -Xnocompressedrefs on the failing machine and a randomly selected machine to verify if the timeouts are resolved.

If the timeouts are resolved with -Xnocompressedrefs, then Skynet would need to be excluded under https://github.com/eclipse-openj9/openj9/issues/15184.

If the timeouts are due to slow machines, then we should either increase the timeout limit or prevent Skynet to be run on slow machines.

[JDK19] https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/142/ - timeout

Grinders:

[JDK19] https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64le_linux_Nightly/145/ - timeout

Grinders:

[JDK20] https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/27 - timeouts

Grinders:

babsingh commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_linux_Release/7 - OOM https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/148/ - OOM

J9AllocateObject fails even though there is available object memory. @amicic @dmitripivkine @LinHu2016 Did we miss a dependency while enabling the continuation list in https://github.com/eclipse-openj9/openj9/pull/16855?

javacore: Meminfo

======= x64 Linux =======

1STHEAPTYPE    Object Memory
NULL           id                 start              end                size               space/region
1STHEAPSPACE   0x00007F64000A6050         --                 --                 --         Generational
1STHEAPREGION  0x00007F64000A69C0 0x00007F63E0000000 0x00007F63F8000000 0x0000000018000000 Generational/Tenured Region
1STHEAPREGION  0x00007F64000A6570 0x00007F63F8000000 0x00007F63FBCA0000 0x0000000003CA0000 Generational/Nursery Region
1STHEAPREGION  0x00007F64000A6120 0x00007F63FBCA0000 0x00007F6400000000 0x0000000004360000 Generational/Nursery Region
NULL
1STHEAPTOTAL   Total memory:                   536870912 (0x0000000020000000)
1STHEAPINUSE   Total memory in use:            502287328 (0x000000001DF04BE0)
1STHEAPFREE    Total memory free:               34583584 (0x00000000020FB420)

======= Windows =======

1STHEAPTYPE    Object Memory
NULL           id                 start              end                size               space/region
1STHEAPSPACE   0x00000279AC1BCA10         --                 --                 --         Generational
1STHEAPREGION  0x00000279AC1A9800 0x00007FF47B870000 0x00007FF493870000 0x0000000018000000 Generational/Tenured Region
1STHEAPREGION  0x00000279AC1A9720 0x00007FF493870000 0x00007FF4977F0000 0x0000000003F80000 Generational/Nursery Region
1STHEAPREGION  0x00000279AC1AAFA0 0x00007FF4977F0000 0x00007FF49B870000 0x0000000004080000 Generational/Nursery Region
NULL
1STHEAPTOTAL   Total memory:                   536870912 (0x0000000020000000)
1STHEAPINUSE   Total memory in use:            528668240 (0x000000001F82D650)
1STHEAPFREE    Total memory free:                8202672 (0x00000000007D29B0)

javacore: GC History

======= x64 Linux =======

1STGCHTYPE     GC History
3STHSTTYPE     23:39:03:401370000 GMT j9mm.100 -   J9AllocateObject() returning NULL! 80 bytes requested for object of class 00007F6374081900 from memory space 'Generational' id=00007F64000A6050
3STHSTTYPE     23:39:03:401369000 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     23:39:03:401158000 GMT j9mm.100 -   J9AllocateObject() returning NULL! 40 bytes requested for object of class 00007F64004E8500 from memory space 'Generational' id=00007F64000A6050
3STHSTTYPE     23:39:03:401157000 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     23:39:03:401153000 GMT j9mm.100 -   J9AllocateObject() returning NULL! 40 bytes requested for object of class 00007F6374092E00 from memory space 'Generational' id=00007F64000A6050
3STHSTTYPE     23:39:03:401152000 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     23:39:03:401065000 GMT j9mm.100 -   J9AllocateObject() returning NULL! 40 bytes requested for object of class 00007F64004E8500 from memory space 'Generational' id=00007F64000A6050
3STHSTTYPE     23:39:03:401064000 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     23:39:03:400949000 GMT j9mm.134 -   Allocation failure end: newspace=4030376/134217728 oldspace=30573720/402653184 loa=4027392/4027392
3STHSTTYPE     23:39:03:400935000 GMT j9mm.69 -   Concurrent kickoff, tracesizetarget=324760948 kickoffthreshold=42678024 remainingfree=4030376
3STHSTTYPE     23:39:03:400910000 GMT j9mm.470 -   Allocation failure cycle end: newspace=4030416/134217728 oldspace=30573720/402653184 loa=4027392/4027392
3STHSTTYPE     23:39:03:400904000 GMT j9mm.82 -   Excessive GC raised!

======= Windows =======

3STHSTTYPE     03:28:34:412840856 GMT j9mm.70 -   Concurrent aborted
3STHSTTYPE     03:28:34:412722961 GMT j9mm.135 -   Exclusive access: exclusiveaccessms=0.298 meanexclusiveaccessms=0.180 threads=3 lastthreadtid=0x00000279C99FF568 beatenbyotherthread=0
3STHSTTYPE     03:28:34:412704171 GMT j9mm.469 -   Allocation failure cycle start: newspace=0/134217728 oldspace=8202672/402653184 loa=4027392/4027392 requestedbytes=56
3STHSTTYPE     03:28:34:412684522 GMT j9mm.133 -   Allocation failure start: newspace=0/134217728 oldspace=8202672/402653184 loa=4027392/4027392 requestedbytes=56
3STHSTTYPE     03:28:34:412248200 GMT j9mm.100 -   J9AllocateObject() returning NULL! 40 bytes requested for object of class 00000279C99D7F00 from memory space 'Generational' id=00000279AC1BCA10
3STHSTTYPE     03:28:34:412219751 GMT j9mm.101 -   J9AllocateIndexableObject() returning NULL! 64 bytes requested for object of class 00000279C9586A00 from memory space 'Generational' id=00000279AC1BCA10
3STHSTTYPE     03:28:34:412189182 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     03:28:34:412159213 GMT j9mm.100 -   J9AllocateObject() returning NULL! 80 bytes requested for object of class 00000279C99C5B00 from memory space 'Generational' id=00000279AC1BCA10
3STHSTTYPE     03:28:34:412123975 GMT j9mm.84 -   Forcing J9AllocateIndexableObject() to fail due to excessive GC
3STHSTTYPE     03:28:34:412092996 GMT j9mm.83 -   Forcing J9AllocateObject() to fail due to excessive GC
3STHSTTYPE     03:28:34:411861145 GMT j9mm.134 -   Allocation failure end: newspace=6032/134217728 oldspace=8202672/402653184 loa=4027392/4027392
3STHSTTYPE     03:28:34:411829416 GMT j9mm.69 -   Concurrent kickoff, tracesizetarget=348309531 kickoffthreshold=45456584 remainingfree=6032
3STHSTTYPE     03:28:34:411739420 GMT j9mm.470 -   Allocation failure cycle end: newspace=6072/134217728 oldspace=8202672/402653184 loa=4027392/4027392
3STHSTTYPE     03:28:34:411714031 GMT j9mm.82 -   Excessive GC raised!

Java Stack during OOM

20:30:04  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
20:30:04    at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:501)
20:30:04    at Skynet.skynet(Skynet.java:74)
20:30:04    at Skynet.lambda$skynet$1(Skynet.java:74)
20:30:04    at java.base/java.lang.VirtualThread.run(VirtualThread.java:293)
20:30:04    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
20:30:04    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:171)
20:30:04  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
20:30:04    at java.base/java.lang.VirtualThread.<init>(VirtualThread.java:170)
20:30:04    at java.base/java.lang.ThreadBuilders.newVirtualThread(ThreadBuilders.java:384)
20:30:04    at java.base/java.lang.Thread.startVirtualThread(Thread.java:1483)
20:30:04    at Skynet.skynet(Skynet.java:74)
20:30:04    at Skynet.lambda$skynet$1(Skynet.java:74)
20:30:04    at java.base/java.lang.VirtualThread.run(VirtualThread.java:293)
20:30:04    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
20:30:04    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:171)
20:30:04  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
20:30:04    at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:501)
20:30:04    at Skynet.skynet(Skynet.java:74)
20:30:04    at Skynet.lambda$skynet$1(Skynet.java:74)
20:30:04    at java.base/java.lang.VirtualThread.run(VirtualThread.java:293)
20:30:04    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
20:30:04    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:171)
20:30:04  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
20:30:04    at java.base/java.util.concurrent.ForkJoinPool.execute(ForkJoinPool.java:2847)
20:30:04    at java.base/java.lang.VirtualThread.submitRunContinuation(VirtualThread.java:240)
20:30:04    at java.base/java.lang.VirtualThread.submitRunContinuation(VirtualThread.java:259)
20:30:04    at java.base/java.lang.VirtualThread.start(VirtualThread.java:468)
20:30:04    at java.base/java.lang.VirtualThread.start(VirtualThread.java:481)
20:30:04    at java.base/java.lang.Thread.startVirtualThread(Thread.java:1484)
20:30:04    at Skynet.skynet(Skynet.java:74)
20:30:04    at Skynet.lambda$skynet$1(Skynet.java:74)
20:30:04    at java.base/java.lang.VirtualThread.run(VirtualThread.java:293)
20:30:04    at java.base/java.lang.VirtualThread$VThreadContinuation.lambda$new$0(VirtualThread.java:179)
20:30:04    at java.base/jdk.internal.vm.Continuation.execute(Continuation.java:171)
20:30:04  Exception in thread "" java.lang.OutOfMemoryError: Java heap space
amicic commented 1 year ago

This OOM due to excessive GC. From what i understand mutators got 5x faster, and GC got slower by an unknown factor, so GC overhead increased significantly...

pshipton commented 1 year ago

In both cases, new impl is slightly faster than old code. so I suggest we re-test on the same machine to see if this is a machine issue.

We ran a jdk20 grinder running PinALot on the same machine that failed last night. It timed out both in compressed and non-compressed modes. I setup the grinder to stop once it failed so there is only one failure in each mode. I can try it again tomorrow with an older JVM.

https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/2186

pshipton commented 1 year ago

I'll ignore compressedrefs Skynet timeouts.

Another OOM on Windows jdk19 running Skynet, non-compressedrefs. This is on a "faster" win2012 machine. We've noted win2019 machines are "slower". https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/149 - win2012-x86-6

Also a timeout in jdk20 compressedrefs mode java/lang/Thread/virtual/ThreadAPI.java https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/28 - win2012x64-openj9-1

babsingh commented 1 year ago

Also a timeout in jdk20 compressedrefs mode java/lang/Thread/virtual/ThreadAPI.java https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/28 - win2012x64-openj9-1

Ref: https://github.com/eclipse-openj9/openj9/issues/16728#issuecomment-1496149522. This failure is identical to the IllegalMonitorStateException seen in https://github.com/eclipse-openj9/openj9/issues/16258 which was fixed by https://github.com/eclipse-openj9/openj9/pull/17009. The timeout may just be a side effect of the failure. fyi @a7ehuo @0xdaryl

Another OOM on Windows jdk19 running Skynet, non-compressedrefs. This is on a "faster" win2012 machine. We've noted win2019 machines are "slower". https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/149 - win2012-x86-6

This is identical to the excessive GC issue reported in https://github.com/eclipse-openj9/openj9/issues/16728#issuecomment-1496803264. Again, the timeout is a side effect of the OOM. #16855 transferred huge workload to the GC by enabling the continuation list and removing the virtual thread list. @amicic Is there a way to mitigate the excessive GC?

23:06:00 Invalid JIT return address 0000000000000000 in 00007D0A8CA7AC68 23:06:00
23:06:00 03:05:30.585 0x7d0a34010800 j9vm.249
ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK19_ppc64le_linux_Nightly/openj9/runtime/vm/swalk.c:1632: ((0 ))

@fengxue-IS is investigating the new Invalid JIT return address failures and will post an update soon.

pshipton commented 1 year ago

Grinder running jdk20 PinALot on win2012 machines - https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/2194/ and jdk19 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/2195/

pshipton commented 1 year ago

jdk20 still failing both modes on win2012. I ran each grinder on 3 machines. https://openj9-jenkins.osuosl.org/job/Grinder_iteration_0/254/ - passed https://openj9-jenkins.osuosl.org/job/Grinder_iteration_1/251/ - failed https://openj9-jenkins.osuosl.org/job/Grinder_iteration_2/222/ - failed

a7ehuo commented 1 year ago

I'll set up grinder run on win2012 with some more logging to look at java.lang.IllegalMonitorStateException with PinALot.java test

pshipton commented 1 year ago

The jdk19 grinder on PinALot (20x each compessed/noncompressed on 3 machines) passed, while on jdk20 2/3 machines failed. The problem may be jdk20 specific.

a7ehuo commented 1 year ago

@pshipton For JDK20, is win2012 the only platform that shows java.lang.IllegalMonitorStateException failure so far?

pshipton commented 1 year ago

No, it also occurred on win2019. I was targeting win2012 because I think they are faster. The original failure was on win2019 https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.openjdk_x86-64_windows_Nightly/27

pshipton commented 1 year ago

I was going to try a grinder on an older build, using a build from last Wed night, March 29. https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/2200/

pshipton commented 1 year ago

There was a failure with the older build, so it's not a regression caused by https://github.com/eclipse-openj9/openj9/pull/16855 https://openj9-jenkins.osuosl.org/job/Grinder_iteration_2/224/

I can go further back to rule out https://github.com/eclipse-openj9/openj9/pull/16713

https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/2206/

pshipton commented 1 year ago

Still failing https://openj9-jenkins.osuosl.org/job/Grinder_iteration_1/255/

0xdaryl commented 1 year ago

@a7ehuo : With regards to the IMSE, if you are able to see it fail in your grinders can you also try disabling inlining of just Thread.currentThread() to see if the problem is reproducible? It looks like an option to disable inlining just that method hasn't been introduced into the code base yet so it will unfortunately require a special build to check this.

I glanced at the JDK20 VirtualThread code and while it is implemented a bit differently than JDK19 it doesn't appear the JIT should have a problem with it.

pshipton commented 1 year ago

Created https://github.com/eclipse-openj9/openj9/issues/17119 for the jdk20 ThreadAPI IllegalMonitorStateException failure.

Created https://github.com/eclipse-openj9/openj9/issues/17120 for the jdk20 PinALot IllegalMonitorStateException failure.

This issue will continue to track the jdk19 Skynet failures in non-compressed mode.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/147 Skynet OOM noncompressed

4 javacore files, 1 snap https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/147/openjdk_test_output.tar.gz


https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_mac_Nightly/145 Skynet timeout noncompressed

no diagnostics

amicic commented 1 year ago

With OOM failures, as before it's excessive GC. As simple as heap being almost full, able to satisfy allocations, but progress is rather slow: global GCs one after the other. This pattern (with small variation of allocation size and heap occupancy) repeats:

3STHSTTYPE     01:56:58:102243000 GMT j9mm.134 -   Allocation failure end: newspace=0/134217728 oldspace=18441384/402653184 loa=4027392/4027392
3STHSTTYPE     01:56:58:102233000 GMT j9mm.470 -   Allocation failure cycle end: newspace=0/134217728 oldspace=18441384/402653184 loa=4027392/4027392
3STHSTTYPE     01:56:58:102150000 GMT j9mm.475 -   GlobalGC end: workstackoverflow=0 overflowcount=0 memory=18441416/536870912
3STHSTTYPE     01:56:58:102107000 GMT j9mm.90 -   GlobalGC collect complete
3STHSTTYPE     01:56:58:102024000 GMT j9mm.57 -   Sweep end
3STHSTTYPE     01:56:58:096892000 GMT j9mm.56 -   Sweep start
3STHSTTYPE     01:56:58:096890000 GMT j9mm.55 -   Mark end
3STHSTTYPE     01:56:57:680638000 GMT j9mm.54 -   Mark start
3STHSTTYPE     01:56:57:680586000 GMT j9mm.474 -   GlobalGC start: globalcount=102
3STHSTTYPE     01:56:57:680548000 GMT j9mm.135 -   Exclusive access: exclusiveaccessms=0.073 meanexclusiveaccessms=0.073 threads=0 lastthreadtid=0x0000FFFF080040E0 beatenbyotherthread=0
3STHSTTYPE     01:56:57:680547000 GMT j9mm.469 -   Allocation failure cycle start: newspace=0/134217728 oldspace=18441416/402653184 loa=4027392/4027392 requestedbytes=32

From another local run, with VGC logs, same heap size of 512M, but another machine, without OOM, the live set grew to about 350M at some point.

image

Based on the shape of heap occupancy curve, this does not seem to be a steady and deterministic workload (or is it?). So could it be that sometimes producer just happens to generate more work than consumer can process (at a given point of time) and genuinely 512M is not enough for the worst case scenario?

babsingh commented 1 year ago

Based on the shape of heap occupancy curve, this does not seem to be a steady and deterministic workload (or is it?). So could it be that sometimes producer just happens to generate more work than consumer can process (at a given point of time) and genuinely 512M is not enough for the worst case scenario?

Skynet: https://github.com/ibmruntimes/openj9-openjdk-jdk/blob/openj9/test/jdk/java/lang/Thread/virtual/stress/Skynet.java

It begins with a single virtual thread. Then, each virtual thread launches 10 virtual threads until 1 million virtual threads are launched. The workload is exponential.

amicic commented 1 year ago

Ok, not steady (but exponential) by design. Another aspect of non-steadiness is also that it runs 10 iterations, which probably explains a few sudden dips of liveness below 100M.

But indeed, it seems to be non-deterministic either. It effectively creates a tree of jobs, but depending on the order of how they are produced and executed, it could be a vastly different amount of them live at a given point of time.

Best case resembles to depth-first tree traversal. Parent job creates only one child job and child job gets to run right away an so on till leaf job Is created/runs/dies, beck to last parent job which creates second child job.... Num of live objects at a given point of time is proportional to logarithm of tree size (there is a few carrier threads running in parallel, but still all of the could behave the same).

Worst case resembles to breadth-first tree traversal. Parent job creates all child jobs and only then child jobs get to run. Num of live objects constantly grows up until leaf jobs starts dieing. The worst case all jobs are created, started to run, but none of them completed yet. Since leaf jobs are rather simple, it's very unlikely they all will start to run and yield before finish/die, but still theoretically possible that all of them are live at some point of time.

Real life run-to-run and iteration-to-iteration variations could be high.

Could we make it somewhat more deterministic, by adding a yield point between every 2 child jobs created (not just one, since it could be too few of them alive at a given point of time and GC would not be stressed enough)? (Or simply just make the branching factor 2?) Still, the theoretical worst case would be same, but depth paths would be more preferred.

fengxue-IS commented 1 year ago

I've been grinder for about 12 hrs , and have not been able to reproduce the invalid JIT return address failure, 20x on plinux, 20x on xlinux, 20x on local machine.

I will continue to try reproducing this failure, but given the low rate of reproduction, this doesn't seem to be a blocker atm.

babsingh commented 1 year ago

Could we make it somewhat more deterministic, by adding a yield point between every 2 child jobs created (not just one, since it could be too few of them alive at a given point of time and GC would not be stressed enough)? (Or simply just make the branching factor 2?) Still, the theoretical worst case would be same, but depth paths would be more preferred.

The benchmark is in the extension's repo. Based upon past contributions, we can only modify it to correct functionality. Otherwise, the expectation is to match RI. @pshipton Thoughts?

amicic commented 1 year ago

The benchmark is in the extension's repo. Based upon past contributions, we can only modify it to correct functionality. Otherwise, the expectation is to match RI. @pshipton Thoughts?

Ah, I was under impression it was implemented by us. Then, we don't want to change the test, although we can experiment with it locally with the suggested yield chang to see if/how it affects peak heap usage, to just see if the theory holds.

Where does -Xmx512M comes from and could we just increase? Did we run sufficiently with RI JVM to claim OOMs (and timeouts, which could just be a milder variant of the same heap pressure problem) do not occur?

If problem does not occur with RI JVM, one possibility is that we create more/larger java objects per VT instance (added hidden fields and what not). I'm guessing that our implementation does not at least not directly affect the order of how VTs are spawned/executed - it's all under JDK Scheduler's code? However, there could be some indirect effects (for example, contention on suballocator may prevent/slow down stack allocation, execution and consequently death of leaf VTs).

fengxue-IS commented 1 year ago

Where does -Xmx512M comes from and could we just increase? Did we run sufficiently with RI JVM to claim OOMs (and timeouts, which could just be a milder variant of the same heap pressure problem) do not occur?

512 is the default parameter passed to test via the framework, the test file itself does not specify a minimum memory requirement. (Java21's version of test file specified -Xmx1g)

If problem does not occur with RI JVM, one possibility is that we create more/larger java objects per VT instance (added hidden fields and what not). I'm guessing that our implementation does not at least not directly affect the order of how VTs are spawned/executed - it's all under JDK Scheduler's code? However, there could be some indirect effects (for example, contention on suballocator may prevent/slow down stack allocation, execution and consequently death of leaf VTs).

We have our own impl on Continuation class which is different from RI.

I did some testing with RI, it seems that RI's threshold is around 200M, -Xmx192m causes a hang where the JVM is stuck in GC.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/145 jdk_lang_1 java/lang/Thread/virtual/stress/Skynet.java#id0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/145/openjdk_test_output.tar.gz

00:10:37  STDOUT:
00:10:37  Result: 499999500000 in 7673 ms
00:10:37  Result: 499999500000 in 3786 ms
00:10:37  Result: 499999500000 in 3775 ms
00:10:37  Result: 499999500000 in 3555 ms
00:10:37  Result: 499999500000 in 2605 ms
00:10:37  Result: 499999500000 in 2134 ms
00:10:37  STDERR:
00:10:37  Unhandled exception
00:10:37  Type=Segmentation error vmState=0x00020016
00:10:37  J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000032
00:10:37  Handler1=09001000A12F57F8 Handler2=09001000A11DC588
00:10:37  R0=0A00000000000000 R1=00000100241D54C0 R2=09001000A1125C98 R3=00000100101F6450
00:10:37  R4=0000010023CCBA18 R5=0A00000006753AA8 R6=0000000131A0F4C0 R7=0000000131A0F4C0
00:10:37  R8=09001000A111A268 R9=00000100241D5768 R10=0000000131A0F4C0 R11=0000000000000000
00:10:37  R12=0000000000000000 R13=00000100241E2800 R14=000001002377B300 R15=0A00000006753AA8
00:10:37  R16=0000000000000000 R17=00000100101F6450 R18=0000000131A0F4C0 R19=0000000000000094
00:10:37  R20=09001000A0005420 R21=0000000000000008 R22=0000000000000000 R23=0000000000000000
00:10:37  R24=00000100241D5768 R25=00000100241D5760 R26=00000100241D5778 R27=00000100241D5770
00:10:37  R28=00000100241D5751 R29=0000000131A0F4C0 R30=00000100101F6470 R31=0000010023CCBA18
00:10:37  IAR=0900000021891194 LR=090000002189BE1C MSR=A00000000200D032 CTR=09000000218916E0
00:10:37  CR=4404402620000008 FPSCR=8200000000000000 XER=2000000882000000
00:10:37  FPR0 0000000000000001 (f: 1.000000, d: 4.940656e-324)
00:10:37  FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
00:10:37  FPR2 3feec65b94000000 (f: 2483027968.000000, d: 9.617136e-01)
00:10:37  FPR3 4330000001f67bee (f: 32930798.000000, d: 4.503600e+15)
00:10:37  FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR5 4330000000133d02 (f: 1260802.000000, d: 4.503600e+15)
00:10:37  FPR6 417f67bee0000000 (f: 3758096384.000000, d: 3.293080e+07)
00:10:37  FPR7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR8 4020000000000000 (f: 0.000000, d: 8.000000e+00)
00:10:37  FPR9 402beaa03dab48a0 (f: 1034635392.000000, d: 1.395825e+01)
00:10:37  FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
00:10:37  FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
00:10:37  FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
00:10:37  FPR13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
00:10:37  Module=/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_Nightly/openjdkbinary/j2sdk-image/lib/default/libj9gc_full29.so
00:10:37  Module_base_address=09000000217CF000
00:10:37  Target=2_90_20230406_220 (AIX 7.1)
00:10:37  CPU=ppc64 (20 logical CPUs) (0x780000000 RAM)
00:10:37  ----------- Stack Backtrace -----------
00:10:37  _ZL28vmThreadStackDoOSlotIteratorP10J9VMThreadP16J9StackWalkStatePP8J9ObjectPKv+0x38 (0x090000002189BE1C [libj9gc_full29.so+0xcce1c])
00:10:37  IPRA.$walkJITFrameSlots+0xcc (0x090000001F48F7B0 [libj9jit29.so+0x1057b0])
00:10:37  IPRA.$jitWalkFrame+0x590 (0x090000001F48E774 [libj9jit29.so+0x104774])
00:10:37  jitWalkStackFrames+0x4c4 (0x090000001F48C9C8 [libj9jit29.so+0x1029c8])
00:10:37  walkStackFrames+0x4e4 (0x090000001F00DDE8 [libj9vm29.so+0x109de8])
00:10:37  walkContinuationStackFrames+0x730 (0x090000001F0B1834 [libj9vm29.so+0x1ad834])
00:10:37  _ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb+0xd4 (0x090000002189C198 [libj9gc_full29.so+0xcd198])
00:10:37  _ZN18MM_MarkingDelegate27scanContinuationNativeSlotsEP18MM_EnvironmentBaseP8J9Object+0xa0 (0x0900000021891804 [libj9gc_full29.so+0xc2804])
00:10:37  _ZN22MM_ConcurrentCardTable15cleanSingleCardEP18MM_EnvironmentBasePhmPm+0x15b8 (0x09000000218C5AFC [libj9gc_full29.so+0xf6afc])
00:10:37  _ZN22MM_ConcurrentCardTable10cleanCardsEP18MM_EnvironmentBasebmPmb+0x574 (0x09000000218C3F98 [libj9gc_full29.so+0xf4f98])
00:10:37  _ZN32MM_ConcurrentGCIncrementalUpdate10cleanCardsEP18MM_EnvironmentBasebmPmb+0x168 (0x09000000218DA70C [libj9gc_full29.so+0x10b70c])
00:10:37  _ZN32MM_ConcurrentGCIncrementalUpdate17doConcurrentTraceEP18MM_EnvironmentBaseP22MM_AllocateDescriptionmP17MM_MemorySubSpaceb+0xb28 (0x09000000218D4EAC [libj9gc_full29.so+0x105eac])
00:10:37  _ZN15MM_ConcurrentGC14concurrentMarkEP18MM_EnvironmentBaseP17MM_MemorySubSpaceP22MM_AllocateDescription+0xf6c (0x09000000218E37D0 [libj9gc_full29.so+0x1147d0])
00:10:37  _ZN15MM_ConcurrentGC16payAllocationTaxEP18MM_EnvironmentBaseP17MM_MemorySubSpaceS3_P22MM_AllocateDescription+0x1d8 (0x09000000218E22FC [libj9gc_full29.so+0x1132fc])
00:10:37  _ZN17MM_MemorySubSpace16payAllocationTaxEP18MM_EnvironmentBasePS_P22MM_AllocateDescription+0x164 (0x0900000021810EE8 [libj9gc_full29.so+0x41ee8])
00:10:37  _ZN17MM_MemorySubSpace16payAllocationTaxEP18MM_EnvironmentBasePS_P22MM_AllocateDescription+0x78 (0x0900000021810DFC [libj9gc_full29.so+0x41dfc])
00:10:37  _ZN17MM_MemorySubSpace16payAllocationTaxEP18MM_EnvironmentBasePS_P22MM_AllocateDescription+0x78 (0x0900000021810DFC [libj9gc_full29.so+0x41dfc])
00:10:37  _ZN17MM_MemorySubSpace16payAllocationTaxEP18MM_EnvironmentBaseP22MM_AllocateDescription+0x30 (0x0900000021810CF4 [libj9gc_full29.so+0x41cf4])
00:10:37  _Z21OMR_GC_AllocateObjectP12OMR_VMThreadP25MM_AllocateInitialization+0x390 (0x090000002185C994 [libj9gc_full29.so+0x8d994])
00:10:37  J9AllocateObjectNoGC+0x414 (0x0900000021857618 [libj9gc_full29.so+0x88618])
00:10:37  fast_jitNewObject+0x74 (0x090000001F487A58 [libj9jit29.so+0xfda58])
00:10:37  (0x090000001F4560B4 [libj9jit29.so+0xcc0b4])
00:10:37  runJavaThread+0x28c (0x090000001EF78A50 [libj9vm29.so+0x74a50])
00:10:37  _ZL23javaProtectedThreadProcP13J9PortLibraryPv+0x11c (0x090000001EF07020 [libj9vm29.so+0x3020])
00:10:37  omrsig_protect+0x4fc (0x0900000012E601E0 [libj9prt29.so+0x5f1e0])
00:10:37  javaThreadProc+0x70 (0x090000001EF06E94 [libj9vm29.so+0x2e94])
00:10:37  thread_wrapper+0x14c (0x090000001F29B590 [libj9thr29.so+0x5590])
00:10:37  _pthread_body+0xf0 (0x090000000051FE14 [libpthreads.a+0x3e14])
00:10:37  ---------------------------------------

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_x86-64_windows_Nightly/151 - OOM

JasonFengJ9 commented 1 year ago

JDK19 ppc64_aix internal build(paix821)

[2023-04-08T19:48:54.105Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode150
[2023-04-08T19:48:54.105Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops 

[2023-04-08T21:04:14.062Z] TEST: java/lang/Thread/virtual/stress/Skynet.java#id0

[2023-04-08T21:04:14.064Z] TEST RESULT: Error. Program `/home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_0/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 2400000ms, elapsed time including timeout handling was 2412656ms).
[2023-04-08T21:04:14.064Z] --------------------------------------------------
[2023-04-08T21:11:24.420Z] Test results: passed: 853; error: 1
[2023-04-08T21:11:24.420Z] Report written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_0/aqa-tests/TKG/output_16809833339768/jdk_lang_0/report/html/report.html
[2023-04-08T21:11:24.420Z] Results written to /home/jenkins/workspace/Test_openjdk19_j9_sanity.openjdk_ppc64_aix_testList_0/aqa-tests/TKG/output_16809833339768/jdk_lang_0/work
[2023-04-08T21:11:24.420Z] Error: Some tests failed or other problems occurred.
[2023-04-08T21:11:24.420Z] -----------------------------------
[2023-04-08T21:11:24.420Z] jdk_lang_0_FAILED

50x internal grinder

pshipton commented 1 year ago

@JasonFengJ9 we are ignoring Skynet.java hangs in jdk_lang_0 on any platform, as this is a known problem.

JasonFengJ9 commented 1 year ago

JDK19 x86-64_windows internal build(win11x86-rt3-1)

[2023-04-08T17:50:53.504Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode650
[2023-04-08T17:50:53.915Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops 

[2023-04-08T20:29:35.899Z] TEST: java/lang/Thread/virtual/stress/Skynet.java#id0

[2023-04-08T20:29:35.899Z] STDERR:
[2023-04-08T20:29:35.899Z] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/04/08 12:39:43 - please wait.
[2023-04-08T20:29:35.899Z] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/04/08 12:39:43 - please wait.
[2023-04-08T20:29:35.899Z] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/04/08 12:39:43 - please wait.
[2023-04-08T20:29:35.899Z] JVMDUMP055I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError", exception "Java heap space" at 2023/04/08 12:39:43 - please wait.

[2023-04-08T20:29:35.900Z] Exception in thread "" JVMDUMP010I Snap dump written to {nothing to snap}
[2023-04-08T20:29:35.900Z] JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
[2023-04-08T20:29:35.900Z] Exception in thread "Attach API wait loop" java/lang/OutOfMemoryError: Java heap space
[2023-04-08T20:29:35.900Z]  at java/lang/Throwable.printStackTrace (java.base@19.0.2/Throwable.java:554)
[2023-04-08T20:29:35.900Z]  at java/lang/Throwable.printStackTraceHelper (java.base@19.0.2/Throwable.java:338)
[2023-04-08T20:29:35.900Z]  at java/lang/Throwable.printStackTrace (java.base@19.0.2/Throwable.java:307)
[2023-04-08T20:29:35.900Z]  at java/lang/Throwable.printStackTrace (java.base@19.0.2/Throwable.java:250)

[2023-04-08T20:46:11.348Z] Test results: passed: 860; error: 1
[2023-04-08T20:46:46.202Z] Report written to C:\Users\jenkins\workspace\Test_openjdk19_j9_sanity.openjdk_x86-64_windows_testList_1\aqa-tests\TKG\output_1680976237304\jdk_lang_1\report\html\report.html
[2023-04-08T20:46:46.202Z] Results written to C:\Users\jenkins\workspace\Test_openjdk19_j9_sanity.openjdk_x86-64_windows_testList_1\aqa-tests\TKG\output_1680976237304\jdk_lang_1\work
[2023-04-08T20:46:46.202Z] Error: Some tests failed or other problems occurred.
[2023-04-08T20:46:46.202Z] -----------------------------------
[2023-04-08T20:46:46.202Z] jdk_lang_1_FAILED

50x internal grinder - 2 failures