eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

OpenJDK java/lang/Thread/virtual/stress/TimedGet Invalid JIT return address #19249

Open pshipton opened 6 months ago

pshipton commented 6 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Release_testList_0/16/ - p8-java1-ibm07 jdk_lang_j9_0 java/lang/Thread/virtual/stress/TimedGet.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Release_testList_0/16/openjdk_test_output.tar.gz

18:33:51  STDOUT:
18:33:51  STDERR:
18:33:51  
18:33:51  
18:33:51  *** Invalid JIT return address 00000100114ABE44 in 0000010023C8BA70
18:33:51  
18:33:51  22:19:06.319 0x10023c92e00    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK21_ppc64_aix_Release/openj9/runtime/vm/swalk.c:1629: ((0 ))
pshipton commented 6 months ago

20x x 5 grinder on single test https://openj9-jenkins.osuosl.org/job/Grinder/3426/ - passed

1x x 8 grinder on jdk_lang_j9_0 https://openj9-jenkins.osuosl.org/job/Grinder/3427/ - passed 1x x 8 grinder on jdk_lang_j9_0 https://openj9-jenkins.osuosl.org/job/Grinder/3432/ - 1 failure in java/lang/Thread/virtual/stress/Skynet.java#default

hzongaro commented 6 months ago

@BradleyWood, may I ask you to take a look at this failure? It's targeted for the 0.44 release, so it's high priority.

pshipton commented 6 months ago

It's targeted for the 0.44 release, so it's high priority.

I added it while we investigate since the failure occurred in a release build. If the frequency of the failure is low, we can move it out.

BradleyWood commented 6 months ago

@pshipton Has there been any failures on platforms other than PPC AIX?

pshipton commented 6 months ago

Not recently. There is also the closed issue https://github.com/eclipse-openj9/openj9/issues/17163

pshipton commented 6 months ago

@tajila fyi

pshipton commented 6 months ago

Perhaps related to https://github.com/eclipse-openj9/openj9/issues/18910

BradleyWood commented 6 months ago

No success yet on reproducing the problem locally

pshipton commented 5 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/179 - p8-java1-ibm02 jdk_lang_j9_0 java/lang/Thread/virtual/stress/Skynet.java#default

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/179/openjdk_test_output.tar.gz

20:37:28  *** Invalid JIT return address 00000100114AE880 in 00000100236FBA70
20:37:28  
20:37:28  00:22:45.178 0x10023703100    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK21_ppc64_aix_Nightly/openj9/runtime/vm/swalk.c:1629: ((0 ))
pshipton commented 5 months ago

Trying a 5x x 5 jdk_lang_j9_0 internal grinder - passed, all machines are 7.2 p8 or p9 4x x 5 on 7.1 machines grinder - no 7.1 machines available for test

pshipton commented 5 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk22_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/44 jdk_lang_j9_0 java/lang/Thread/virtual/stress/TimedGet.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk22_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/44/openjdk_test_output.tar.gz

JamesKingdon commented 2 months ago

Matching symptoms have been reported by a customer (on x86)

pshipton commented 1 month ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OpenJDK21_testList_0/28

pshipton commented 1 month ago

Added the userRaised label due to https://github.com/eclipse-openj9/openj9/issues/19249#issuecomment-2221566260

vij-singh commented 1 week ago

@BradleyWood Any new updates to this one?

pshipton commented 1 week ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/289

BradleyWood commented 1 week ago

This issue looks specific to AIX PPC. @JamesKingdon If you have any details that might indicate the customer issue is caused by the same problem as this issue, please let me know. Otherwise, I would like to ask @zl-wang to assign this to someone on the power team.

JamesKingdon commented 1 week ago

Hi Brad, I'm going to have to start putting internal case numbers on comments like the one above. I'm currently not able to locate the case that prompted that comment.

vij-singh commented 1 week ago

@zl-wang Hi Julian - could you assign someone to work on this one? (0.48 target)

zl-wang commented 1 week ago

@rmnattas could you take this up?

rmnattas commented 3 days ago

Looking at the walkState, it seem that the J9VMThread is not 256-byte aligned, which is unusual but seem to be valid when walking an unmounted virtual thread where the J9VMThread can be allocated on the stack, within the GC native stack in this case.

In JDK19+ we sometimes stack allocate a j9vmthread to walk a continuation list, so that would be an exception.

*** Invalid JIT return address 0000010011B341EC in 0000010024004A80

(kca) s J9StackWalkState 0x0000010024004A80
J9StackWalkState (696 bytes)
                   struct J9StackWalkState*  previous = 0x0000010024004b30 (offset: 0)
                       struct J9VMThread*  walkThread = 0x0000010024003ec0 (offset: 8)
                             struct J9JavaVM*  javaVM = 0x0000010010131e10 (offset: 16)

3XMTHREADINFO      "GC Worker" J9VMThread:0x0000010023F63400, omrthread_t:0x0000010023E040D0, java/lang/Thread:0x0A00000001108B38, state:CW, prio=5
(native stack address range from:0x0000010023F80000, to:0x0000010024005888, size:0x85888)
...
4XENATIVESTACK               invalidJITReturnAddress+0xf4 (0x090000001A2EF578 [libj9vm29.so+0x108578])
4XENATIVESTACK               jitWalkStackFrames+0x260 (0x090000001AC7CD64 [libj9jit29.so+0x15cd64])
4XENATIVESTACK               walkStackFrames+0x4d4 (0x090000001A2ED058 [libj9vm29.so+0x106058])
4XENATIVESTACK               walkContinuationStackFrames+0x748 (0x090000001A392A8C [libj9vm29.so+0x1aba8c])
4XENATIVESTACK               _ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb+0xd8
4XENATIVESTACK               _ZN22MM_GlobalMarkingScheme10scanObjectEP19MM_EnvironmentVLHGCP8J9ObjectNS_10ScanReasonE+0x42c (0x090000001CB17830 [libj9gc_full29.so+0x32f830])
rmnattas commented 2 days ago

The stack trace of the unmounted virtual thread is all JITed:

0x0000010011b3aadc {java/lang/VirtualThread.park} JIT [0x100315fbeb0]
0x000001001192c7a4 {java/util/concurrent/LinkedTransferQueue$DualNode.await} JIT [0x100315fbfb0]
0x0000010011b2f760 {java/util/concurrent/SynchronousQueue$Transferer.xferLifo} JIT [0x100315fc0e0]
0x0000010011b2f068 {java/util/concurrent/SynchronousQueue.xfer} JIT [0x100315fc1a0]
0x0000010011b34790 {java/util/concurrent/SynchronousQueue.take} JIT [0x100315fc1f0]
0x000001001192b2fc {Skynet$Channel.receive} JIT [0x100315fc280]
0x00000100117458fc {Skynet.skynet} JIT [0x100315fc2b0]
0x0000010011b341ec {Skynet.lambda$skynet$1} JIT [0x100315fc320]      <-- walkState->pc / Invalid JIT return address
0x0000010011b34044 {Skynet$$Lambda/0x00000000243074b8.run} JIT [0x100315fc370]
0x0000010011742834 {java/lang/Thread.runWith} JIT [0x100315fc3a0]
0x0000010011931b9c {java/lang/VirtualThread.run} JIT [0x100315fc450]
0x0000010011b337e0 {jdk/internal/vm/Continuation.enter} JIT [0x100315fc520]
0x090000001ac61334 {libj9jit29.so}{} [0x100315fbeb0]

It shouldn't reach the bytecoded frame after the jitinfo loop causing invalidJITReturnAddress https://github.com/eclipse-openj9/openj9/blob/v0.48.0-release/runtime/codert_vm/jswalk.c#L263-L280

But the walkState->jitInfo is NULL

(kca) s J9StackWalkState. jitInfo 0x0000010024004A80
J9StackWalkState (696 bytes)
                       struct J9JITExceptionTable*  jitInfo = 0x0000000000000000 (offset: 120)
zl-wang commented 2 days ago

@tajila @dmitripivkine see the virtual-thread's stack back trace above: there is no interpreter frame at all ... every frame is with JIT-ed code.

@rmnattas i am wondering if there were code changes in JIT metadata look-up (jitGetMapsFromPC and jitGetExceptionTable etc) recently ... causing jitInfo NULL.