eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

OpenJDK java/lang/constant/CondyDescTest gpf #18526

Closed pshipton closed 10 months ago

pshipton commented 11 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/105/ jdk_lang_1 java/lang/constant/CondyDescTest.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/105/openjdk_test_output.tar.gz core, javacore in jdk_lang_1\work\java\lang\constant\methodTypeDesc\ResolveConstantDesc caused by gpf.

18:24:55  TEST RESULT: Error. Agent communication error: java.net.SocketException: Broken pipe; check console log for any additional details
18:24:55  --------------------------------------------------
18:36:15  Test results: passed: 931; error: 1
18:36:30  Report written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17007817643027/jdk_lang_1/report/html/report.html
18:36:30  Results written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_aarch64_linux_Nightly_testList_1/aqa-tests/TKG/output_17007817643027/jdk_lang_1/work
18:36:30  Error: Some tests failed or other problems occurred.
1XMCURTHDINFO  Current thread
3XMTHREADINFO      "JIT Compilation Thread-001" J9VMThread:0x0000FFFF4C002400, omrthread_t:0x0000FFFF94114150, java/lang/Thread:0x0000FFFF733702C8, state:R, prio=10
3XMJAVALTHREAD            (java/lang/Thread getId:0x5, isDaemon:true)
3XMJAVALTHRCCL            jdk/internal/loader/ClassLoaders$AppClassLoader(0x0000FFFF732DE9F0)
3XMTHREADINFO1            (native thread ID:0x217796, native priority:0xB, native policy:UNKNOWN, vmstate:R, vm thread flags:0x00000020)
3XMTHREADINFO2            (native stack address range from:0x0000FFFF58CF0000, to:0x0000FFFF58DF0000, size:0x100000)
3XMCPUTIME               CPU usage total: 12.940751200 secs, current category="JIT"
pshipton commented 11 months ago

@knn-k fyi

knn-k commented 11 months ago

The location of SEGV is in SH_AttachedDataManager::SH_AttachedDataResourceDescriptor(), trying to store updateAtOffset to memory.

https://github.com/eclipse-openj9/openj9/blob/70e835eb2ee6d4ae5c975646d836dd2506d971c8/runtime/shared_common/AttachedDataManager.hpp#L125

   0xffff933eae60:      stp     x29, x30, [sp, #-32]!
   0xffff933eae64:      adrp    x4, 0xffff9344f000
   0xffff933eae68:      mov     x0, x1
   0xffff933eae6c:      mov     x29, sp
   0xffff933eae70:      ldr     x4, [x4, #3144]
   0xffff933eae74:      str     x19, [sp, #16]
   0xffff933eae78:      add     x19, x1, #0x8 // x1: cacheItem (arg), x19: adwInCache
   0xffff933eae7c:      str     w2, [x19, #16] // SEGV here, x19=0xFFFF31E0D928, InaccessibleAddress=0xFFFF31E0D938, w2: updateAtOffset (arg)
   0xffff933eae80:      ldr     x1, [x4]
   0xffff933eae84:      and     x1, x1, #0xfffffffffffffffd
   0xffff933eae88:      cmp     x1, #0x1
   0xffff933eae8c:      b.eq    0xffff933eaec0  // b.none

The address is in the middle of the memory-mapped SCC:

(gdb) info proc mappings
...
      0xffff21400000     0xffff34000000 0x12c00000        0x0 /home/jenkins/.cache/javasharedresources/C290M21F2A64P_sharedcc_jenkins_G43L00
...

(gdb) x/20x 0x0FFFF31E0D920
0xffff31e0d920: 0x00000024      0x0007000b      0x00000000      0x005b95b0
0xffff31e0d930: 0x00000004      0x00010002      0xffffffff      0x07d00800 // Trying to overwrite 0xffffffff at 0xffff31e0d938
0xffff31e0d940: 0x00000000      0x00000028      0x0000004c      0x0007000b
0xffff31e0d950: 0x00000000      0x00360dd0      0x00000030      0x00000001
0xffff31e0d960: 0xffffffff      0x00601dba      0x0100180c      0x000005cf

From the javacore file:

2SCLTEXTRCS            ROMClass start address                    = 0x0000FFFF21460000
2SCLTEXTRCE            ROMClass end address                      = 0x0000FFFF23C61858
2SCLTEXTMSA            Metadata start address                    = 0x0000FFFF31AAB0B0
2SCLTEXTCEA            Cache end address                         = 0x0000FFFF34000000
knn-k commented 11 months ago

No failures in two Grinder jobs for jdk_lang_1 (20 runs in total): https://openj9-jenkins.osuosl.org/job/Grinder/3078/ https://openj9-jenkins.osuosl.org/job/Grinder/3080/

knn-k commented 11 months ago

The VMSTATE is 0x8000c, which means J9VMSTATE_ATTACHEDDATA_UPDATE.

The Snap file shows the following trace:

23:24:37.730749589  0x0000ffff4c002400 j9shr.646           Entry      >M reset: Resetting Manager of TYPE_ATTACHED_DATA
23:24:37.730750188  0x0000ffff4c002400 j9shr.637           Entry      >M tearDownHashTable: Entering for Manager of TYPE
_ATTACHED_DATA
23:24:37.730822428  0x0000ffff4c002400 j9shr.638           Exit       <M tearDownHashTable: Exit
23:24:37.730822508  0x0000ffff4c002400 j9shr.635           Entry      >M initializeHashTable: Entering for Manager of TY
PE_ATTACHED_DATA
23:24:37.730822748  0x0000ffff4c002400 j9shr.562           Entry      >RRM localHashTableCreate: Creating new hashtable
with 63014 initial entries
23:24:37.731030467  0x0000ffff4c002400 j9shr.563           Exit       <RRM localHashTableCreate: Created hashtable at ad
dress 0xffff941518e0
23:24:37.731030667  0x0000ffff4c002400 j9shr.636           Exit       <M initializeHashTable: Exiting with 0
23:24:37.731031267  0x0000ffff4c002400 j9shr.647           Exit       <M reset: Done resetting Manager. Returning 0
23:24:37.731031787  0x0000ffff4c002400 j9shr.1021          Entry      >CC reset: Entering
23:24:37.731032627  0x0000ffff4c002400 j9shr.1069          Event       CC findStart: Setting _scan to 0xffff3280fffc
23:24:37.731034547  0x0000ffff4c002400 j9shr.1206          Entry      >CC protectMetadataArea: Entering
23:24:37.731035267  0x0000ffff4c002400 j9shr.1207          Exit       <CC protectMetadataArea: Exiting with rc=0
23:24:37.731035667  0x0000ffff4c002400 j9shr.1022          Exit       <CC reset: Exiting
23:24:37.757386286  0x0000ffff4c002400 j9jit.12            Event       JIT: Fatal Crash in the JIT while compiling java/
util/regex/Pattern$$Lambda/0x0000000000000000.is(I)Z
23:24:37.757768284  0x0000ffff4c002400 omrport.426         Entry      >omrsig_protect: Entered, fn=0xffff99462d50, fn_ar
g=0xffff58de8f88, handler=0xffff99462670, handler_arg=0xffff58de8f60, flags=0x1f9
hangshao0 commented 11 months ago

In the SH_CacheMap::updateROMClassResource(), the shared cache is locked and the metadata area is unprotected (marked as writable) first to prepare for the following write operation. But I see:

23:24:37.731034547  0x0000ffff4c002400 j9shr.1206          Entry      >CC protectMetadataArea: Entering
23:24:37.731035267  0x0000ffff4c002400 j9shr.1207          Exit       <CC protectMetadataArea: Exiting with rc=0

The metadata area has been re-protected (marked as readonly), which leads to the crash on the following write.

The above 2 trace points are from: https://github.com/eclipse-openj9/openj9/blob/b014f61aacfa6ee1e32c947f8d16768556e8043c/runtime/shared_common/CompositeCache.cpp#L479

The cache should not be unlocked in this case.

knn-k commented 11 months ago

11988 (closed) and #17974 (open) failed with the same vmState, 0x0008000c.