eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

SVM assert triggered in jdk_lang_0 with JITServer AOT cache: compileRomClass and currentRomClass should not be different! #20071

Open cjjdespres opened 2 months ago

cjjdespres commented 2 months ago

I observed this at a client JVM when running jdk_lang_0 with a manually-started JITServer on the side, using EXTRA_OPTIONS to enable the JITServer AOT cache on the client. This is a non-fatal SVM assert. Console log:

Assertion failed at /home/despresc/dev/testing/openj9-openjdk-jdk21/openj9/runtime/compiler/runtime/RelocationRecord.cpp:2837: false
VMState: 0x0005ffff
    compileRomClass and currentRomClass should not be different!
compiling SymbolicDescTest.testSymbolicDesc(Ljava/lang/constant/ConstantDesc;)V at level: warm

Type=Unhandled trap vmState=0x0005ffff
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007F45F1D090C0 Handler2=00007F45F1871B70
RDI=0000000000000002 RSI=00007F45D38084D0 RAX=0000000000000000 RBX=0000000000000005
RCX=00007F45F36E3BBF RDX=0000000000000000 R8=0000000000000000 R9=00007F45D38084D0
R10=0000000000000008 R11=0000000000000246 R12=000000000059E370 R13=0000000000000001
R14=00007F45D3823650 R15=00007F45481EBC60
RIP=00007F45F36E3BBF GS=0000 FS=0000 RSP=00007F45D38084D0
EFlags=0000000000000246 CS=0033 RBP=00007F4548237CB0 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1=0000003000000020 (f: 32.000000, d: 1.018558e-312)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7=00007f45d380cfb0 (f: 3548434432.000000, d: 6.913850e-310)
xmm8=0405060700010203 (f: 66051.000000, d: 2.696622e-289)
xmm9=d3858bb20898183c (f: 144185408.000000, d: -2.247134e+94)
xmm10=38ddb1d4bfeb07aa (f: 3219851264.000000, d: 8.935909e-35)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/libpthread.so.0
Module_base_address=00007F45F36D1000 Symbol=raise
Symbol_address=00007F45F36E3AB0

Method_being_compiled=SymbolicDescTest.testSymbolicDesc(Ljava/lang/constant/ConstantDesc;)V
Target=2_90_20240802_000000 (Linux 4.18.0-553.8.1.el8_10.x86_64)
CPU=amd64 (8 logical CPUs) (0x7c7919000 RAM)
----------- Stack Backtrace -----------
raise+0x10f (0x00007F45F36E3BBF [libpthread.so.0+0x12bbf])
_ZN2TR4trapEv+0x47 (0x00007F45DE622D6D [libj9jit29.so+0x5a2d6d])
_ZN2TR15fatal_assertionEPKciS1_S1_z+0x0 (0x00007F45DE622F9C [libj9jit29.so+0x5a2f9c])
_ZN2TR27fatal_assertion_with_detailERKNS_16AssertionContextEPKciS4_S4_z+0x0 (0x00007F45DE623019 [libj9jit29.so+0x5a3019])
_ZN32TR_RelocationRecordInlinedMethod16inlinedSiteValidEP20TR_RelocationRuntimeP19TR_RelocationTargetPP20TR_OpaqueMethodBlock+0x4b4 (0x00007F45DE4F29B6 [libj9jit29.so+0x4729b6])
_ZN32TR_RelocationRecordInlinedMethod18preparePrivateDataEP20TR_RelocationRuntimeP19TR_RelocationTarget+0x27 (0x00007F45DE4F2365 [libj9jit29.so+0x472365])
_ZN27TR_RelocationRecordNopGuard18preparePrivateDataEP20TR_RelocationRuntimeP19TR_RelocationTarget+0x18 (0x00007F45DE4F248C [libj9jit29.so+0x47248c])
_ZN24TR_RelocationRecordGroup16handleRelocationEP20TR_RelocationRuntimeP19TR_RelocationTargetP19TR_RelocationRecordPh+0x88 (0x00007F45DE4E7FF8 [libj9jit29.so+0x467ff8])
_ZN24TR_RelocationRecordGroup16applyRelocationsEP20TR_RelocationRuntimeP19TR_RelocationTargetPh+0xb8 (0x00007F45DE4E8F40 [libj9jit29.so+0x468f40])
_ZN20TR_RelocationRuntime22relocateAOTCodeAndDataEPhS0_S0_S0_+0x2d2 (0x00007F45DE4F4F68 [libj9jit29.so+0x474f68])
_ZN20TR_RelocationRuntime29prepareRelocateAOTCodeAndDataEP10J9VMThreadP11TR_FrontEndPN2TR9CodeCacheEPK20J9JITDataCacheHeaderP8J9MethodbPNS4_7OptionsEPNS4_11CompilationEP17TR_ResolvedMethodPhP16TR_J9SharedCache+0x725 (0x00007F45DE4F5B29 [libj9jit29.so+0x475b29])
_ZL20remoteCompilationEndP10J9VMThreadPN2TR11CompilationEP17TR_ResolvedMethodP8J9MethodPNS1_28CompilationInfoPerThreadBaseERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESH_+0x454 (0x00007F45DE1F5CCE [libj9jit29.so+0x175cce])
_Z13remoteCompileP10J9VMThreadPN2TR11CompilationEP17TR_ResolvedMethodP8J9MethodRNS1_24IlGeneratorMethodDetailsEPNS1_28CompilationInfoPerThreadBaseE.localalias.0+0x1c41 (0x00007F45DE2026D9 [libj9jit29.so+0x1826d9])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x969 (0x00007F45DE1C7687 [libj9jit29.so+0x147687])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0xa29 (0x00007F45DE1C88BF [libj9jit29.so+0x1488bf])
omrsig_protect+0x2a7 (0x00007F45F1872957 [libj9prt29.so+0x28957])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x5be (0x00007F45DE1C5C5E [libj9jit29.so+0x145c5e])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1b4 (0x00007F45DE1C619C [libj9jit29.so+0x14619c])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x15a (0x00007F45DE1C488E [libj9jit29.so+0x14488e])
_ZN2TR24CompilationInfoPerThread3runEv+0x31 (0x00007F45DE1C4FEF [libj9jit29.so+0x144fef])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x93 (0x00007F45DE1C50EA [libj9jit29.so+0x1450ea])
omrsig_protect+0x2a7 (0x00007F45F1872957 [libj9prt29.so+0x28957])
_Z21compilationThreadProcPv+0x1bc (0x00007F45DE1C54E7 [libj9jit29.so+0x1454e7])
thread_wrapper+0x162 (0x00007F45F163DF12 [libj9thr29.so+0x9f12])
start_thread+0xea (0x00007F45F36D91CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F45F31308D3 [libc.so.6+0x398d3])

This was a JITServer AOT cache method received at the client. This shouldn't be a functional issue - when running with non-fatal SVM asserts (the default) we simply abort the relocation if this happens. It should still be addressed, of course.

cjjdespres commented 2 months ago

Some more details:

From the JIT dump, the method in question has the byte codes:

        0, JBaload0
        1, JBiconst0
        2, JBinvokestatic          15
        5, JBreturn0

and apparently zero instructions. Also, the headers of currentRomClass and compileRomClass appear to be identical, except for:

currentRomClass:
  romSize = 3008
  intermediateClassDataLength = 3008
compileRomClass:
  romSize = 3176
  intermediateClassDataLength = 3176

The addresses of the ROM classes are of course different. The name of both ROM classes is SymbolicDescTest.

In JITServerHelpers::packROMClass, we strip out the intermediateClassData completely from a ROM class when packing the ROM class (to send to the server, or to calculate its hash) because it is "not used by the JIT", according to a comment in that method. The romSize and intermediate class data fields are also fixed up. The ROM classes being entirely intermediate class data is consistent with the "compiled" version of the method that caused the assert having no instructions.
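To illustrate why two ROM classes that differ only in intermediate class data end up indistinguishable after packing, here is a minimal sketch. Everything in it is a hypothetical stand-in: `ToyRomClass`, `packToyRomClass`, `toyRomSize` and `toyPackedHash` are invented for illustration and do not reflect the real J9ROMClass layout or the actual JITServerHelpers::packROMClass implementation (which also fixes up header fields and uses its own hash).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a ROM class.
struct ToyRomClass {
    std::string name;
    std::vector<std::uint8_t> body;                  // data the JIT actually uses
    std::vector<std::uint8_t> intermediateClassData; // dropped when packing
};

// Unpacked size: includes the intermediate class data.
std::size_t toyRomSize(const ToyRomClass &c) {
    return c.name.size() + c.body.size() + c.intermediateClassData.size();
}

// Packed form: name and body only; intermediate class data is left out.
std::string packToyRomClass(const ToyRomClass &c) {
    std::string packed = c.name;
    packed.push_back('\0'); // separator between name and body
    packed.append(c.body.begin(), c.body.end());
    return packed;
}

// Hash of the packed form (std::hash is a placeholder for the real hash).
std::size_t toyPackedHash(const ToyRomClass &c) {
    return std::hash<std::string>{}(packToyRomClass(c));
}
```

With two `ToyRomClass` values that share a name and body but carry intermediate class data of different lengths, `toyRomSize` differs while `packToyRomClass` and `toyPackedHash` agree, which mirrors the romSize difference seen above.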

There is the following in the java core:

3CLTEXTCLASS                    SymbolicDescTest(0x0000000000559900)

3CLTEXTCLASS                    SymbolicDescTest(0x000000000059E100)

and the addresses of the ROM classes associated to these RAM classes match those of compileRomClass and currentRomClass, respectively.

So, the deserializer happened to resolve a particular class record to that first J9Class *. The SVM looked up that class's J9ROMClass *, and that became compileRomClass. The SVM also used different records to look up what the currentMethod ought to be, and found that its defining class was that second SymbolicDescTest instance, and so had a different underlying ROM class.

@dsouzai Maybe this particular relocation should succeed? We retrieve the currentMethod here:

https://github.com/eclipse-openj9/openj9/blob/30bbd4efd9d652532d0262d8899c73d452725913/runtime/compiler/runtime/RelocationRecord.cpp#L2790

and the assert that failed is just down from there. The deserializer will guarantee that the compileRomClass has equal name, packed length, and packed hash to the ROM class at compile time, and it should therefore always be equal to currentRomClass in those properties (I believe). So it seems plausible to me that if we are relocating a method that was deserialized, we should be able to skip this pointer equality check. If not, we can just turn off the assert in that circumstance.

dsouzai commented 2 months ago

In a local AOT compilation, the reason for this check is that although compileRomClass and currentRomClass are acquired using different means, they were the same value in the compile run:

https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/codegen/J9AheadOfTimeCompile.cpp#L545
https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/codegen/J9AheadOfTimeCompile.cpp#L551
https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/codegen/J9AheadOfTimeCompile.cpp#L572
https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/codegen/J9AheadOfTimeCompile.cpp#L579

Therefore, on load, when we use that info to get the ROM class:

https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/runtime/RelocationRecord.cpp#L2790
https://github.com/eclipse-openj9/openj9/blob/84e2c9d1715c56f45e3a9bebbbbf3b88673b06c6/runtime/compiler/runtime/RelocationRecord.cpp#L2819-L2820

they should get the same rom class.

Now in the case of JITServer+AOTCache, during relocation, is it possible for there to be two different ROMClass pointers that are actually the same ROMClass on the client?

cjjdespres commented 2 months ago

> Now in the case of JITServer+AOTCache, during relocation, is it possible for there to be two different ROMClass pointers that are actually the same ROMClass on the client?

Yes, this can happen. The checks during deserialization will guarantee that compileRomClass has the same size and hash as what the ROM class had at compile time after packing, which inlines all UTF8 strings and discards intermediate class data and debug info. So compileRomClass should be equivalent to currentRomClass from a JIT perspective under SVM, but may not be equal to it.

This particular assert was triggered because there were two ROM classes in the SCC that were equal in content, except that they had slightly different intermediate class data. The class the deserializer cached for this offset happened to be the "wrong" one, but it was still equivalent to the "right" one, the one that the relo runtime was expecting.
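A rough sketch of how the two lookups can diverge, under stated assumptions: `ToyLoadedClass` and `resolveClassRecord` are hypothetical names, and the "first match by name and packed hash" policy is an assumption made for illustration, not the actual deserializer algorithm.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of a class loaded at the client.
struct ToyLoadedClass {
    std::string name;
    std::uint64_t packedHash; // hash of the packed ROM class
    int id;                   // distinguishes otherwise-equivalent instances
};

// Assumed policy: return the first loaded class whose name and packed hash
// match the serialized class record; nullptr if there is none.
const ToyLoadedClass *resolveClassRecord(const std::vector<ToyLoadedClass> &loaded,
                                         const std::string &name,
                                         std::uint64_t expectedPackedHash) {
    for (const ToyLoadedClass &c : loaded) {
        if (c.name == name && c.packedHash == expectedPackedHash)
            return &c;
    }
    return nullptr;
}
```

If two entries share a name and packed hash, this lookup always returns the first one, while a method found through other records may be defined by the second. The two resulting classes are then equivalent by packed hash but are not the same pointer.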

dsouzai commented 2 months ago

> This particular assert was triggered because there were two ROM classes in the SCC that were equal in content, except that they had slightly different intermediate class data. The class the deserializer cached for this offset happened to be the "wrong" one, but it was still equivalent to the "right" one, the one that the relo runtime was expecting.

When you say two ROM classes in the SCC, do you mean in the client's SCC? If so, wouldn't they actually be two different classes?

cjjdespres commented 2 months ago

Two distinct ROM classes in the client's SCC. They just happen to have equal packed hash. Since JITServerHelpers::packROMClass() seems to be of the opinion that they're equivalent from the perspective of the JIT compiler, I thought that that might be good enough to pass validation here.

The answer currently in the code is no, that's not good enough to pass validation. The current AOT cache hashing scheme and serialization record structure aren't strong enough to guarantee that, once deserialization has adjusted the ROM class offsets in the method's relocation records, compileRomClass will be equal to currentRomClass in the sense of pointer equality. They only guarantee that the two will have equal packed hashes. That happens to conflict with this SVM assert.

dsouzai commented 2 months ago

> Two distinct ROM classes in the client's SCC. They just happen to have equal packed hash. Since JITServerHelpers::packROMClass() seems to be of the opinion that they're equivalent from the perspective of the JIT compiler, I thought that that might be good enough to pass validation here.

My initial feeling is that if they are distinct classes in the client's SCC, then JITServerHelpers::packROMClass() should also be able to distinguish them. However, I do agree that if JITServerHelpers::packROMClass() thinks they're equivalent, then it should be consistent on the load run as well. That said, I think the comparison between compileRomClass and currentRomClass should still exist. It's just that if they aren't equal, then under JITServer+AOTCache we do an additional check to see if their packed hashes are equal.

@hangshao0, when the SCC has two ROMClasses that are basically the same except for intermediate class data fields, what does that mean from a java class pov? How can there be two different ROMClasses with the same name? Is it that they come from different places on disk?

hangshao0 commented 2 months ago

> when the SCC has two ROMClasses that are basically the same except for intermediate class data fields, what does that mean from a java class pov?

If they are exactly the same except for the intermediate class data, these 2 classes will be treated as the same class in the JVM code.

> How can there be two different ROMClasses with the same name? Is it that they come from different places on disk?

Yes, they could come from different places on the disk. Even the same jar on the disk can be re-compiled with newer versions of its classes, so there could be 2 versions of a class under the same name.

dsouzai commented 2 months ago

@cjjdespres I guess given what Hang said, what JITServerHelpers::packROMClass() currently does is fine. The relo code would just need to be adjusted to check the packed hash if the two pointers don't match.
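The adjustment being proposed could look roughly like the following. This is only a sketch of the idea as discussed in this thread, not the actual change to RelocationRecord.cpp: `ToyPackedInfo`, `romClassesMatchForRelocation`, and the `deserializedFromAOTCache` flag are hypothetical names, and it assumes the packed size and packed hash of each ROM class are available at relocation time.

```cpp
#include <cstdint>

// Hypothetical summary of a ROM class after packing.
struct ToyPackedInfo {
    std::uint32_t packedSize;
    std::uint64_t packedHash;
};

// Keep the existing pointer comparison. Only when it fails, and only for a
// method that came from the JITServer AOT cache, fall back to comparing the
// packed size and packed hash.
bool romClassesMatchForRelocation(const ToyPackedInfo *compileRomClass,
                                  const ToyPackedInfo *currentRomClass,
                                  bool deserializedFromAOTCache) {
    if (compileRomClass == currentRomClass)
        return true; // the case a local AOT load expects
    if (!deserializedFromAOTCache)
        return false; // local AOT: distinct pointers remain a mismatch
    if (compileRomClass == nullptr || currentRomClass == nullptr)
        return false;
    return compileRomClass->packedSize == currentRomClass->packedSize
        && compileRomClass->packedHash == currentRomClass->packedHash;
}
```

Local AOT loads keep the strict behaviour, so the assert would still catch genuine mismatches there, while the fallback accepts only classes that packing already treats as equivalent.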