eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Non-fatal SVM assert triggered in sanity.functional testJITServer_0: aMethod should have already been validated #20022

Open cjjdespres opened 4 weeks ago

cjjdespres commented 4 weeks ago

I observed this assert when running testJITServer_0 locally, using a debug build. Console log:

Assertion failed at /home/despresc/dev/testing/openj9-openjdk-jdk21/openj9/runtime/compiler/env/j9method.cpp:1012: isAlreadyValidated(aMethod)
VMState: 0x000534ff
    aMethod 0xc2e58 should have already been validated
compiling jdk/internal/jimage/ImageLocation.verify(Ljava/lang/String;Ljava/lang/String;Ljava/nio/ByteBuffer;ILjdk/internal/jimage/ImageStrings;)Z at level: warm

Unhandled exception
Type=Unhandled trap vmState=0x000534ff
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007F042AE240C0 Handler2=00007F042A98CB70
RDI=0000000000000002 RSI=00007F04076ED650 RAX=0000000000000000 RBX=0000000000000005
RCX=00007F0430975BBF RDX=0000000000000000 R8=0000000000000000 R9=00007F04076ED650
R10=0000000000000008 R11=0000000000000246 R12=00007F04076F1A28 R13=00000000000C2E58
R14=00007F03B6000000 R15=00007F041BF859B2
RIP=00007F0430975BBF GS=0000 FS=0000 RSP=00007F04076ED650
EFlags=0000000000000246 CS=0033 RBP=00007F03B600A710 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=42656c62616e655f (f: 1634624896.000000, d: 7.361016e+11)
xmm1=00000000000000ff (f: 255.000000, d: 1.259867e-321)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7=00007f04076f4fb0 (f: 124735408.000000, d: 6.899888e-310)
xmm8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9=1e1d0a0c030b0646 (f: 51054152.000000, d: 1.260688e-163)
xmm10=0c1d070046020d03 (f: 1174539520.000000, d: 2.533909e-250)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/libpthread.so.0
Module_base_address=00007F0430963000 Symbol=raise
Symbol_address=00007F0430975AB0

Method_being_compiled=jdk/internal/jimage/ImageLocation.verify(Ljava/lang/String;Ljava/lang/String;Ljava/nio/ByteBuffer;ILjdk/internal/jimage/ImageStrings;)Z
Target=2_90_20240802_000000 (Linux 4.18.0-553.8.1.el8_10.x86_64)
CPU=amd64 (8 logical CPUs) (0x7c7919000 RAM)
----------- Stack Backtrace -----------
_ZN2TR4trapEv+0x47 (0x00007F041C2C8E0D [libj9jit29.so+0x5a2e0d])
_ZN2TR15fatal_assertionEPKciS1_S1_z+0x0 (0x00007F041C2C903C [libj9jit29.so+0x5a303c])
_ZN2TR27fatal_assertion_with_detailERKNS_16AssertionContextEPKciS4_S4_z+0x0 (0x00007F041C2C90B9 [libj9jit29.so+0x5a30b9])
_ZN30TR_ResolvedRelocatableJ9MethodC1EP20TR_OpaqueMethodBlockP11TR_FrontEndP9TR_MemoryP17TR_ResolvedMethodj+0x18c (0x00007F041BF72038 [libj9jit29.so+0x24c038])
_ZN11TR_J9VMBase33createResolvedMethodWithSignatureEP9TR_MemoryP20TR_OpaqueMethodBlockP19TR_OpaqueClassBlockPciP17TR_ResolvedMethodj+0x8e (0x00007F041BF72186 [libj9jit29.so+0x24c186])
_ZN11TR_J9VMBase20createResolvedMethodEP9TR_MemoryP20TR_OpaqueMethodBlockP17TR_ResolvedMethodP19TR_OpaqueClassBlock+0x25 (0x00007F041BF6E627 [libj9jit29.so+0x248627])
_ZN3OMR20SymbolReferenceTable20methodSymRefFromNameEPN2TR20ResolvedMethodSymbolEPKcS5_S5_NS_12MethodSymbol5KindsEi+0x3c2 (0x00007F041C2645B0 [libj9jit29.so+0x53e5b0])
_ZN27TR_StringBuilderTransformer14performOnBlockEPN2TR5BlockE+0x5ba (0x00007F041C142FD2 [libj9jit29.so+0x41cfd2])
_ZN27TR_StringBuilderTransformer7performEv+0xaa (0x00007F041C141BB2 [libj9jit29.so+0x41bbb2])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias.4+0x2572 (0x00007F041C3F8B12 [libj9jit29.so+0x6d2b12])
_ZN3OMR9Optimizer8optimizeEv+0x589 (0x00007F041C3F99FD [libj9jit29.so+0x6d39fd])
_ZN3OMR11Compilation20performOptimizationsEv+0x3d (0x00007F041C267341 [libj9jit29.so+0x541341])
_ZN3OMR11Compilation7compileEv+0x825 (0x00007F041C26BCC9 [libj9jit29.so+0x545cc9])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0xa4e (0x00007F041BE6D78C [libj9jit29.so+0x14778c])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0xa29 (0x00007F041BE6E8DF [libj9jit29.so+0x1488df])
omrsig_protect+0x2a7 (0x00007F042A98D957 [libj9prt29.so+0x28957])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x5be (0x00007F041BE6BC7E [libj9jit29.so+0x145c7e])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1b4 (0x00007F041BE6C1BC [libj9jit29.so+0x1461bc])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x15a (0x00007F041BE6A8AE [libj9jit29.so+0x1448ae])
_ZN2TR24CompilationInfoPerThread3runEv+0x31 (0x00007F041BE6B00F [libj9jit29.so+0x14500f])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x93 (0x00007F041BE6B10A [libj9jit29.so+0x14510a])
omrsig_protect+0x2a7 (0x00007F042A98D957 [libj9prt29.so+0x28957])
_Z21compilationThreadProcPv+0x1bc (0x00007F041BE6B507 [libj9jit29.so+0x145507])
thread_wrapper+0x162 (0x00007F042A758F12 [libj9thr29.so+0x9f12])
start_thread+0xea (0x00007F043096B1CA [libpthread.so.0+0x81ca])
clone+0x43 (0x00007F04303C28D3 [libc.so.6+0x398d3])

Looking at the core that was produced, this appears to have been a purely local compilation. In fact, all the test components seem to have passed, and the command line string in the javacore indicates that the crash happened in the java process responsible for starting the test components (which wasn't using JITServer), not in any of the child processes (which were). So this is unlikely to be a JITServer issue.

cjjdespres commented 4 weeks ago

@dsouzai I happened to encounter this assert again while testing; I think we might have discussed it briefly a while ago. I only ran this particular test once, so I'm not sure how reproducible the failure is.

cjjdespres commented 4 weeks ago

Somewhat hilariously, I tried running !whatis 0xc2e58 in jdmpview just to see what it would say, and it appears that jdmpview also crashed with exactly the assert aMethod 0xc2e58 should have already been validated, though during the compilation of a different method.

dsouzai commented 4 weeks ago

I ran into this while working on AOT MH support; you need this patch:

diff --git a/runtime/compiler/env/VMJ9.cpp b/runtime/compiler/env/VMJ9.cpp
index b6c049e79c..733b502e99 100644
--- a/runtime/compiler/env/VMJ9.cpp
+++ b/runtime/compiler/env/VMJ9.cpp
@@ -8887,7 +8887,7 @@ TR_J9SharedCacheVM::isClassLibraryMethod(TR_OpaqueMethodBlock *method, bool vett
    }

 TR_OpaqueMethodBlock *
-TR_J9SharedCacheVM::getMethodFromClass(TR_OpaqueClassBlock * methodClass, char * methodName, char * signature, TR_OpaqueClassBlock *callingClass)
+TR_J9SharedCacheVM::getMethodFromClass(TR_OpaqueClassBlock * methodClass, const char * methodName, const char * signature, TR_OpaqueClassBlock *callingClass)
    {
    TR_OpaqueMethodBlock* omb = this->TR_J9VM::getMethodFromClass(methodClass, methodName, signature, callingClass);
    if (omb)
diff --git a/runtime/compiler/env/VMJ9.h b/runtime/compiler/env/VMJ9.h
index e57f23ff7e..65a3238de0 100644
--- a/runtime/compiler/env/VMJ9.h
+++ b/runtime/compiler/env/VMJ9.h
@@ -1666,7 +1666,7 @@ public:
    virtual bool               hasFinalizer(TR_OpaqueClassBlock * classPointer);
    virtual uintptr_t         getClassDepthAndFlagsValue(TR_OpaqueClassBlock * classPointer);
    virtual uintptr_t         getClassFlagsValue(TR_OpaqueClassBlock * classPointer);
-   virtual TR_OpaqueMethodBlock * getMethodFromClass(TR_OpaqueClassBlock *, char *, char *, TR_OpaqueClassBlock * = NULL);
+   virtual TR_OpaqueMethodBlock * getMethodFromClass(TR_OpaqueClassBlock *, const char *, const char *, TR_OpaqueClassBlock * = NULL);
    virtual bool               isPrimitiveClass(TR_OpaqueClassBlock *clazz);
    virtual TR_OpaqueClassBlock * getComponentClassFromArrayClass(TR_OpaqueClassBlock * arrayClass);
    virtual TR_OpaqueClassBlock * getArrayClassFromComponentClass(TR_OpaqueClassBlock *componentClass);

Basically, OMR::SymbolReferenceTable::methodSymRefFromName calls

TR_OpaqueMethodBlock *method = fe()->getMethodFromName(className, methodName, methodSignature);

and if you look at TR_J9VM::getMethodFromName, it calls

result = (TR_OpaqueMethodBlock *)getMethodFromClass(methodClass, methodName, signature);

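but because of the signature mismatch (i.e., TR_J9SharedCacheVM::getMethodFromClass takes char * where the base class takes const char *), the TR_J9SharedCacheVM version doesn't actually override the virtual TR_J9VM::getMethodFromClass. As a result, the non-AOT version gets called, and it doesn't add the SVM validation record for result.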

dsouzai commented 4 weeks ago

> Somewhat hilariously, I tried running !whatis 0xc2e58 in jdmpview just to see what it would say, and it appears that jdmpview also crashed with exactly the assert aMethod 0xc2e58 should have already been validated, though during the compilation of a different method.

Yeah, it's safer to just use a different JVM for jdmpview, or to run it with something like -J-Xshareclasses:none or -J-Xint if you must use the test JVM.