eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

OpenJ9 JVM crash when loading native library in Linux #13269

Open a0304 opened 3 years ago

a0304 commented 3 years ago

Hi,

We have been seeing a JVM crash when loading native code with System.loadLibrary on Linux. The crash occurs only on OpenJDK 11 OpenJ9 JVMs; the same load works as expected on OpenJDK 11 HotSpot JVMs.

The JVMs where we can reproduce the crash are:

openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 11 Linux amd64-64-Bit Compressed References 20201022810 (JIT enabled, AOT enabled)
OpenJ9 - 0394ef754
OMR - 582366ae5
JCL - 3b09cfd7e9 based on jdk-11.0.9+11)

openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.24.0, JRE 11 Linux amd64-64-Bit Compressed References 20210120910 (JIT enabled, AOT enabled)
OpenJ9 - 345e1b09e
OMR - 741e94ea8
JCL - 0a86953833 based on jdk-11.0.10+9)

openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421975 (JIT enabled, AOT enabled)
OpenJ9 - b4cc246d9
OMR - 162e6f729
JCL - 7796c80419 based on jdk-11.0.11+9)

The HotSpot JVM where the load works successfully is:

openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)

Summary of the problem:

The OpenJ9 JVM running on Linux crashes with a segmentation fault while trying to load a .so. The application doesn't log any exception to stdout or stderr; it simply crashes. The JVM under test was openjdk version "11.0.11" 2021-04-20, and the crash happens consistently with all of the OpenJ9 JVMs listed above. At the time of the crash the application was loading one of our native libraries, which depends on the proprietary Intel library libsvml.so, which in turn refers to libintlc.so.5. We tracked this by setting "LD_DEBUG=libs".

We suspect a bug in the OpenJ9 implementation, because the same application works and the crash cannot be reproduced with an OpenJDK 11 HotSpot JVM. The Linux environment, LD_LIBRARY_PATH, application code, native libraries, and Intel compilers were the same in both tests.

Linux Environment:
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server

Opening the crash core dump file in GDB showed the following trace:

0 0x00007f19fb7ac54f in renameDump () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

1 0x00007f19fb796bd1 in omrdump_create () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

2 0x00007f19f49cf212 in doSystemDump () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so

3 0x00007f19f49cb005 in protectedDumpFunction () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so

4 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

5 0x00007f19f49ce67b in runDumpFunction () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so

6 0x00007f19f49ce80f in runDumpAgent () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so

7 0x00007f19f49e6eab in triggerDumpAgents () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9dmp29.so

8 0x00007f19fba36f22 in generateDiagnosticFiles () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

9 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

10 0x00007f19fba37135 in vmSignalHandler () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

11 0x00007f19fb797c3a in mainSynchSignalHandler () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

12 <signal handler called>

13 0x00007f1a024a5357 in open_verify () from /lib64/ld-linux-x86-64.so.2

14 0x00007f1a024a5892 in open_path () from /lib64/ld-linux-x86-64.so.2

15 0x00007f1a024a8689 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2

16 0x00007f1a024acb92 in openaux () from /lib64/ld-linux-x86-64.so.2

17 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2

18 0x00007f1a024ad39d in _dl_map_object_deps () from /lib64/ld-linux-x86-64.so.2

19 0x00007f1a024b423b in dl_open_worker () from /lib64/ld-linux-x86-64.so.2

20 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2

21 0x00007f1a024b3acb in _dl_open () from /lib64/ld-linux-x86-64.so.2

22 0x00007f1a01c59eeb in dlopen_doit () from /lib64/libdl.so.2

23 0x00007f1a024af714 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2

24 0x00007f1a01c5a4ed in _dlerror_run () from /lib64/libdl.so.2

25 0x00007f1a01c59f81 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2

26 0x00007f19fb79c178 in omrsl_open_shared_library () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

27 0x00007f19fba7f727 in classLoaderRegisterLibrary () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

28 0x00007f19fba7fd1d in openNativeLibrary.constprop.3 () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

29 0x00007f19fba7ff59 in registerNativeLibrary () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

30 0x00007f19fba8b578 in VM_BytecodeInterpreterCompressed::run(J9VMThread*) () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

31 0x00007f19fba882f5 in bytecodeLoopCompressed () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

32 0x00007f19fbb336b2 in c_cInterpreter () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

33 0x00007f19fba1509a in runJavaThread () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

34 0x00007f19fba87af1 in javaProtectedThreadProc () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

35 0x00007f19fb798773 in omrsig_protect () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9prt29.so

36 0x00007f19fba83c6a in javaThreadProc () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9vm29.so

37 0x00007f19fb5614f6 in thread_wrapper () from /u/users/xyz/openjdk_11/11.0.11/jdk-11.0.11+9-jre/lib/default/libj9thr29.so

38 0x00007f1a02075dd5 in start_thread () from /lib64/libpthread.so.0

39 0x00007f1a01989ead in clone () from /lib64/libc.so.6

The stdout message at the time of the crash was:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00007F6346193380 Handler2=00007F6345EF3A10 InaccessibleAddress=00007F631BE7A5E8
RDI=00007F631BE435F0 RSI=0000000001795E60 RAX=00000000000720F0 RBX=00007F631BEB5898
RCX=0000000000000008 RDX=00007F631BE435F0 R8=00007F631BEB5AA0 R9=00007F6348C0EC60
R10=00007F631BEB54A0 R11=0000001400000004 R12=00000000000720E0 R13=00007F631BEB5AA0
R14=00007F631BEB58E0 R15=00007F631BEB59C0 RIP=00007F6348BF5357 GS=0000
FS=0000 RSP=00007F631BE795F0 EFlags=0000000000010206 CS=0033
RBP=00007F631BEB5640 ERR=0000000000000006 TRAPNO=000000000000000E OLDMASK=0000000000000000
CR2=00007F631BE7A5E8
xmm0 6370682d67786d2f (f: 1735945472.000000, d: 9.907070e+170)
xmm1 732f6d63732f6836 (f: 1932486656.000000, d: 6.866786e+246)
xmm2 6e696c2f6836675f (f: 1748395904.000000, d: 7.351682e+223)
xmm3 2f6e69622f65646f (f: 795174016.000000, d: 3.206057e-80)
xmm4 71706c6e006c7170 (f: 7106928.000000, d: 2.673645e+238)
xmm5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 000000003e17cee7 (f: 1041747712.000000, d: 5.146917e-315)
xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11 ca62c1d6ca62c1d6 (f: 3395469824.000000, d: -2.193092e+50)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/ld-linux-x86-64.so.2
Module_base_address=00007F6348BF0000
Target=2_90_20210421_975 (Linux 3.10.0-957.el7.x86_64)
CPU=amd64 (20 logical CPUs) (0xfb302b000 RAM)
----------- Stack Backtrace -----------
(0x00007F6348BF5357 [ld-linux-x86-64.so.2+0x5357])
(0x00007F6348BF5892 [ld-linux-x86-64.so.2+0x5892])
(0x00007F6348BF8689 [ld-linux-x86-64.so.2+0x8689])
(0x00007F6348BFCB92 [ld-linux-x86-64.so.2+0xcb92])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F6348BFD39D [ld-linux-x86-64.so.2+0xd39d])
(0x00007F6348C0423B [ld-linux-x86-64.so.2+0x1423b])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F6348C03ACB [ld-linux-x86-64.so.2+0x13acb])
(0x00007F63483A9EEB [libdl.so.2+0xeeb])
(0x00007F6348BFF714 [ld-linux-x86-64.so.2+0xf714])
(0x00007F63483AA4ED [libdl.so.2+0x14ed])
dlopen+0x31 (0x00007F63483A9F81 [libdl.so.2+0xf81])
(0x00007F6345EF8178 [libj9prt29.so+0x2e178])
(0x00007F63461DB727 [libj9vm29.so+0x86727])
(0x00007F63461DBD1D [libj9vm29.so+0x86d1d])
(0x00007F63461DBF59 [libj9vm29.so+0x86f59])
(0x00007F63461E7578 [libj9vm29.so+0x92578])
(0x00007F63461E42F5 [libj9vm29.so+0x8f2f5])
(0x00007F634628F6B2 [libj9vm29.so+0x13a6b2])
---------------------------------------
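
As a small worked check of the crash output above: subtracting Module_base_address from RIP gives the offset of the faulting instruction inside ld-linux-x86-64.so.2, and it matches the first backtrace entry (+0x5357). The sketch below only does that arithmetic; the class and method names are made up for illustration, and the hex strings are copied from the crash output.

```java
public class CrashOffsetSketch {
    /** Parses a hex address as printed in the crash output (no 0x prefix). */
    static long parseHex(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    /** Offset of an absolute address relative to the base of its module. */
    static long moduleOffset(String address, String moduleBase) {
        return parseHex(address) - parseHex(moduleBase);
    }

    public static void main(String[] args) {
        // RIP and Module_base_address from the crash output
        long ripOffset = moduleOffset("00007F6348BF5357", "00007F6348BF0000");
        System.out.println("RIP offset: 0x" + Long.toHexString(ripOffset)); // 0x5357

        // InaccessibleAddress and RSP from the crash output
        long gap = moduleOffset("00007F631BE7A5E8", "00007F631BE795F0");
        System.out.println("InaccessibleAddress - RSP: 0x" + Long.toHexString(gap)); // 0xff8
    }
}
```

The second value shows that the reported inaccessible address lies 0xff8 bytes above the stack pointer. This is only arithmetic on the reported registers, not a diagnosis.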

We have made sure that the path and the LD_LIBRARY_PATH are valid and that all native libraries and their dependencies listed by ldd are available at runtime. We followed the MustGather guide at https://www.ibm.com/support/pages/node/344411 and collected the Java core dump, snap trace, etc. during further tests. We can provide these files, along with part of our application's stdout and stderr, to the assignee of this issue.

We appreciate help from anyone who works in this area and/or who has encountered a similar issue before. Thanks !

Ashwin

JasonFengJ9 commented 3 years ago

@a0304 Could you provide the diagnostic files? A standalone test case would also be helpful if one is available.

a0304 commented 3 years ago

Thanks for looking into this issue @JasonFengJ9. I have shared a Box folder containing the diagnostic files and the .so files that caused the crash to the email address listed in your Git profile. Please let me know if you haven't received it and I can share it here.

The .so files were compiled with the Intel® Fortran Compiler 19.0.5. The compilers were recently updated from Intel Fortran Compiler 16, and the .so files compiled with version 16 work with the OpenJ9 JVMs. Is there a known supported Intel compiler version for OpenJ9?

I have been trying to put together a standalone test case, but I am unable to replicate the crash when only the 2 attached libraries (libsvml.so, libintlc.so.5) are loaded in a simple Java main program. So I printed all the libraries that get loaded using LD_DEBUG=libs and tried to load them all in the same sequence from my program. I noticed that calling System.loadLibrary on /jre/lib/libnet.so also causes a similar segmentation fault.

JasonFengJ9 commented 3 years ago

@a0304 I can confirm that the box folder is well received.

> The .so files were compiled with the Intel® Fortran Compiler 19.0.5. The compilers were recently updated from Intel Fortran Compiler 16, and the .so files compiled with version 16 work with the OpenJ9 JVMs. Is there a known supported Intel compiler version for OpenJ9?

The OpenJ9 build environments are documented at [1]. For JDK11 Linux x86 64-bit, it is gcc 7.5.

Did you have a chance to run the application with JDK8 or JDK16?

Edit: Please run jextract[2] or jpackcore[3] to collect libraries required for a full analysis of a core dump. jpackcore replaces jextract, which is deprecated in OpenJ9 version 0.26.0.

[1] https://www.eclipse.org/openj9/docs/openj9_support/
[2] https://www.ibm.com/docs/en/sdk-java-technology/7?topic=udv-using-jextract
[3] https://www.eclipse.org/openj9/docs/tool_jextract/

a0304 commented 3 years ago

Yes, we tried this on JDK 8 (version details below) and we see similar segmentation faults. We have not tried JDK 16 yet; we will try it and provide the results.

openjdk version "1.8.0_265"
OpenJDK Runtime Environment (build 1.8.0_265-b01)
Eclipse OpenJ9 VM (build openj9-0.21.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20200728727 (JIT enabled, AOT enabled)
OpenJ9 - 34cf4c075
OMR - 113e54219
JCL - c82ff0c20f based on jdk8u265-b01)

a0304 commented 3 years ago

I tested the application with the latest JDK 16 and I can still reproduce the crash. The crash dump files and the jpackcore output have been shared in the same Box folder; you should have received an email with the file URL. The JDK 16 version used was:

openjdk version "16.0.2" 2021-07-20
IBM Semeru Runtime Open Edition 16.0.2.0 (build 16.0.2+7)
Eclipse OpenJ9 VM 16.0.2.0 (build openj9-0.27.0, JRE 16 Linux amd64-64-Bit Compressed References 2021072969 (JIT enabled, AOT enabled)
OpenJ9 - 1851b0074
OMR - 9db1c870d
JCL - 34df42439f3 based on jdk-16.0.2+7)

The native code was built with gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3). Is gcc 7.5 the minimum required version, or is it the only gcc version supported? My understanding is that it is the minimum version. Can you please confirm? Thanks

JasonFengJ9 commented 3 years ago

> The native code was built with gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3). Is gcc 7.5 the minimum required version, or is it the only gcc version supported? My understanding is that it is the minimum version. Can you please confirm?

gcc 7.5 is the compiler level used to build the JVM. I am not aware of any limitation on loading native libraries that were built with a newer compiler level than the JVM itself.

Yes, I did receive the dump files.

JasonFengJ9 commented 3 years ago

With the JDK16 core files collected via jpackcore (https://github.com/eclipse-openj9/openj9/issues/13269#issuecomment-894507381), the native stack trace obtained is:

#0  0x00007fe69184f9d1 in ?? ()
#1  0x00007fe68b27bdfd in omrdump_create (portLibrary=0x7fe690657380 <j9portLibrary>, filename=0x7fe6697e0800 "/mxg-hpc/users/g6h/scm/simPRJ424_g6h/", 
    dumpType=<optimized out>, userData=<optimized out>) at ../../../../../../omr/port/unix/omrosdump.c:188
#2  0x00007fe6842ac5c2 in doSystemDump (agent=0x7fe68c02d930, label=0x7fe6697e0800 "/mxg-hpc/users/g6h/scm/simPRJ424_g6h/", context=0x7fe6697e0cc0)
    at ../../../../../openj9/runtime/rasdump/dmpagent.c:751
#3  0x00007fe6842a83b5 in protectedDumpFunction (portLibrary=portLibrary@entry=0x7fe690657380 <j9portLibrary>, userData=userData@entry=0x7fe6697e0760)
    at ../../../../../openj9/runtime/rasdump/dmpagent.c:2904
#4  0x00007fe68b27d8c3 in omrsig_protect (portLibrary=0x7fe690657380 <j9portLibrary>, fn=0x7fe6842a83a0 <protectedDumpFunction>, fn_arg=0x7fe6697e0760, 
    handler=0x7fe6842a83c0 <signalHandler>, handler_arg=0x0, flags=505, result=0x7fe6697e0758) at ../../../../../../omr/port/unix/omrsignal.c:425
#5  0x00007fe6842aba2b in runDumpFunction (agent=<optimized out>, label=0x7fe6697e0800 "/mxg-hpc/users/g6h/scm/simPRJ424_g6h/", context=<optimized out>)
    at ../../../../../openj9/runtime/rasdump/dmpagent.c:2878
#6  0x00007fe6842abbbf in runDumpAgent (vm=vm@entry=0x7fe68c015740, agent=agent@entry=0x7fe68c02d930, context=context@entry=0x7fe6697e0cc0, 
    state=state@entry=0x7fe6697e0cb8, detail=detail@entry=0x7fe6697e0d40 "", timeNow=timeNow@entry=1628277918316)
    at ../../../../../openj9/runtime/rasdump/dmpagent.c:2804
#7  0x00007fe6842c44be in triggerDumpAgents (vm=0x7fe68c015740, self=0x88a500, eventFlags=8192, eventData=<optimized out>)
    at ../../../../../openj9/runtime/rasdump/trigger.c:1046
#8  0x00007fe68b51cf72 in generateDiagnosticFiles (portLibrary=portLibrary@entry=0x7fe690657380 <j9portLibrary>, userData=userData@entry=0x7fe6697e1220)
    at ../../../../../openj9/runtime/vm/gphandle.c:1177
#9  0x00007fe68b27d8c3 in omrsig_protect (portLibrary=0x7fe690657380 <j9portLibrary>, fn=0x7fe68b51ce90 <generateDiagnosticFiles>, fn_arg=0x7fe6697e1220, 
    handler=0x7fe68b51c550 <recursiveCrashHandler>, handler_arg=0x7fe6697e11f0, flags=505, result=0x7fe6697e11e8)
    at ../../../../../../omr/port/unix/omrsignal.c:425
#10 0x00007fe68b51d185 in vmSignalHandler (portLibrary=0x7fe690657380 <j9portLibrary>, gpType=24, gpInfo=<optimized out>, userData=<optimized out>)
    at ../../../../../openj9/runtime/vm/gphandle.c:848
#11 0x00007fe68b27cd8a in mainSynchSignalHandler (signal=11, sigInfo=0x7fe6697e24b0, contextInfo=0x7fe6697e2380)
    at ../../../../../../omr/port/unix/omrsignal.c:1066
#12 <signal handler called>
#13 0x00007fe691e8b357 in ?? ()
#14 0x0000000000000000 in ?? ()

@a0304 Is this what you see on your local system? Note: this is different from the JDK11 stack trace in https://github.com/eclipse-openj9/openj9/issues/13269#issue-959278720

From the JDK11 snap trace:

13:31:34.253248000  0x835600 j9vm.229             Entry      >classLoaderRegisterLibrary(loader=0x7f19300b6138, name=/mxg-hpc/users/g6h/scm/simPRJ424_g6h/linux_a64/code/bin/SMAOPTnlpql, decorate=7)
13:31:34.253249000  0x835600 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.11/jdk-11.0.11+9-jre/bin/java
13:31:34.253250000  0x835600 j9vm.399             Entry      >sendLifecycleEventCallback(vmStruct 0x835600, slHandle 321384784, fnName JNI_OnLoad_SMAOPTnlpql, defaultResult -1)
13:31:34.253251000  0x835600 j9vm.401             Exit       <exit sendLifecycleEventCallback result -1
13:31:34.253251000  0x835600 omrport.476          Event       omrsl_open_shared_library using mangledName /mxg-hpc/users/g6h/scm/simPRJ424_g6h/linux_a64/code/bin/libSMAOPTnlpql.so

It appears the segmentation error occurred at omrsl_open_shared_library loading libSMAOPTnlpql.so.

A couple of things to try on the local system:

  1. Run the testcase using JDK with debug images such as [1][2], please provide console segmentation error output, gdb native stacktrace, and jpackcore generated zip file;
  2. Run the testcase with -Xcheck:jni, and look for any warning messages.

[1] https://github.com/ibmruntimes/semeru11-binaries/releases/download/jdk-11.0.12%2B7_openj9-0.27.0/ibm-semeru-open-jdk_x64_linux_11.0.12_7_openj9-0.27.0.tar.gz
[2] https://github.com/ibmruntimes/semeru11-binaries/releases/download/jdk-11.0.12%2B7_openj9-0.27.0/ibm-semeru-open-debugimage_x64_linux_11.0.12_7_openj9-0.27.0.tar.gz

a0304 commented 3 years ago

@JasonFengJ9, I have downloaded the tars from [1] and [2], but I am unable to use them. I untarred the files and tried calling debuginfo-install on them using:

sudo debuginfo-install /u/users/xxx/openjdk_11/11.0.12_debug/jdk-11.0.12+7

I see the following error:

Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Could not find a package for: /u/users/xxx/openjdk_11/11.0.12_debug/jdk-11.0.12+7
No debuginfo packages available to install

Do I need to change something?

JasonFengJ9 commented 3 years ago

@a0304 Usually I just overlay the JDK with debug files like cp -R jdk-11.0.12+7-debug-image/* jdk-11.0.12+7/.

a0304 commented 3 years ago

@JasonFengJ9 I have shared with you the Box folder containing the crash dump logs, gdb traces, jpackcore zip output, and the stdout trace from the time of the crash. These were generated using the OpenJDK 11.0.12 debug image that you provided.

Can you please check GDB_Trace_Full.txt in the shared .zip archive? I ran GDB on the core dump using the following commands:

- gdb
- set solib-search-path <lib path>
- file </bin/java path>
- core-file <core dump path>

The gdb traces seem to show that the .so file is being loaded from the Java installation instead of the application lib folders. I am not sure if I am running gdb correctly.

#26 0x00007f17704a6248 in omrsl_open_shared_library (portLibrary=0x7f1771972380 <j9portLibrary>, name=<optimized out>, descriptor=0x7f169c124528, flags=<optimized out>) at ../../../../../../omr/port/unix/omrsl.c:163
        handle = <optimized out>
        openName = 0x7f16e59be1b0 "/u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/bin/SMAOPTnlpql"
        mangledName = "\000\000\000\000\000\000\000\000\330D<s\027\177\000\000undefined symbol: JNI_OnLoad_SMAOPTnlpql", '\000' <repeats 16 times>, "PA<s\027\177\000\000\000\000\000\000\000\000\000\000h\367\204", '\000' <repeats 13 times>,

a0304 commented 3 years ago

@JasonFengJ9, one workaround that seems to work is to LD_PRELOAD the dependencies of our .so before starting the OpenJ9 JVM. In this particular case the dependencies were the Intel libraries libsvml.so and libintlc.so.5. I made sure that the paths of these dependencies are in the JVM's LD_LIBRARY_PATH and that the .so lists these dependencies in its makefile before the build. Also, the HotSpot JVMs work without LD_PRELOAD, with the same paths and libraries.
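
For readers who start the JVM from another Java process rather than from a shell, the workaround described above could be sketched roughly as follows. This is only an illustration: the library directory `/path/to/intel/libs`, the class name `PreloadLauncherSketch`, and the choice of `OPTLoadInThread` as the main class are assumptions, not something taken from the actual application. The only facts relied on are that LD_PRELOAD accepts a colon-separated list of shared objects and that `ProcessBuilder.environment()` returns a modifiable map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class PreloadLauncherSketch {
    /** Builds (but does not start) a child JVM command with LD_PRELOAD set. */
    static ProcessBuilder build(String javaExe, String mainClass,
                                String libDir, List<String> preloadLibs) {
        List<String> fullPaths = new ArrayList<>();
        for (String lib : preloadLibs) {
            fullPaths.add(libDir + "/" + lib);
        }
        ProcessBuilder pb = new ProcessBuilder(javaExe, mainClass);
        Map<String, String> env = pb.environment();
        env.put("LD_LIBRARY_PATH", libDir);
        // LD_PRELOAD entries may be separated by colons
        env.put("LD_PRELOAD", String.join(":", fullPaths));
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical directory and main class, for illustration only
        ProcessBuilder pb = build("java", "OPTLoadInThread", "/path/to/intel/libs",
                Arrays.asList("libsvml.so", "libintlc.so.5"));
        System.out.println("command:    " + pb.command());
        System.out.println("LD_PRELOAD: " + pb.environment().get("LD_PRELOAD"));
        // pb.start() would launch the child JVM; it is left out so the sketch has no side effects.
    }
}
```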

JasonFengJ9 commented 3 years ago

This is a problem across Java 8/11/16 levels. A workaround has been identified as per https://github.com/eclipse-openj9/openj9/issues/13269#issuecomment-896948330. There is no immediate solution available for the 0.28 (Java 17) release, so I am moving it to 0.29 instead.

a0304 commented 3 years ago

Hi @JasonFengJ9, what dlopen flag does the OpenJ9 JDK use in System.loadLibrary? Is it RTLD_GLOBAL? That would help explain why LD_PRELOAD of the libs fixes the issue.

JasonFengJ9 commented 3 years ago

> What dlopen flag does the OpenJ9 JDK use in System.loadLibrary? Is it RTLD_GLOBAL? That would help explain why LD_PRELOAD of the libs fixes the issue.

The actual library loading code snippet for Linux is [1]:

int lazyOrNow = OMR_ARE_ALL_BITS_SET(flags, OMRPORT_SLOPEN_LAZY) ? RTLD_LAZY : RTLD_NOW;
...
handle = dlopen(openExec ? NULL : openName, lazyOrNow);

Neither RTLD_GLOBAL nor RTLD_LOCAL was specified, hence RTLD_LOCAL is used by default [2].

From the snap trace log:

19:43:52.448345000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/lib/default/libSMAOPTnlpql.so
19:43:52.450538000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

19:43:52.450721000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/lib/libSMAOPTnlpql.so
19:43:52.450849000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

19:43:52.451044000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/lib/default/libSMAOPTnlpql.so
19:43:52.451180000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

19:43:52.451327000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/lib/libSMAOPTnlpql.so
19:43:52.451453000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

19:43:52.451617000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /scratch/SMAEXE_g6h/templib56973/libSMAOPTnlpql.so
19:43:52.451750000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

19:43:52.451888000  0x84ED00 omrport.476          Event       omrsl_open_shared_library using mangledName /u/users/g6h/openjdk_11/11.0.12_debug/jdk-11.0.12+7/bin/libSMAOPTnlpql.so
19:43:52.452594000  0x84ED00 omrport.221          Exit       <omrfile_attr failed. errorCode=-108

Is libSMAOPTnlpql.so at /scratch/SMAEXE_g6h/templib56973? It is not in the core dump zip though.

[1] https://github.com/eclipse-openj9/openj9-omr/blob/1d8fb435675f022855948c08f08f6db66cbe38d8/port/unix/omrsl.c#L163
[2] https://man7.org/linux/man-pages/man3/dlopen.3.html

a0304 commented 3 years ago

No, libSMAOPTnlpql.so is not in the /scratch/SMAEXE_g6h/templib56973 dir, and it is not meant to be there. The actual location of the lib is in the LD_LIBRARY_PATH. Could you please provide me the full formatted snap trace dump? Thanks

JasonFengJ9 commented 3 years ago

@a0304 Sure, is it ok to attach the text file in this issue?

a0304 commented 3 years ago

Thanks @JasonFengJ9. I have sent you an invite (as editor) to a Box dir. Can you please add the file there ?

JasonFengJ9 commented 3 years ago

@a0304 Just uploaded, please check.

a0304 commented 3 years ago

Thanks

a0304 commented 3 years ago

Hi @JasonFengJ9, I have shared a .zip file in a Box dir containing the necessary .so file and the Java code files to reproduce this crash in a standalone program. The .zip file contains a 'libs' dir which has a copy of the libsvml.so file from the latest Intel 19 U5 Fortran compiler. Please use this .so and set the LD_LIBRARY_PATH to only this dir.

Steps to reproduce:

ksh
export LD_LIBRARY_PATH=/somepath/LinuxSOThreadTest/libs
java OPTLoad ---> WORKS SUCCESSFULLY
java OPTLoadInThread ---> CRASHES WITH MEMORY FAULT

Please let me know if you need more details.
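
The sources of OPTLoad and OPTLoadInThread were shared privately and are not shown in this thread. As a rough, hypothetical reconstruction of the difference between the two cases, the sketch below loads a library either on the main thread or on a newly started thread. The class name, the method names, and the default library name `svml` are assumptions for illustration; `System.loadLibrary` and `UnsatisfiedLinkError` are standard Java.

```java
public class LoadInThreadSketch {
    /** Tries to load a native library on the calling thread and reports the outcome. */
    static String tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return "loaded " + libName;
        } catch (UnsatisfiedLinkError e) {
            return "UnsatisfiedLinkError: " + e.getMessage();
        }
    }

    /** Same as tryLoad, but the load happens on a separate Java thread. */
    static String tryLoadInThread(String libName) {
        String[] result = new String[1];
        Thread worker = new Thread(() -> result[0] = tryLoad(libName), "loader");
        worker.start();
        try {
            worker.join(); // join makes the worker's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Usage: java LoadInThreadSketch [libName] [thread]
        String lib = args.length > 0 ? args[0] : "svml"; // hypothetical default
        boolean inThread = args.length > 1 && args[1].equals("thread");
        System.out.println(inThread ? tryLoadInThread(lib) : tryLoad(lib));
    }
}
```

Running the two modes in separate JVM invocations keeps them independent, since a library that is already loaded is not loaded a second time within the same process.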

JasonFengJ9 commented 3 years ago

@a0304 I have a couple of questions:

  1. I tried OpenJ9 JDK11: java OPTLoad works, and java OPTLoadInThread ends with a segmentation fault (core dumped). The native stack trace looks like:

    #0  0x00007f58aac65bed in ?? () from /lib64/ld-linux-x86-64.so.2
    #1  0x00007f58aac68c05 in ?? () from /lib64/ld-linux-x86-64.so.2
    #2  0x00007f58aac755b7 in ?? () from /lib64/ld-linux-x86-64.so.2
    #3  0x00007f58aac705a4 in ?? () from /lib64/ld-linux-x86-64.so.2
    #4  0x00007f58aac74de9 in ?? () from /lib64/ld-linux-x86-64.so.2
    #5  0x00007f58aa414f09 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
    #6  0x00007f58aac705a4 in ?? () from /lib64/ld-linux-x86-64.so.2
    #7  0x00007f58aa415571 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
    #8  0x00007f58aa414fa1 in dlopen () from /lib/x86_64-linux-gnu/libdl.so.2
    #9  0x00007f58a807d248 in omrsl_open_shared_library (portLibrary=0x7f58a962f380 <j9portLibrary>, name=<optimized out>, descriptor=0x7f58380038a8, flags=<optimized out>)
    at ../../../../../../omr/port/unix/omrsl.c:163
    #10 0x00007f58a8361377 in classLoaderRegisterLibrary (voidVMThread=voidVMThread@entry=0x180a00, classLoader=classLoader@entry=0x7f58a40a3b88, 
    logicalName=logicalName@entry=0x7f58380065b0 "libsvml.so", 
    physicalName=physicalName@entry=0x7f585f5511f0 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", libraryPtr=libraryPtr@entry=0x0, 
    errBuf=errBuf@entry=0x7f5838005580 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/liblibsvml.so.so: cannot open shared object file: No such file or directory", 
    bufLen=512, flags=6) at ../../../../../openj9/runtime/vm/vmbootlib.c:720
    #11 0x00007f58a83619a5 in openNativeLibrary (vm=0x7f58a4021d00, classLoader=0x7f58a40a3b88, libName=<optimized out>, 
    libraryPath=0x7f5838004ada "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs:/usr/lib64:/usr/lib", libraryPtr=0x0, userData=0x180a00, 
    errorBuffer=0x7f5838005580 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/liblibsvml.so.so: cannot open shared object file: No such file or directory", 
    bufferLength=512, openFunction=0x7f58a8360a30 <classLoaderRegisterLibrary>) at ../../../../../openj9/runtime/vm/vmbootlib.c:299
    #12 0x00007f58a8361ba9 in registerNativeLibrary (vmThread=<optimized out>, classLoader=<optimized out>, libName=<optimized out>, libraryPath=<optimized out>, 
    libraryPtr=<optimized out>, errorBuffer=<optimized out>, bufferLength=512) at ../../../../../openj9/runtime/vm/vmbootlib.c:371
    #13 0x00007f58a836d178 in VM_BytecodeInterpreterCompressed::inlClassLoaderLoadLibraryWithPath (_pc=<optimized out>, _sp=<optimized out>, this=<optimized out>)
    at ../../../../../openj9/runtime/vm/BytecodeInterpreter.hpp:4612
    #14 VM_BytecodeInterpreterCompressed::run (this=0x7f585f5519f0, vmThread=0x1795e60) at ../../../../../openj9/runtime/vm/BytecodeInterpreter.hpp:9888
    #15 0x00007f58a8369ef5 in bytecodeLoopCompressed (currentThread=<optimized out>) at ../../../../../openj9/runtime/vm/BytecodeInterpreter.inc:112
    #16 0x00007f58a8415192 in c_cInterpreter ()
    at /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-x64-openj9/workspace/build/src/build/linux-x86_64-normal-server-release/vm/runtime/vm/xcinterp.s:160
    #17 0x00007f58a82f65da in runJavaThread (currentThread=0xfff06338, currentThread@entry=0x180a00) at ../../../../../openj9/runtime/vm/callin.cpp:648
    #18 0x00007f58a83696ed in javaProtectedThreadProc (portLibrary=portLibrary@entry=0x7f58a962f380 <j9portLibrary>, entryarg=entryarg@entry=0x180a00)
    at ../../../../../openj9/runtime/vm/vmthread.c:2088
    #19 0x00007f58a8079843 in omrsig_protect (portLibrary=0x7f58a962f380 <j9portLibrary>, fn=0x7f58a8369630 <javaProtectedThreadProc>, fn_arg=0x180a00, 
    handler=0x7f58a8318dd0 <structuredSignalHandler>, handler_arg=0x180a00, flags=506, result=0x7f585f551e28) at ../../../../../../omr/port/unix/omrsignal.c:425
    #20 0x00007f58a83658ba in javaThreadProc (entryarg=0x7f58a4021d00) at ../../../../../openj9/runtime/vm/vmthread.c:346
    #21 0x00007f58a3df34f6 in thread_wrapper (arg=0x7f58a438e0c8) at ../../../../../../omr/thread/common/omrthread.c:1724
    #22 0x00007f58aa8306ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #23 0x00007f58aa15151d in clone () from /lib/x86_64-linux-gnu/libc.so.6

    Does this resemble the native stack in your environment?

  2. Also tried hotspot but failed w/ UnsatisfiedLinkError

    
    java -showversion OPTLoad
    openjdk version "11.0.12" 2021-07-20
    OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
    OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)
    Exception in thread "main" java.lang.UnsatisfiedLinkError: no libsvml.so in java.library.path: [/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs, /usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
    at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2670)
    at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
    at java.base/java.lang.System.loadLibrary(System.java:1873)
    at OPTLoad.main(OPTLoad.java:6)

    java -showversion OPTLoadInThread
    openjdk version "11.0.12" 2021-07-20
    OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
    OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)
    Library loaded, Exiting main program
    Loading libsvml.so
    java.lang.UnsatisfiedLinkError: no libsvml.so in java.library.path: [/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs, /usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
    at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2670)
    at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
    at java.base/java.lang.System.loadLibrary(System.java:1873)
    at TestThread.run(TestThread.java:10)

Are these errors expected?

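
One detail visible in the traces above: the error buffer mentions `liblibsvml.so.so` while the logical name is `libsvml.so`. That pattern is what `System.mapLibraryName` produces when it is given a name that already carries the platform prefix and suffix. The sketch below only illustrates that standard Java behaviour; the class and method names are made up, and it is not meant as a diagnosis of the errors in this thread.

```java
public class LibraryNameSketch {
    /** Returns the platform-specific file name Java derives from a logical library name. */
    static String decorate(String logicalName) {
        return System.mapLibraryName(logicalName);
    }

    public static void main(String[] args) {
        // On Linux this prints: libsvml.so
        System.out.println(decorate("svml"));
        // On Linux this prints: liblibsvml.so.so
        System.out.println(decorate("libsvml.so"));
    }
}
```

`System.loadLibrary` expects the bare logical name (here that would be `svml`), whereas `System.load` takes an absolute path to the file.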
a0304 commented 3 years ago

@JasonFengJ9, Yes, i did see a similar native stack track. And yes the version that i earlier supplied does fail in my Hotspot set up too. That is because I had missed including the libintlc.so.5 file in the test libs dir (apologies for the mess up). This is a necessary reference file and this file too is also delivered by the fortran compiler. Please ignore the old dir that i had shared.

I have shared a new box dir (Stand Alone Crash Test V1.0) with the right set of refs, test classes, and the stack trace and dumps I see when running OptLoadInThread.class in an OpenJ9 VM. Please use this. I can now see libsvml.so loading successfully in a Hotspot JVM and crashing in OpenJ9. Please let me know if you have any questions.

The native stack I see with this version is:

    Unhandled exception
    Type=Segmentation error vmState=0x00000000
    J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
    Handler1=00007FCA18B400A0 Handler2=00007FCA1842FA60 InaccessibleAddress=00007FC9DC6E8EE8
    RDI=00007FC9DC6B1EF0 RSI=0000000001795E60 RAX=00000000000720F0 RBX=00007FC9DC723FD8
    RCX=0000000000000008 RDX=00007FC9DC6B1EF0 R8=00007FC9DC7241E0 R9=00007FCA1ADC8C60
    R10=00007FC9DC723DA0 R11=0000001400000004 R12=00000000000720E0 R13=00007FC9DC7241E0
    R14=00007FC9DC724020 R15=00007FC9DC724100 RIP=00007FCA1ADAF357 GS=0000
    FS=0000 RSP=00007FC9DC6E7EF0 EFlags=0000000000010206 CS=0033
    RBP=00007FC9DC723F40 ERR=0000000000000006 TRAPNO=000000000000000E OLDMASK=0000000000000000
    CR2=00007FC9DC6E8EE8
    xmm0 657268544f537875 (f: 1330870400.000000, d: 4.773897e+180)
    xmm1 62696c2f7362696c (f: 1935829376.000000, d: 1.171191e+166)
    xmm2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
    xmm3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
    xmm4 6f74636572696420 (f: 1919509504.000000, d: 7.727821e+228)
    xmm5 726f746365726964 (f: 1701996928.000000, d: 1.677920e+243)
    xmm6 6c2f65726a746e65 (f: 1786015360.000000, d: 1.321189e+213)
    xmm7 6665726465737365 (f: 1702065024.000000, d: 1.822597e+185)
    xmm8 dddddddddddddddd (f: 3722305024.000000, d: -1.456816e+144)
    xmm9 422f78696e552f6c (f: 1851076480.000000, d: 6.758208e+10)
    xmm10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
    xmm11 0000ff0000000000 (f: 0.000000, d: 1.385239e-309)
    xmm12 000000004689a022 (f: 1183424512.000000, d: 5.846894e-315)
    xmm13 0000000047ac082f (f: 1202456576.000000, d: 5.940925e-315)
    xmm14 0000000048650dc0 (f: 1214582272.000000, d: 6.000833e-315)
    xmm15 0000000046b73e38 (f: 1186414080.000000, d: 5.861665e-315)
    Module=/lib64/ld-linux-x86-64.so.2
    Module_base_address=00007FCA1ADAA000
    Target=2_90_20200715_697 (Linux 3.10.0-957.el7.x86_64)
    CPU=amd64 (12 logical CPUs) (0xfb8316000 RAM)
    ----------- Stack Backtrace -----------
    (0x00007FCA1ADAF357 [ld-linux-x86-64.so.2+0x5357])
    (0x00007FCA1ADB250C [ld-linux-x86-64.so.2+0x850c])
    (0x00007FCA1ADBE1E4 [ld-linux-x86-64.so.2+0x141e4])
    (0x00007FCA1ADB9714 [ld-linux-x86-64.so.2+0xf714])
    (0x00007FCA1ADBDACB [ld-linux-x86-64.so.2+0x13acb])
    (0x00007FCA1A563EEB [libdl.so.2+0xeeb])
    (0x00007FCA1ADB9714 [ld-linux-x86-64.so.2+0xf714])
    (0x00007FCA1A5644ED [libdl.so.2+0x14ed])
    dlopen+0x31 (0x00007FCA1A563F81 [libdl.so.2+0xf81])
    (0x00007FCA18432958 [libj9prt29.so+0x1d958])
    (0x00007FCA18B73A07 [libj9vm29.so+0xc7a07])
    (0x00007FCA18B73FFD [libj9vm29.so+0xc7ffd])
    (0x00007FCA18B74239 [libj9vm29.so+0xc8239])
    (0x00007FCA18AC49C8 [libj9vm29.so+0x189c8])
    (0x00007FCA18ABEB60 [libj9vm29.so+0x12b60])
    (0x00007FCA18B7BC52 [libj9vm29.so+0xcfc52])

    JVMDUMP039I Processing dump event "gpf", detail "" at 2021/09/02 12:03:10 - please wait.
    JVMDUMP032I JVM requested System dump using '/u/users/g6h/LinuxSOThreadTest/core.20210902.120310.45760.0001.dmp' in response to an event
    JVMDUMP032I JVM requested Java dump using '/u/users/g6h/LinuxSOThreadTest/javacore.20210902.120310.45760.0002.txt' in response to an event
    JVMDUMP010I Java dump written to /u/users/g6h/LinuxSOThreadTest/javacore.20210902.120310.45760.0002.txt
    JVMDUMP032I JVM requested Snap dump using '/u/users/g6h/LinuxSOThreadTest/Snap.20210902.120310.45760.0003.trc' in response to an event
    JVMDUMP010I Snap dump written to /u/users/g6h/LinuxSOThreadTest/Snap.20210902.120310.45760.0003.trc
    JVMDUMP032I JVM requested JIT dump using '/u/users/g6h/LinuxSOThreadTest/jitdump.20210902.120310.45760.0004.dmp' in response to an event
    JVMDUMP010I JIT dump written to /u/users/g6h/LinuxSOThreadTest/jitdump.20210902.120310.45760.0004.dmp
    JVMDUMP013I Processed dump event "gpf", detail "".
    JVMDUMP012E Error in System dump: The core file created by child process with pid = 45790 was not found. Expected to find core file with name "/u/users/g6h/LinuxSOThreadTest/core.45790"
    JVMDUMP030W Cannot write dump to file /u/users/g6h/LinuxSOThreadTest/javacore.20210902.120310.45760.0002.txt: File exists
    JVMDUMP032I JVM requested Java dump using '/scratch/javacore.20210902.120310.45760.0002.txt' in response to an event
    Memory fault(coredump)

a0304 commented 3 years ago

Hi @JasonFengJ9, were you able to replicate the crash with the new set of libs and code that I added to the box dir with my last comment? Do we know why this crash happens? Thanks.

JasonFengJ9 commented 3 years ago

Yeah, I am able to reproduce the segmentation error with the testcases supplied. The investigation is in progress.

On the other hand, JDK17 [1][2] works in both testcases. Can you verify whether it works in your application as well? Note: JDK17+ adopted a different native library loading approach than JDK 8/11.

[1] https://github.com/ibmruntimes/semeru17-binaries/releases/download/jdk-17%2B35_openj9-0.28.0-m1/ibm-semeru-open-jdk_x64_linux_17_35_openj9-0.28.0-m1.tar.gz
[2] https://github.com/ibmruntimes/semeru17-binaries/releases/download/jdk-17%2B35_openj9-0.28.0-m2/ibm-semeru-open-jdk_x64_linux_17_35_openj9-0.28.0-m2.tar.gz

JasonFengJ9 commented 3 years ago

Summary of investigation so far:

Testcase passed:

System.loadLibrary("svml");

within main() method.

Testcase failed:

Runnable run = () -> System.loadLibrary("svml");
Thread t = new Thread(run);
t.start();
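
For anyone who wants to try both variants without the original test classes, the two snippets above can be folded into one small program. This is only a sketch: the class name `LoadLibraryRepro`, its helper methods, and the `thread` command-line switch are hypothetical and are not the `OPTLoad` / `OPTLoadInThread` classes from the shared testcase. Each variant should run in its own JVM, because once a library is loaded a second `System.loadLibrary` call for it is ignored. Note also that a segmentation fault inside the dynamic loader kills the process and is never seen by the `catch` block; only Java-level failures such as `UnsatisfiedLinkError` are reported.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical reproduction harness; not the original OPTLoad/OPTLoadInThread classes.
public class LoadLibraryRepro {

    // Loads the library on the calling thread.
    // Returns null on success, or the Throwable raised by System.loadLibrary.
    public static Throwable tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    // Loads the library on a newly started thread and waits for it to finish.
    public static Throwable tryLoadInThread(String libName) {
        AtomicReference<Throwable> outcome = new AtomicReference<>();
        Thread worker = new Thread(() -> outcome.set(tryLoad(libName)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the loader thread", e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        // Defaults mirror the snippets above: library "svml", loaded from main().
        String libName = args.length > 0 ? args[0] : "svml";
        boolean inThread = args.length > 1 && args[1].equals("thread");

        // Shows the platform-specific file name the JVM will search for.
        System.out.println("Platform file name: " + System.mapLibraryName(libName));

        Throwable result = inThread ? tryLoadInThread(libName) : tryLoad(libName);
        System.out.println(result == null ? "Library loaded" : "Load failed: " + result);
    }
}
```

Assuming the native libraries sit in a `libs` directory next to the class, it could be run as `java -Djava.library.path=libs LoadLibraryRepro svml` for the passing case and `java -Djava.library.path=libs LoadLibraryRepro svml thread` for the failing one.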

The segmentation error occurred in open_verify at dl-load.c:1970, on the call __lseek (fd, ph->p_offset, SEEK_SET):

(gdb) l
1965            if (ph->p_offset + size <= (size_t) fbp->len)
1966              abi_note = (void *) (fbp->buf + ph->p_offset);
1967            else
1968              {
1969            abi_note = alloca (size);
1970            __lseek (fd, ph->p_offset, SEEK_SET);
1971            if (__libc_read (fd, (void *) abi_note, size) != size)
1972              goto read_error;
1973              }
1974    
(gdb) p fd
$5 = 4
(gdb) p ph
$6 = (Elf64_Phdr *) 0x7fffd0055c68
(gdb) p/x ph->p_offset
$7 = 0x1795e60
(gdb) bt
#0  open_verify (name=0x7fda40053070 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", fbp=fbp@entry=0x7fda8e909b40, loader=<optimized out>, whatcode=whatcode@entry=0, 
    mode=mode@entry=-1879048191, found_other_class=found_other_class@entry=0x7fda8e909b2f, free_name=true, fd=4) at dl-load.c:1970
#1  0x00007fdaba62fc05 in _dl_map_object (loader=loader@entry=0x7fdab4003dd0, name=name@entry=0x7fda8e90a7e0 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", type=type@entry=2, 
    trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=<optimized out>) at dl-load.c:2436
#2  0x00007fdaba63c5b7 in dl_open_worker (a=a@entry=0x7fda8e90a100) at dl-open.c:237
#3  0x00007fdaba6375a4 in _dl_catch_error (objname=objname@entry=0x7fda8e90a0f0, errstring=errstring@entry=0x7fda8e90a0f8, mallocedp=mallocedp@entry=0x7fda8e90a0ef, 
    operate=operate@entry=0x7fdaba63c510 <dl_open_worker>, args=args@entry=0x7fda8e90a100) at dl-error.c:187
#4  0x00007fdaba63bde9 in _dl_open (file=0x7fda8e90a7e0 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", mode=-2147483647, 
    caller_dlopen=0x7fdab3a9cd3b <omrsl_open_shared_library+780>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0x7fff5fff1780) at dl-open.c:660
#5  0x00007fdab9c2ef09 in dlopen_doit (a=a@entry=0x7fda8e90a330) at dlopen.c:66
#6  0x00007fdaba6375a4 in _dl_catch_error (objname=0x7fda400008d0, errstring=0x7fda400008d8, mallocedp=0x7fda400008c8, operate=0x7fdab9c2eeb0 <dlopen_doit>, args=0x7fda8e90a330) at dl-error.c:187
#7  0x00007fdab9c2f571 in _dlerror_run (operate=operate@entry=0x7fdab9c2eeb0 <dlopen_doit>, args=args@entry=0x7fda8e90a330) at dlerror.c:163
#8  0x00007fdab9c2efa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#9  0x00007fdab3a9cd3b in omrsl_open_shared_library (portLibrary=0x7fdab8ff2e80 <j9portLibrary>, name=0x7fda8e90ae50 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/svml", 
    descriptor=0x7fda400081e8, flags=7) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/omr/port/unix/omrsl.c:163
#10 0x00007fdab86ca4ac in classLoaderRegisterLibrary (voidVMThread=0x180a00, classLoader=0x7fdab40a3908, logicalName=0x7fda400091f0 "svml", 
    physicalName=0x7fda8e90ae50 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/svml", libraryPtr=0x0, 
    errBuf=0x7fda40052a10 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufLen=512, flags=7) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:720
#11 0x00007fdab86c9285 in openNativeLibrary (vm=0x7fdab400d9c0, classLoader=0x7fdab40a3908, libName=0x7fda400091f0 "svml", 
    libraryPath=0x7fda40052974 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs:/usr/lib64:/usr/lib", libraryPtr=0x0, openFunction=0x7fdab86c98f5 <classLoaderRegisterLibrary>, userData=0x180a00, 
    errorBuffer=0x7fda40052a10 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufferLength=512) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:297
#12 0x00007fdab86c951d in registerNativeLibrary (vmThread=0x180a00, classLoader=0x7fdab40a3908, libName=0x7fda400091f0 "svml", 
    libraryPath=0x7fda40052880 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default:/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-"..., libraryPtr=0x0, 
    errorBuffer=0x7fda40052a10 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufferLength=512) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:371
#13 0x00007fdab86f1053 in VM_BytecodeInterpreterCompressed::inlClassLoaderLoadLibraryWithPath (this=0x7fda8e90b970, _sp=@0x7fda8e90b8e0: 0x1b1298, 
    _pc=@0x7fda8e90b8e8: 0x7 <error: Cannot access memory at address 0x7>) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.hpp:4611
#14 0x00007fdab8704bd0 in VM_BytecodeInterpreterCompressed::run (this=0x7fda8e90b970, vmThread=0x180a00)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.hpp:9993
#15 0x00007fdab86db228 in bytecodeLoopCompressed (currentThread=0x180a00) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.inc:112
#16 0x00007fdab8825842 in c_cInterpreter () at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/vm/runtime/vm/xcinterp.s:158
#17 0x00007fdab8607c16 in runJavaThread (currentThread=0x180a00) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/callin.cpp:648
#18 0x00007fdab86d9071 in javaProtectedThreadProc (portLibrary=0x7fdab8ff2e80 <j9portLibrary>, entryarg=0x180a00)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmthread.c:2088
#19 0x00007fdab3a96f8b in omrsig_protect (portLibrary=0x7fdab8ff2e80 <j9portLibrary>, fn=0x7fdab86d8f31 <javaProtectedThreadProc>, fn_arg=0x180a00, handler=0x7fdab8646712 <structuredSignalHandler>, 
    handler_arg=0x180a00, flags=506, result=0x7fda8e90be10) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/omr/port/unix/omrsignal.c:425
#20 0x00007fdab86d4682 in javaThreadProc (entryarg=0x7fdab400d9c0) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmthread.c:346
#21 0x00007fdab83d4430 in thread_wrapper (arg=0x7fdab438e678) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/omr/thread/common/omrthread.c:1724
#22 0x00007fdab9a186ba in start_thread (arg=0x7fda8e90e700) at pthread_create.c:333
#23 0x00007fdaba15351d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

With a breakpoint set at dl-load.c:1970, the passing case has the following stack trace:

(gdb) p fd
$1 = 4
(gdb) p ph
$2 = (Elf64_Phdr *) 0x7ffff6fa3ae8
(gdb) p/x ph->p_offset
$3 = 0x1795e60
(gdb) bt
#0  open_verify (name=0x7ffff02f3310 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", fbp=fbp@entry=0x7ffff6fa39c0, loader=<optimized out>, whatcode=whatcode@entry=0, 
    mode=mode@entry=-1879048191, found_other_class=found_other_class@entry=0x7ffff6fa39af, free_name=true, fd=4) at dl-load.c:1970
#1  0x00007ffff7ddfc05 in _dl_map_object (loader=loader@entry=0x7ffff0003dd0, name=name@entry=0x7ffff6fa4660 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", type=type@entry=2, 
    trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=<optimized out>) at dl-load.c:2436
#2  0x00007ffff7dec5b7 in dl_open_worker (a=a@entry=0x7ffff6fa3f80) at dl-open.c:237
#3  0x00007ffff7de75a4 in _dl_catch_error (objname=objname@entry=0x7ffff6fa3f70, errstring=errstring@entry=0x7ffff6fa3f78, mallocedp=mallocedp@entry=0x7ffff6fa3f6f, 
    operate=operate@entry=0x7ffff7dec510 <dl_open_worker>, args=args@entry=0x7ffff6fa3f80) at dl-error.c:187
#4  0x00007ffff7debde9 in _dl_open (file=0x7ffff6fa4660 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/libsvml.so", mode=-2147483647, 
    caller_dlopen=0x7ffff5413d3b <omrsl_open_shared_library+780>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0x7fffffffe5e0) at dl-open.c:660
#5  0x00007ffff73def09 in dlopen_doit (a=a@entry=0x7ffff6fa41b0) at dlopen.c:66
#6  0x00007ffff7de75a4 in _dl_catch_error (objname=0x7ffff00008f0, errstring=0x7ffff00008f8, mallocedp=0x7ffff00008e8, operate=0x7ffff73deeb0 <dlopen_doit>, args=0x7ffff6fa41b0) at dl-error.c:187
#7  0x00007ffff73df571 in _dlerror_run (operate=operate@entry=0x7ffff73deeb0 <dlopen_doit>, args=args@entry=0x7ffff6fa41b0) at dlerror.c:163
#8  0x00007ffff73defa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#9  0x00007ffff5413d3b in omrsl_open_shared_library (portLibrary=0x7ffff67a2e80 <j9portLibrary>, name=0x7ffff6fa4cd0 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/svml", 
    descriptor=0x7ffff03f6e18, flags=7) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/omr/port/unix/omrsl.c:163
#10 0x00007ffff5e7a4ac in classLoaderRegisterLibrary (voidVMThread=0x18d00, classLoader=0x7ffff00a38f8, logicalName=0x7ffff02d9970 "svml", 
    physicalName=0x7ffff6fa4cd0 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs/svml", libraryPtr=0x0, 
    errBuf=0x7ffff023ffd0 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufLen=512, flags=7) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:720
#11 0x00007ffff5e79285 in openNativeLibrary (vm=0x7ffff000d9c0, classLoader=0x7ffff00a38f8, libName=0x7ffff02d9970 "svml", 
    libraryPath=0x7ffff023ff34 "/team/git-issues/git13269/testcase/LinuxSOThreadTest/libs:/usr/lib64:/usr/lib", libraryPtr=0x0, openFunction=0x7ffff5e798f5 <classLoaderRegisterLibrary>, userData=0x18d00, 
    errorBuffer=0x7ffff023ffd0 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufferLength=512) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:297
#12 0x00007ffff5e7951d in registerNativeLibrary (vmThread=0x18d00, classLoader=0x7ffff00a38f8, libName=0x7ffff02d9970 "svml", 
    libraryPath=0x7ffff023fe40 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default:/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-"..., libraryPtr=0x0, 
    errorBuffer=0x7ffff023ffd0 "/team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/svml: cannot open shared object file: No such file or directory", bufferLength=512) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/vmbootlib.c:371
#13 0x00007ffff5ea1053 in VM_BytecodeInterpreterCompressed::inlClassLoaderLoadLibraryWithPath (this=0x7ffff6fa57f0, _sp=@0x7ffff6fa5760: 0x10f120, 
    _pc=@0x7ffff6fa5768: 0x7 <error: Cannot access memory at address 0x7>) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.hpp:4611
#14 0x00007ffff5eb4bd0 in VM_BytecodeInterpreterCompressed::run (this=0x7ffff6fa57f0, vmThread=0x18d00)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.hpp:9993
#15 0x00007ffff5e8b228 in bytecodeLoopCompressed (currentThread=0x18d00) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/BytecodeInterpreter.inc:112
#16 0x00007ffff5fd5842 in c_cInterpreter () at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/vm/runtime/vm/xcinterp.s:158
#17 0x00007ffff5db9f26 in runCallInMethod (env=0x18d00, receiver=0x0, clazz=0x10f260, methodID=0x7ffff03f6718, args=0x7ffff6fa5dd8)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/callin.cpp:1123
#18 0x00007ffff5e0161a in gpProtectedRunCallInMethod (entryArg=0x7ffff6fa5d70) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/jnicsup.cpp:300
#19 0x00007ffff5fee929 in signalProtectAndRunGlue (portLibrary=0x7ffff67a2e80 <j9portLibrary>, userData=0x7ffff6fa5d20)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/util/jniprotect.c:45
#20 0x00007ffff540df8b in omrsig_protect (portLibrary=0x7ffff67a2e80 <j9portLibrary>, fn=0x7ffff5fee8fd <signalProtectAndRunGlue>, fn_arg=0x7ffff6fa5d20, handler=0x7ffff5df6712 <structuredSignalHandler>, 
    handler_arg=0x18d00, flags=506, result=0x7ffff6fa5d08) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/omr/port/unix/omrsignal.c:425
#21 0x00007ffff5feea87 in gpProtectAndRun (function=0x7ffff5e015bd <gpProtectedRunCallInMethod(void*)>, env=0x18d00, args=0x7ffff6fa5d70)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/util/jniprotect.c:78
#22 0x00007ffff5e01b88 in gpCheckCallin (env=0x18d00, receiver=0x0, cls=0x10f260, methodID=0x7ffff03f6718, args=0x7ffff6fa5dd8)
    at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/jnicsup.cpp:488
#23 0x00007ffff5e00404 in callStaticVoidMethod (env=0x18d00, cls=0x10f260, methodID=0x7ffff03f6718) at /team/xa-docker/semeru-builds/jdk11-0906/openj9-openjdk-jdk11/openj9/runtime/vm/jnicgen.c:384
#24 0x00007ffff7bcb923 in JavaMain (_args=<optimized out>) at ./src/java.base/share/native/libjli/java.c:549
#25 0x00007ffff7bcf4a9 in ThreadJavaMain (args=<optimized out>) at ./src/java.base/unix/native/libjli/java_md_solinux.c:759
#26 0x00007ffff71c86ba in start_thread (arg=0x7ffff6fa6700) at pthread_create.c:333
#27 0x00007ffff790351d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

It is not clear why the segmentation fault occurred at __lseek (fd, ph->p_offset, SEEK_SET), given that the variable values are similar in the passing and failing cases. The stack traces differ from frame 17 upward, which reflects the two different ways of launching c_cInterpreter().

a0304 commented 3 years ago

Thanks @JasonFengJ9. I will test with the new JDK17 and let you know the results.

JasonFengJ9 commented 3 years ago

Update:

The testcase causes a segmentation error with JDK11 [1] on Ubuntu 16.04.7, but no error on Ubuntu 18.04.5 or Ubuntu 20.04.2. I also tried RHEL Server release 7.8 (Maipo), where it works as well, but I could not find a system running RHEL Server release 7.6 (Maipo), the release reported initially. This appears to be a system issue that was fixed in later releases.

@a0304 can you check if your application works in the OS versions specified above?

[1] https://github.com/AdoptOpenJDK/semeru11-binaries/releases/download/jdk-11.0.12%2B7_openj9-0.27.0/ibm-semeru-open-jdk_x64_linux_11.0.12_7_openj9-0.27.0.tar.gz