adoptium / adoptium-support

For end-user problems reported with our binary distributions
Apache License 2.0

HotSpot-based Java 11 and higher VM crashes when loaded and initialized via JNI Invocation API on AIX #997

Open twaldrep opened 8 months ago

twaldrep commented 8 months ago

Please provide a brief summary of the bug

All HotSpot-based distributions that we've tested since 11.0.12 (it might have happened earlier) crash very early in JVM initialization on AIX. The same method that we've been using since around 2003 for loading the JVM still works without fail on our other supported platforms (Windows and Red Hat-compatible Linux distributions). This includes JDK 21 distributions. As a result, we currently have to embed IBM's OpenJ9-based Semeru JRE in our product distribution, but we recommend the HotSpot-based Adoptium builds to our customers who embed our product in their Java applications (which don't need the JNI Invocation API).

As implied above, the OpenJ9-based distributions through JDK 21 (the highest version that we've tested) load and initialize via the JNI Invocation API without error on all of our supported platforms, including AIX.

Did you test with the latest update version?

We have tested with the latest available Adoptium JDK 11, 17 and 21 builds for AIX.  We've also tested with the latest available SapMachine 21 build for AIX.

Please provide steps to reproduce where possible

JvmLoader.tar.gz

Attached is a tar.gz which contains a small sample program which illustrates the segmentation fault crash in the JVM. To run it, follow these steps:

  1. Extract into a folder on an AIX 7.2 machine with sufficient IBM XL C++ runtime
  2. Open the file named JvmManager.h, and either add or uncomment one of the lines that initialize a static string variable named JVM_FILE. Replace the value with a path to a JDK 11 or higher version libjvm.so file.
  3. Use the included file named "build" to build the small JvmLoader application
  4. Run JvmLoader something like the following: LIBPATH=/jdkpath/lib/server:/jdkpath/lib JvmLoader
  5. For us, this produces a segmentation fault in all HotSpot JVM versions 11 and higher. This works without fail on all IBM Semeru (which uses OpenJ9 JVM) versions 11 and higher and all Java versions that we've tested (through JDK 21) on Windows and Linux. Note that we didn't include the platform-specific code for Windows and Linux which loads the jvm shared library
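For readers without the attachment, the "explicitly load libjvm.so" step might look roughly like the sketch below. It is not the code from JvmLoader.tar.gz: the helper name `lookupCreateJavaVM`, the use of `dlopen`/`dlsym`, and the simplified function-pointer type are all assumptions made for illustration. Real code would include `<jni.h>` and use its `JavaVM`, `JNIEnv`, and `JavaVMInitArgs` types.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstdio>

// ASSUMPTION: simplified stand-in for the JNI_CreateJavaVM prototype so that
// this sketch compiles without a JDK. Real code should include <jni.h>.
using CreateJavaVMFn = int (*)(void** pvm, void** penv, void* args);

// Hypothetical helper: loads the JVM shared library at jvmPath and looks up
// the JNI_CreateJavaVM entry point. Returns nullptr if either step fails.
CreateJavaVMFn lookupCreateJavaVM(const char* jvmPath) {
    assert(jvmPath != nullptr);

    void* handle = dlopen(jvmPath, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return nullptr;
    }

    void* symbol = dlsym(handle, "JNI_CreateJavaVM");
    if (symbol == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "unknown error");
        dlclose(handle);
        return nullptr;
    }
    return reinterpret_cast<CreateJavaVMFn>(symbol);
}
```

The returned pointer would then be called on a suitable thread with the initialization arguments. The crash described in this issue occurs inside that call, not during the library load.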

Expected Results

After explicitly loading libjvm.so, the JVM loads successfully when JNI_CreateJavaVM is called.

Actual Results

After explicitly loading libjvm.so, the JVM crashes with a segmentation fault while calling JNI_CreateJavaVM. The dbx utility reports the following stack trace from our test application:

IPRA.$checked_mprotectFPcUli(??, ??, ??) at 0x90000003024778c
guard_memory__2osFPcUl(??, ??) at 0x900000030241978
create_stack_guard_pages10JavaThreadFv(??) at 0x9000000302fd9bc
create_vm7ThreadsFP14JavaVMInitArgsPb(??, ??) at 0x900000030302470
JNI_CreateJavaVM_innerFPP7JavaVM_PPvPv(??, ??, ??) at 0x900000030a2f92c
JvmManager::initializeJvm()(), line 2179 in "memory"
JvmLoader.JvmManager::JvmManager()::'lambda'()::operator()() const(this = 0x000000011004a4e0), line 39 in "JvmManager.h"
unnamed block in _ZNSt3117call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(vp = 0x000000011004a4f8), line 2220 in "type_traits"
unnamed block in _ZNSt3117call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(vp = 0x000000011004a4f8), line 2220 in "type_traits"
_ZNSt3117call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(vp = 0x000000011004a4f8), line 2220 in "type_traits"
std::1::call_once(unsigned long volatile&, void, void ()(void))(??, ??, ??) at 0x9000000035c37c8
unnamed block in JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
unnamed block in JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
main::$_0::operator()() const(this = 0x0000000110016730), line 7 in "JvmLoader.cpp"
unnamed block in void std::1::thread_proxy<std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete >, main::$_0> >(void)(__vp = 0x0000000110016730), line 2227 in "type_traits"
unnamed block in void std::1::thread_proxy<std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::1::default_delete >, main::$_0> >(void)(__vp = 0x0000000110016730), line 2227 in "type_traits"
void std::1::thread_proxy<std::1::tuple<std::1::unique_ptr<std::1::thread_struct, std::__1::default_delete<std::1::thread_struct> >, main::$_0> >(void*)(vp = 0x0000000110016730), line 2227 in "type_traits"

What Java Version are you using?

openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)

What is your operating system and platform?

AIX 7.2 with IBM XL C++ runtime 16.1.0.10 (note that we experience the same crash with many versions of the IBM XL C++ runtime, including Open XL C++ 17.1.x).

How did you install Java?

Most tests are on JDK/JRE distributions expanded from a tar.gz archive.

Did it work before?

Yes, this approach to loading and initializing the JVM using the JNI Invocation API has worked on all of our supported platforms (Windows, Red Hat-compatible Linux distributions, and AIX) for 20 years. The crash only started happening on AIX after we upgraded the version of Java that we embed with our application from Java 8 to Java 11. Our other supported platforms continue to work without fail using embedded JRE 11 and higher distributions.

Did you test with other Java versions?

openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)

openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)

openjdk version "17.0.8.1" 2023-08-24
OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1)
OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (build 17.0.8.1+1, mixed mode)

// The following requires Open XL C++ runtime 17.1.x.  Included to show that we experience
// the same crash with a JDK built with IBM Open XL C++ 17.1 (and our application also built
// with the same).
openjdk version "21.0.2-ea" 2024-01-16
OpenJDK Runtime Environment SapMachine (build 21.0.2-ea+2)
OpenJDK 64-Bit Server VM SapMachine (build 21.0.2-ea+2, mixed mode)

We've tested with other JDK 11+ HotSpot builds as well.  All crash with a segmentation fault on AIX.

Relevant log output

No log output. Segmentation fault with core dump only.

TheRealMDoerr commented 8 months ago

HotSpot creates guard pages for each Java thread. This causes unfortunate limitations, especially on AIX. I believe it doesn't work for the primordial thread and it requires a certain thread stack size. (These are only some of the reasons why I don't like this design. I hope that we can remove it at some point.) We recently had a similar problem here: https://github.com/openjdk/jdk/blob/2003610b3b52eed04de6713a2a36151d0d86d7c9/test/lib/native/testlib_threads.h#L83 Attaching to the JVM works for a new pthread with a large enough stack size.
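As a diagnostic aid, the stack size that a thread created with default attributes would receive can be queried through the POSIX thread attribute API. The sketch below is only an illustration; the helper name `defaultPthreadStackSize` is hypothetical. It assumes that the C++ runtime creates `std::thread` instances with default pthread attributes, which is common but not guaranteed.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical helper: returns the stack size, in bytes, that a pthread created
// with default attributes would get on this platform, or 0 if the query fails.
std::size_t defaultPthreadStackSize() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing this value on AIX and on Linux is one way to compare the platform defaults.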

twaldrep commented 8 months ago

@TheRealMDoerr We are aware of the primordial thread issue. The HotSpot JVM on AIX produces an error message stating this if an attempt is made to create a JVM instance on the primordial thread. The sample code that we attached creates a separate thread on which it attempts to create the JVM instance.

Since we use C++ std::thread instead of pthreads, we can't directly set the stack size. However, based on your response and the test code that you referenced, we replaced the std::thread used to initialize the JVM with a pthread configured with a large stack size. This test loaded the HotSpot JVM successfully. So, this does give us a workaround for this problem.
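The workaround described above can be sketched as follows. This is an illustration under stated assumptions, not the code from our application: the helper `runOnLargeStackThread` and the trampoline name are hypothetical, and the stack size is left to the caller because no specific value is given in this thread.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <functional>

// Thread entry point with C linkage, as pthread_create expects.
// Note: an exception escaping `work` would terminate the process.
extern "C" void* largeStackTrampoline(void* arg) {
    auto* work = static_cast<std::function<void()>*>(arg);
    (*work)();
    return nullptr;
}

// Hypothetical helper: runs `work` on a new pthread whose stack size is set
// explicitly, then waits for it to finish. Returns false if any step fails.
bool runOnLargeStackThread(std::function<void()> work, std::size_t stackBytes) {
    assert(work);  // an empty std::function cannot be called

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return false;
    }
    if (pthread_attr_setstacksize(&attr, stackBytes) != 0) {
        pthread_attr_destroy(&attr);
        return false;
    }

    pthread_t tid;
    // `work` outlives the thread because we join before returning.
    int rc = pthread_create(&tid, &attr, largeStackTrampoline, &work);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return false;
    }
    return pthread_join(tid, nullptr) == 0;
}
```

A call such as `runOnLargeStackThread([] { /* create the JVM here */ }, 8u * 1024u * 1024u)` would run the JVM initialization on the new thread. The 8 MiB figure is an arbitrary example, not a documented HotSpot requirement.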

So, my obvious next questions follow:

The HotSpot JVM implementation, which uses guard pages for each thread, results in inconsistent JNI Invocation API behavior when compared with JREs that use the OpenJ9 JVM. The OpenJ9 JVM can be loaded in the primordial thread AND does not explicitly require a very large thread stack size. This means that we can initialize it directly in the primary thread used to invoke main. Also, if we do initialize the OpenJ9-based JVM in a separate thread, we can use C++ std::thread. Additionally, the guard-page design causes inconsistent JNI Invocation API behavior when compared with the HotSpot JVM on other platforms (like Linux and Windows). We do NOT have to use a separate thread to initialize the HotSpot JVM on those platforms.

Is it possible to rethink the HotSpot JVM design decision which led to all of these issues on AIX? It's a little late for us since we now know the source of the problem, but it might help the next organization.

TheRealMDoerr commented 8 months ago

I have filed a JBS issue: https://bugs.openjdk.org/browse/JDK-8324431 Let me know if you have further input. The page size may also play a role. Using -XX:-Use64KPages could make a difference, but I don't want to recommend that for production use.

twaldrep commented 7 months ago

@TheRealMDoerr Based on your feedback, we have replaced the top-level thread that we use to load the JVM on AIX (only) with a pthread. The top-level thread on AIX was previously a C++11 std::thread, which has no API to set the thread stack size. This has gotten us past the crash-on-load issue with the HotSpot JVM. We continue to use std::thread for all other threads that we need to create in our application. Unfortunately, I think this workaround would be completely unacceptable to many companies. I know that we didn't like making the exception.
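An AIX-only exception of this kind might be isolated behind a single function, roughly as in the sketch below. This is illustrative only: the function name `runJvmBootstrap`, the constant `kAixStackBytes`, and its value are hypothetical and not taken from our application. `_AIX` is the macro that AIX compilers predefine.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

#if defined(_AIX)
#include <pthread.h>
#include <cstddef>

// Hypothetical stack size for the AIX bootstrap thread (arbitrary example value).
static const std::size_t kAixStackBytes = 8u * 1024u * 1024u;

extern "C" void* aixBootstrapTrampoline(void* arg) {
    (*static_cast<std::function<void()>*>(arg))();
    return nullptr;
}
#endif

// Runs `bootstrap` on a dedicated thread and waits for it to finish.
// On AIX a pthread with an explicit stack size is used; elsewhere std::thread.
bool runJvmBootstrap(std::function<void()> bootstrap) {
    assert(bootstrap);
#if defined(_AIX)
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    if (pthread_attr_setstacksize(&attr, kAixStackBytes) != 0) {
        pthread_attr_destroy(&attr);
        return false;
    }
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, aixBootstrapTrampoline, &bootstrap);
    pthread_attr_destroy(&attr);
    return rc == 0 && pthread_join(tid, nullptr) == 0;
#else
    std::thread worker(std::move(bootstrap));
    worker.join();
    return true;
#endif
}
```

Keeping the platform check in one place means the rest of the code base can keep using std::thread unchanged.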

Thanks for submitting JDK-8324431. I've read through the comments. Maybe I'm taking this out of context, but I completely disagree with the following statement by David Holmes:

"If someone reports "My Java application won't run on a C++ Thread because C++ makes the stack too small" then that is not a Java problem." for the following reasons.

Our application uses C++ std::thread across all supported platforms, which doesn't have an API to set the stack size. We transitioned to std::thread around 10 years ago after C++11-compliant compilers were readily available on all of our supported platforms. We continued to dynamically load the JVM (both HotSpot and OpenJ9) via JNI Invocation Interface reliably with Java 7 and Java 8 on all platforms, including AIX. After transitioning to Java 11 a couple of years ago, there were suddenly issues loading the HotSpot JVM on AIX that didn't exist previously. We suddenly couldn't load the HotSpot JVM on the primordial thread (our application is primarily single-threaded, but we have several cases where multiple threads are needed) ONLY on AIX. We were able (and still are) to load the HotSpot JVM on the primordial thread on our other platforms. We are able to load the OpenJ9 JVM on the primordial thread on all of our platforms, including AIX. The saving grace with the HotSpot JVM non-primordial thread issue on AIX is that at least it provides a useful error when it fails.

Unfortunately, the second issue with the HotSpot JVM 11+ was a complete mystery to us, since all that it does is throw a SIGSEGV when our application attempts to load it via the JNI Invocation API, with no useful information other than approximately where in the JVM the crash occurs. We spent a considerable amount of time attempting to debug our application, changing compiler options, etc., trying to figure out why the JVM kept crashing. At the end of the day, we ended up resorting to embedding the IBM Semeru JRE distribution (OpenJ9-based) on AIX instead of the HotSpot distribution. Unfortunately, the java CLI in IBM's Semeru distribution crashes when the IBM XL C++ runtime is higher than a certain patch level, which is unacceptable to our customers who load our application via their own Java application (thus not needing the JNI Invocation API since the JVM is already loaded). We've had to tell those customers to download the Adoptium HotSpot distribution for use with their application.

So... in a nutshell, David Holmes's statement that this isn't a "Java" issue may be right, but based on many months of pulling my hair out on AIX, I would argue that it is DEFINITELY a "JVM" issue.

TheRealMDoerr commented 7 months ago

Is JNI officially compatible with C++? I'd always go through a C layer. There may be more problems when combining JNI and C++. Nevertheless, I'm not happy with HotSpot using guard pages on AIX/Linux, either. I hope that we can disable them in the future, but that will require more work. So, don't expect this to change soon.

github-actions[bot] commented 4 months ago

We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable. It will be closed soon unless the stale label is removed by a committer, or a new comment is made.
