Closed mbaudier closed 1 year ago
I have also tested with a downloaded IBM Semeru runtime:
openjdk version "17.0.8" 2023-07-18
IBM Semeru Runtime Open Edition 17.0.8.0 (build 17.0.8+7)
Eclipse OpenJ9 VM 17.0.8.0 (build openj9-0.40.0, JRE 17 Linux amd64-64-Bit Compressed References 20230718_539 (JIT enabled, AOT enabled)
OpenJ9 - d12d10c9e
OMR - e80bff83b
JCL - 77b0f754805 based on jdk-17.0.8+7)
and there is exactly the same problem when running in the systemd container.
~There is a guard page being created with mprotect being called to set the memory to PROT_NONE i.e. no access. The mprotect call is failing, which is causing the assertion.~
~https://github.com/eclipse-openj9/openj9/blob/master/runtime/vm/FlushProcessWriteBuffers.cpp#L87-L88~
The call to mlock is failing.
https://github.com/eclipse-openj9/openj9/blob/master/runtime/vm/FlushProcessWriteBuffers.cpp#L84-L85
You could add something to the code to print the errno so we could see why it is failing, but I'm guessing it's not supported in the systemd container.
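A standalone probe along these lines could also be run inside the container without rebuilding the JVM. This is only a minimal sketch, not OpenJ9 code: the names probeMlock and describeMlockResult are made up for illustration, and it assumes a Linux system where one anonymous page can be mapped.

```cpp
// Standalone probe (not OpenJ9 code): maps one anonymous page, tries to
// mlock() it, and reports the errno of the failing call.
// probeMlock and describeMlockResult are hypothetical names.
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Returns 0 if one page could be locked, otherwise the errno of the call that failed.
int probeMlock()
{
    long pageSize = sysconf(_SC_PAGESIZE);
    if (pageSize <= 0) {
        return EINVAL;
    }
    void *addr = mmap(nullptr, pageSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == addr) {
        return errno;
    }
    int result = 0;
    if (0 != mlock(addr, pageSize)) {
        result = errno; // save errno before later calls can overwrite it
    } else {
        munlock(addr, pageSize);
    }
    munmap(addr, pageSize);
    return result;
}

// Formats the value returned by probeMlock() for printing.
std::string describeMlockResult(int err)
{
    if (0 == err) {
        return "mlock succeeded";
    }
    return std::string("mlock failed: ") + std::strerror(err)
        + " (errno=" + std::to_string(err) + ")";
}
```

Printing describeMlockResult(probeMlock()) from a small main(), once on the host and once in the container, should show whether the failure depends on the JVM at all or only on the environment.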
The code calling mprotect is guarded by a feature flag; you can try turning the feature off to see whether that resolves the problem or exposes another one. @gacholio can this feature be changed by itself, or do other features also need to be changed at the same time?
On the following line, change ON to OFF:
https://github.com/eclipse-openj9/openj9/blob/master/runtime/cmake/caches/linux_x86-64.cmake#L30
To fully restore the old behaviour, turn off J9VM_INTERP_ATOMIC_FREE_JNI and J9VM_INTERP_TWO_PASS_EXCLUSIVE as well. I don't know that atomic-free works without the flush.
Actually I was looking at the wrong lines, it's mlock which is failing. The rest still applies.
https://github.com/eclipse-openj9/openj9/blob/master/runtime/vm/FlushProcessWriteBuffers.cpp#L84-L85
amac uses these options
set(J9VM_INTERP_ATOMIC_FREE_JNI ON CACHE BOOL "")
set(J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH OFF CACHE BOOL "")
set(J9VM_INTERP_TWO_PASS_EXCLUSIVE OFF CACHE BOOL "")
Thanks for the quick feedback!
I confirm that by setting all three options to OFF in runtime/cmake/caches/linux_x86-64.cmake:
set(J9VM_INTERP_ATOMIC_FREE_JNI OFF CACHE BOOL "")
set(J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH OFF CACHE BOOL "")
set(J9VM_INTERP_TWO_PASS_EXCLUSIVE OFF CACHE BOOL "")
java -version is now working in the systemd container:
openjdk version "17.0.8-argeo" 2023-08-22
OpenJDK Runtime Environment (build 17.0.8-argeo+0)
Eclipse OpenJ9 VM (build openj9-0.40.0, JRE 17 Linux amd64-64-Bit Compressed References 20230822_000000 (JIT enabled, AOT enabled)
OpenJ9 - d12d10c9e
OMR - e80bff83b
JCL - 77b0f754805 based on jdk-17.0.8+7)
Not clear to me what 'amac' means, but I have also tested with J9VM_INTERP_ATOMIC_FREE_JNI ON and it is working as well:
set(J9VM_INTERP_ATOMIC_FREE_JNI ON CACHE BOOL "")
set(J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH OFF CACHE BOOL "")
set(J9VM_INTERP_TWO_PASS_EXCLUSIVE OFF CACHE BOOL "")
Also, I have added a check of the return code in FlushProcessWriteBuffers.cpp:
int mlockrc = mlock(addr, pageSize);
if (0 != mlockrc) {
    cout << "mlock return code: " << mlockrc << "\n";
}
Assert_VM_true(0 == mlockrc);
and tested with the failing configuration (all options ON). It prints: mlock return code: -1
So far I have only tested java -version. I could test further with our application in the systemd container using one of the working configurations.
I guess I should then rather try the configuration with J9VM_INTERP_ATOMIC_FREE_JNI ON and the two others OFF?
Is it expected that such changes would have a big impact?
Since we are packaging the JVM as a .deb, I could easily patch the build, but running in a systemd container is not our production use case. The idea was rather to use it for testing and development.
amac means AArch64 Mac, i.e. the Macs being sold and used today, which means we have one standard platform that already uses that configuration.
You could print the errno variable when the return code is -1, or better, call perror("some text"). If it doesn't compile, then add
#include <stdio.h>
#include <errno.h>
With:
int mlockrc = mlock(addr, pageSize);
if (-1 == mlockrc) {
    perror("FlushProcessWriteBuffers.cpp#initializeExclusiveAccess()");
    std::cerr << " errno=" << errno << std::endl;
}
Assert_VM_true(0 == mlockrc);
the failing configuration prints:
FlushProcessWriteBuffers.cpp#initializeExclusiveAccess(): Operation not permitted
errno=1
So, I thought that it could have something to do with Linux "capabilities", since the point of systemd containers is to configure them easily. And if we restart the container with the CAP_IPC_LOCK capability, it works!
$ sudo systemd-nspawn --boot -D /var/lib/machines/openj9-bullseye/ --capability CAP_IPC_LOCK
In the container with the added capability, the previously failing build does not crash:
$ /opt/built-openj9-17/bin/java -version
openjdk version "17.0.8-argeo" 2023-08-22
OpenJDK Runtime Environment (build 17.0.8-argeo+0)
Eclipse OpenJ9 VM (build openj9-0.40.0, JRE 17 Linux amd64-64-Bit Compressed References 20230822_000000 (JIT enabled, AOT enabled)
OpenJ9 - d12d10c9e
OMR - e80bff83b
JCL - 77b0f754805 based on jdk-17.0.8+7)
It means that this issue with systemd nspawn containers can be worked around / configured (cf. https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#Capability=). I will test further, and let you know if I encounter any related issues. Many thanks for your help!
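To check from inside a container whether CAP_IPC_LOCK is actually in effect, the effective capability mask can be read from /proc/self/status. The following is a rough sketch under a few assumptions: it relies on the Linux CapEff line and on CAP_IPC_LOCK being bit 14 (as in linux/capability.h), the function names are invented for illustration, and whether this capability is the only factor in a given container setup has not been verified here.

```cpp
// Sketch (not OpenJ9 code): reports whether CAP_IPC_LOCK is in the effective
// capability set of the current process by parsing /proc/self/status.
// All function names here are hypothetical.
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Bit number of CAP_IPC_LOCK as defined in <linux/capability.h>.
constexpr unsigned kCapIpcLock = 14;

// Parses a "CapEff:<whitespace><hex mask>" line.
// Returns true and fills mask on success, false for any other line.
bool parseCapEff(const std::string &line, std::uint64_t &mask)
{
    const std::string prefix = "CapEff:";
    if (0 != line.compare(0, prefix.size(), prefix)) {
        return false;
    }
    std::istringstream in(line.substr(prefix.size()));
    std::uint64_t value = 0;
    if (!(in >> std::hex >> value)) {
        return false;
    }
    mask = value;
    return true;
}

// Tests a single capability bit in a capability mask.
bool hasCapability(std::uint64_t mask, unsigned cap)
{
    return 0 != (mask & (std::uint64_t{1} << cap));
}

// Returns 1 if CAP_IPC_LOCK is effective, 0 if not, -1 if it cannot be determined.
int currentProcessHasIpcLock()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    std::uint64_t mask = 0;
    while (std::getline(status, line)) {
        if (parseCapEff(line, mask)) {
            return hasCapability(mask, kCapIpcLock) ? 1 : 0;
        }
    }
    return -1;
}
```

Calling currentProcessHasIpcLock() in a container started with and without --capability CAP_IPC_LOCK should show the difference between the two setups.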
I'll go ahead and close this for now but we can reopen if something else comes up.
Java -version output
(from the build environment)
Summary of problem
After building OpenJ9 v0.40.0 locally on Debian 11 (bullseye), java -version works as expected (see above). But when the binaries are copied to a systemd container (https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html), the command crashes immediately with the message:
The same problem also happens with v0.38.0 binaries that we have been using for a while without problem locally or in VMs (but not yet in systemd containers).
The container and its host are Debian 11 (bullseye). The command installing the container is:
$ sudo debootstrap --include=systemd-container bullseye /var/lib/machines/openj9-bullseye
The command launching the container is:
$ sudo systemd-nspawn --boot -D /var/lib/machines/openj9-bullseye/
The configure of the build is:
Diagnostic files
The small diagnostic files are attached: javacore.20230820.140012.120.0002.txt, Snap.20230820.140012.120.0003.trc.gz
Should I also provide the dump?