Closed dmitripivkine closed 4 years ago
@pshipton - is there someone to look at this, would it be someone from CL team?
@andrew-m-leonard can someone take a look please. You might contact the IBM JCL team for how this problem was resolved.
@pshipton Looking into this
@pshipton I've been digging through the old mercurial commits on rt-patch and found one that sets up the memory space available to AIX systems when building the launcher. This code should be in https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/cc4272e86c12e635710cca2a4c5833c37e398c7b/src/java.base/unix/native/libjli/java_md_solinux.c#L1 but isn't and I'm wondering if this is the cause.
It uses LDR_CNTRL (https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/j9_configure_aix_ldr_cntrl.html) to decide whether to calculate automatically how much memory should be allocated, or to use a manually specified environment variable value as the MAXDATA memory size. I can zip up the code snippet in a file and include it here if that makes it easier to understand; however, it does not come with a license.
I also found another potential cause at https://github.com/AdoptOpenJDK/openjdk-build/blob/0a03a3ecec069613e80150f75aedd43d2869d668/build-farm/platform-specific-configurations/aix.sh#L34 which sets MAXDATA to 0x80000000.
The doc says that LDR_CNTRL is specific to 32-bit, so I don't think that's it.
There is a 64-bit defined section within this file. This uses ulimit and a rlimit64 struct to organise the memory (https://www.gnu.org/software/libc/manual/html_node/Limits-on-Resources.html). I will start picking this apart to make sure I haven't missed anything there.
64-bit appears to be normal too. The program gets the current and maximum data limits (using RLIMIT_DATA), checks whether the limit is infinite and, if not, attempts to set it to infinite. If that fails, it warns that the hard ulimit hasn't been set to infinite and that out of memory errors may occur. Will keep digging.
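The check described in this comment can be sketched roughly as follows. This is a Python illustration using the standard `resource` module, not the launcher's actual C code; the function name `ensure_unlimited_data` and the warning text are assumptions made up for the example.

```python
import resource
import warnings


def ensure_unlimited_data():
    """Try to raise the soft RLIMIT_DATA limit to infinity (hypothetical helper).

    Returns True if the soft limit is, or becomes, unlimited; False otherwise.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    if soft == resource.RLIM_INFINITY:
        return True  # already unlimited, nothing to do
    try:
        # Without extra privileges this only succeeds when the hard limit
        # is itself unlimited; otherwise the soft limit would exceed it.
        resource.setrlimit(resource.RLIMIT_DATA, (resource.RLIM_INFINITY, hard))
    except (ValueError, OSError):
        warnings.warn("data ulimit is not unlimited; out of memory errors may occur")
        return False
    return True
```

On a machine where `ulimit -d` already reports `unlimited`, the function returns True without changing anything.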
I expect there is a link option to set the preferred base address of the code. In #7458 Julian mentioned -T or -bpT.
Maybe we could move the discussion about AIX to the AIX specific issue #7458
This issue is not unique to openj9. Hotspot also loads the java launcher at 0x80000000 on zlinux and at 0x00060000 on x86, even when the jdk is built on zlinux. My current hypothesis is that there is an ld command missing somewhere within the build scripts that loads the launcher in the right place. I'm currently looking into where exactly this missing command is (most likely a script that's called from make/launcher/LauncherCommon.gmk).
I am not sure Hotspot cares about this. The Hotspot Compressed Refs implementation does not rely on special usage of virtual memory below the 4G bar.
RTC PR 100052 is the work item where this problem was fixed for IBM Java 8. CompileLaunchers.txt contains the patch details; the patch is not present in OpenJDK's code. I'm currently trying to build and/or run the java launcher with this patch included.
The work items that will fix this bug are:
105f80000-105f81000 r-xp 00000000 fd:01 12255 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
105f81000-105f82000 r--p 00000000 fd:01 12255 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
105f82000-105f83000 rw-p 00001000 fd:01 12255 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
JAVAW_LDFLAGS will not adjust the base load address for me. @pshipton Do you know what variable sets the load flags for the java launcher in JDK8/11?
The javaw launcher (JAVAW_LDFLAGS) is only built for Windows.
For Java 8, I see $1_LDFLAGS used in CompileLaunchers.gmk.
For Java 11+, I also see $1_LDFLAGS used in LauncherCommon.gmk.
I assume this means JAVA_LDFLAGS will affect the java launcher, but really the change should be generic and change all the launchers, not just java.
Hmmm. There are a lot of places within https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/e7da16be04c9cf4e6734e3621a5f40e34001de8a/make/launcher/Launcher-java.base.gmk#L1 where you could plug this in. I'll trial some locations within the makefile, starting with LDFLAGS.
One thing that has puzzled me: when running java -verbose:gc -Xmx2040m looper to check compressedRefsShift, the value is 0x0, indicating that compressed refs is still able to run (even though the base memory address remains at 0x80000000).
EDIT: That was from an unmodified JDK.
0x80000000 is the base address of the java executable, which doesn't need much memory (I guess less than 8m, i.e. 2048m - 2040m). It seems 2040m still fits in the available space. Try running java -verbose:gc -Xmx2040m -Xdump:java:events=vmstop and look at the "Object memory" data in the "MEMINFO subcomponent dump routine" section to see the object heap addresses. Although when I try it, I get a "compressedRefsShift" of 0x1.
zlinux Output
The heap size is identical to an x86_linux machine that loads the JDK at 0x40000000 (below)
So what is your concern? You requested -Xmx2040m (2139095040 bytes) and got the allocation [0x80030000, 0xff830000] (2139095040 bytes). It fits in one of the halves of memory below the 4G bar.
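The arithmetic in this comment can be checked with a few lines of Python. This is only a sanity check of the numbers quoted in the thread; the variable names are made up for the example.

```python
MIB = 1024 * 1024
FOUR_GIB = 1 << 32

requested = 2040 * MIB                          # -Xmx2040m
heap_base, heap_top = 0x80030000, 0xFF830000    # allocation quoted in the thread

heap_size = heap_top - heap_base
print(requested, heap_size)      # 2139095040 2139095040
print(heap_top <= FOUR_GIB)      # True: the whole heap sits below the 4G bar
```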
My concern was that compressedRefsShift was not producing an accurate value. However, running java -verbose:gc -Xmx2040m -Xdump:java:events=vmstop shows that it is. The screenshots were to confirm the point.
@pshipton @dmitripivkine https://github.com/M-Davies/openj9-openjdk-jdk11/commit/27d5060571a8905d1267664f4ac62537b9debd9c
00060000-00062000 r-xp 00000000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
00062000-00063000 r--p 00001000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
00063000-00064000 rw-p 00002000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
Fix for JDK11, please see the test output above. I currently have a build running for JDK8 that should achieve the same result
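For anyone repeating this check, a minimal sketch of pulling the launcher's base address out of maps-style output could look like this. The helper name `launcher_base` and its `suffix` parameter are hypothetical; the sample lines are the JDK11 output quoted in this thread.

```python
def launcher_base(maps_text, suffix="/bin/java"):
    """Return the lowest start address of mappings whose path ends with suffix."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, pathname
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip().endswith(suffix):
            start = fields[0].split("-")[0]
            starts.append(int(start, 16))
    return min(starts) if starts else None


sample = """\
00060000-00062000 r-xp 00000000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
00062000-00063000 r--p 00001000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
00063000-00064000 rw-p 00002000 fd:01 14295230 /root/SharedDocker/openj9-openjdk-jdk11/build/linux-s390x-normal-server-release/images/jdk/bin/java
"""

print(hex(launcher_base(sample)))  # 0x60000
```

On a live Linux system the same function can be fed the contents of /proc/<pid>/maps for a running java process.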
JDK8 fix. https://github.com/M-Davies/openj9-openjdk-jdk8/commit/1fee500193b02437cfa5f35aa8bb3f84b421add4
00060000-00061000 r-xp 00000000 fd:01 15485194 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
00061000-00062000 r--p 00000000 fd:01 15485194 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
00062000-00063000 rw-p 00001000 fd:01 15485194 /root/SharedDocker/openj9-openjdk-jdk8/build/linux-s390x-normal-server-release/images/j2sdk-image/bin/java
I added a number of comments to the commit.
@pshipton Test successful https://github.com/M-Davies/openj9-openjdk-jdk8/commit/14cfb367e5cfd3c0a2655cecdc3761e7ee999c22 If you're happy, I can put a PR in now?
@smlambert Same question as above ^^
Please open the PR, we can continue the review there.
Note I had created a number of new comments on that later commit M-Davies/openj9-openjdk-jdk8@14cfb36
I see. This is wrong indeed. The Compressed Refs Shift is decided based on the position of the most significant bit of the heap top address. However, I cannot reproduce this problem. The calculation of the Compressed Refs Shift is done in https://github.com/eclipse/omr/blob/e74d024550f3ae4472f23a984faa5c76c7e109b9/gc/base/Configuration.cpp#L268 and I don't understand how it could go wrong. I need a reproducible test case for this.
@dmitripivkine why is a shift of zero wrong for a heap 0x80030000 to 0xFF830000 ?
According to https://github.com/eclipse/openj9/issues/7115#issuecomment-553464888 the shift was set to 1, wasn't it?
Yes, but that is because the memory was allocated differently by the OS, from 0x82DC0000 to 0x1025C0000. I can duplicate it on my fyre machine, but it doesn't indicate a bug.
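A simplified model of the two cases discussed here: the shift is the smallest value that lets every address below the heap top fit in 32 bits. This is an illustrative sketch only, not the code in OMR's Configuration.cpp, which may apply further constraints; the function name is made up.

```python
def min_compressed_refs_shift(heap_top):
    """Smallest shift s such that all addresses below heap_top are < 2**(32 + s).

    Simplified model based on the most significant bit of the heap top address.
    """
    return max(0, (heap_top - 1).bit_length() - 32)


print(min_compressed_refs_shift(0xFF830000))   # 0: heap top is below 4G
print(min_compressed_refs_shift(0x1025C0000))  # 1: heap top is above 4G
```

Under this model both observations in the thread are consistent: the shift differs only because the OS placed the heap at a different address.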
Ok, thank you. I misunderstood obviously
@pshipton can you look at the other two requests on JDK11 and 13 too please? :)
@M-Davies yes, they are on the list. Please create a PR for https://github.com/ibmruntimes/openj9-openjdk-jdk as well.
I noticed that Java Launcher is loaded at address 0x80000000 on zLinux:
This splits the virtual memory below the 4GB bar into two parts of ~1.8GB and ~1GB. The lack of contiguous memory prevents running a Compressed References JVM with a heap larger than 1.8GB using the most performant 0-bit shift.
IBM Java 8, for instance, is free from this problem; its Java Launcher is loaded at 0x00060000:
and can run with a ~3GB heap below the 4GB bar.
I believe something is missing in the Java 11 build process that would assign a lower load address to the Java Launcher.