eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

jdk_container_jdk8_0_FAILED jdk/internal/platform/docker/TestDockerMemoryMetrics.java RuntimeException: Kernel Memory limit not equal, expected : [104857600], got : [-1] #16460

Open JasonFengJ9 opened 1 year ago

JasonFengJ9 commented 1 year ago

Failure link

From an internal build(ub18-aarch64-4):

openjdk version "1.8.0_362"
IBM Semeru Runtime Open Edition (build 1.8.0_362-b05)
Eclipse OpenJ9 VM (build master-7cafd4be8, JRE 1.8.0 Linux aarch64-64-Bit Compressed References 20221210_513 (JIT enabled, AOT enabled)
OpenJ9   - 7cafd4be8
OMR      - 8ae31402f
JCL      - d9472a844 based on jdk8u362-b05)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2022-12-10T07:31:17.409Z] variation: NoOptions
[2022-12-10T07:31:17.409Z] JVM_OPTIONS:  

[2022-12-10T07:38:04.336Z] TEST: jdk/internal/platform/docker/TestDockerMemoryMetrics.java

[2022-12-10T07:38:04.338Z] STDERR:
[2022-12-10T07:38:04.338Z]  stdout: [[kernelmem, 100m]
[2022-12-10T07:38:04.338Z] Exception in thread "main" java.lang.RuntimeException: Kernel Memory limit not equal, expected : [104857600], got : [-1]
[2022-12-10T07:38:04.338Z]  at MetricsMemoryTester.testKernelMemoryLimit(MetricsMemoryTester.java:114)
[2022-12-10T07:38:04.338Z]  at MetricsMemoryTester.main(MetricsMemoryTester.java:38)
[2022-12-10T07:38:04.338Z] ];
[2022-12-10T07:38:04.338Z]  stderr: [WARNING: Specifying a kernel memory limit is deprecated and will be removed in a future release.
[2022-12-10T07:38:04.338Z] ]
[2022-12-10T07:38:04.338Z]  exitValue = 1
[2022-12-10T07:38:04.338Z] 
[2022-12-10T07:38:04.338Z] java.lang.RuntimeException: Expected to get exit value of [0]
[2022-12-10T07:38:04.338Z] 
[2022-12-10T07:38:04.338Z]  at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:396)
[2022-12-10T07:38:04.338Z]  at TestDockerMemoryMetrics.testKernelMemoryLimit(TestDockerMemoryMetrics.java:116)
[2022-12-10T07:38:04.338Z]  at TestDockerMemoryMetrics.main(TestDockerMemoryMetrics.java:58)
[2022-12-10T07:38:04.338Z]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2022-12-10T07:38:04.338Z]  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2022-12-10T07:38:04.338Z]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2022-12-10T07:38:04.338Z]  at java.lang.reflect.Method.invoke(Method.java:498)
[2022-12-10T07:38:04.338Z]  at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
[2022-12-10T07:38:04.338Z]  at java.lang.Thread.run(Thread.java:826)
[2022-12-10T07:38:04.338Z] 
[2022-12-10T07:38:04.338Z] JavaTest Message: Test threw exception: java.lang.RuntimeException

jdk_container_jdk8_0_FAILED

10x grinder - all failed

pshipton commented 1 year ago

jdk.internal.platform.Metrics.systemMetrics().getKernelMemoryLimit() is not returning the expected value. @babsingh fyi

babsingh commented 1 year ago

The test passes for JDK19 using the below cmd:

make test TEST="jtreg:test/jdk/jdk/internal/platform/docker/TestDockerMemoryMetrics.java" JTREG="JAVA_OPTIONS=--enable-preview -Dvm.continuations=true -Ddocker.support=true;VERBOSE=all"

The reported failure is JDK8 specific. But, the above cmd does not work in the JDK8 extension repo. @llxia Would you know the cmd to run the test in the JDK8 extension repo?

babsingh commented 1 year ago

Adapted the following instructions to run the failing test: https://github.com/eclipse-openj9/openj9/wiki/Reproducing-Test-Failures-Locally#run-specific-test-class-from-openjdk-tests-on-jdk11-linux-s390x-openj9-sdk.

babsingh commented 1 year ago

With OpenJ9 JDK8, TestDockerMemoryMetrics fails with the below error. The Docker image (registry.access.redhat.com/ubi8/ubi:latest) used by the test doesn't have the required GLIBC to support OpenJ9. So, this test should be disabled for OpenJ9 JDK8.

STDERR:
 stdout: [JVMJ9VM011W Unable to load j9gc29: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /jdk/jre/lib/amd64/default/libj9gc29.so)
JVMJ9VM011W Unable to load j9vrb29: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /jdk/jre/lib/amd64/default/libj9vrb29.so)
Unhandled exception
Type=Segmentation error vmState=0xffffffff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F373AC048D0 Handler2=00007F373A2551D0 InaccessibleAddress=0000000000000538
RDI=00007F373400DC40 RSI=000000000000000A RAX=00007F3734037D40 RBX=00007F3734037E18
RCX=0000000000000006 RDX=00007F373C234208 R8=00007F373976DF60 R9=0000000000000005
R10=0000000000000003 R11=00007F373BBB8F80 R12=0000000000000000 R13=00007F373C234208
R14=0000000000000003 R15=0000000000000003
RIP=00007F3738B36845 GS=0000 FS=0000 RSP=00007F373C234200
EFlags=0000000000010206 CS=0033 RBP=00007F373400DC40 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000538
xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1 49656e69756e6547 (f: 1970169216.000000, d: 3.823471e+45)
xmm2 0680810101000800 (f: 16779264.000000, d: 2.327561e-277)
xmm3 ffff000000000000 (f: 0.000000, d: -nan)
xmm4 6a2f3a72616a2e73 (f: 1634348672.000000, d: 3.059677e+203)
xmm5 6a2e65636a2f6269 (f: 1781490304.000000, d: 2.978134e+203)
xmm6 80046d8db7654321 (f: 3076866816.000000, d: -6.157814e-309)
xmm7 00007f3738223aa0 (f: 941767296.000000, d: 6.910750e-310)
xmm8 dddddddddddddddd (f: 3722305024.000000, d: -1.456816e+144)
xmm9 ff00ffffffffffff (f: 4294967296.000000, d: -5.829007e+303)
xmm10 2000202020202020 (f: 538976256.000000, d: 1.503368e-154)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/jdk/jre/lib/amd64/default/libj9jit29.so
Module_base_address=00007F373892E000
Target=2_90_20230110_000000 (Linux 4.15.0-200-generic)
CPU=amd64 (8 logical CPUs) (0x3ea73a000 RAM)
JVMDUMP039I Processing dump event "gpf", detail "" at 2023/01/11 16:32:46 - please wait.
JVMDUMP032I JVM requested System dump using '//core.20230111.163246.1.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.9.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.

JVMDUMP012E Error in System dump: The core file created by child process with pid = 9 was not found. Expected to find core file with name "//core"
JVMDUMP032I JVM requested Java dump using '//javacore.20230111.163246.1.0002.txt' in response to an event
JVMDUMP032I JVM requested Snap dump using '//Snap.20230111.163246.1.0003.trc' in response to an event
JVMDUMP009I Snap dump not available
JVMDUMP013I Processed dump event "gpf", detail "".
];
 stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
]
babsingh commented 1 year ago

The reported failure in https://github.com/eclipse-openj9/openj9/issues/16460#issue-1494255257 is seen with IBM-J9 (Semeru) JDK8. It is also seen with the RI's JDK8. The API is completely written in Java with no native dependency to the OMR cgroup functions: jdk/internal/platform/cgroupv1/Metrics.java#L449-L452.

This failure is not seen with OpenJ9 JDK19 since Metrics.getKernelMemoryLimit has been removed. The kernel memory limit metric has been deprecated by the Linux kernel. Related commit: https://github.com/ibmruntimes/openj9-openjdk-jdk19/commit/ce7bf63e67eff780630c114a9a6999f897c1e303.

Backporting the above JDK19 change to https://github.com/ibmruntimes/openj9-openjdk-jdk8 won't help since IBM-J9 (Semeru) uses a different JCL repo.

babsingh commented 1 year ago

Also, JDK8 jdk/internal/platform/cgroupv1/Metrics implementation only supports cgroup-v1. If the above test is run on a cgroup-v2 machine, then it will fail.

pshipton commented 1 year ago

The Docker image (registry.access.redhat.com/ubi8/ubi:latest) used by the test doesn't have the required GLIBC to support OpenJ9. So, this test should be disabled for OpenJ9 JDK8.

Proper OpenJ9 builds for jdk8 require glibc 2.12 since it's compiled on Cent 6. Not sure where the OpenJ9 build comes from that needs 2.27. Seems it was built on Ubuntu 18.04, which provides glibc 2.27.

babsingh commented 1 year ago

Proper OpenJ9 builds for jdk8 require glibc 2.12 since it's compiled on Cent 6. Not sure where the OpenJ9 build comes from that needs 2.27. Seems it was built on Ubuntu 18.04, which provides glibc 2.27.

Yes, the OpenJ9 JDK8 was built on my local build environment, which is on a Ubuntu 18.04 machine.

pshipton commented 1 year ago

It is also seen with the RI's JDK8.

It seems weird that a test provided with OpenJDK 8 doesn't work on OpenJDK 8. I wonder if the Adoptium test runs are running into the same failure? I don't see the test excluded for non-OpenJ9.

pshipton commented 1 year ago

If the above test is run on a cgroup-v2 machine, then it will fail.

Is that's what's going on? The test requirement is to run on a cgroup-v1 machine?

babsingh commented 1 year ago

Actions to take:

Is that's what's going on? The test requirement is to run on a cgroup-v1 machine?

pshipton commented 1 year ago

@llxia is this test running/passing on Adoptium builds?

llxia commented 1 year ago

I wonder if the Adoptium test runs are running into the same failure? I don't see the test excluded for non-OpenJ9.

It seems that Adoptium does not have the machine for aarch64, but I found the build on linux x86 (failed): https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_dev.openjdk_x86-64_linux/10/consoleFull

00:06:54.192  STDERR:
00:06:54.192   stdout: [[kernelmem, 100m]
00:06:54.192  Exception in thread "main" java.lang.RuntimeException: Kernel Memory limit not equal, expected : [104857600], got : [-1]
00:06:54.192    at MetricsMemoryTester.testKernelMemoryLimit(MetricsMemoryTester.java:114)
00:06:54.192    at MetricsMemoryTester.main(MetricsMemoryTester.java:38)
00:06:54.192  ];
00:06:54.192   stderr: [WARNING: Specifying a kernel memory limit is deprecated and will be removed in a future release.
00:06:54.192  WARNING: You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.
00:06:54.192  ]
00:06:54.192   exitValue = 1
00:06:54.192  
00:06:54.192  java.lang.RuntimeException: Expected to get exit value of [0]
00:06:54.192  
00:06:54.192    at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:396)
00:06:54.192    at TestDockerMemoryMetrics.testKernelMemoryLimit(TestDockerMemoryMetrics.java:116)
00:06:54.192    at TestDockerMemoryMetrics.main(TestDockerMemoryMetrics.java:58)
00:06:54.192    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:06:54.192    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
00:06:54.192    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
00:06:54.192    at java.lang.reflect.Method.invoke(Method.java:498)
00:06:54.192    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
00:06:54.192    at java.lang.Thread.run(Thread.java:750)
00:06:54.192  
00:06:54.192  JavaTest Message: Test threw exception: java.lang.RuntimeException
00:06:54.192  JavaTest Message: shutting down test
00:06:54.192  
00:06:54.192  
00:06:54.192  TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Expected to get exit value of [0]
00:06:54.192  --------------------------------------------------
00:07:27.264  Test results: passed: 5; failed: 1
pshipton commented 1 year ago

Seems like the test isn't running at Adoptium since there isn't a machine to run it on. https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_dev.openjdk_aarch64_linux/

is the test requirement is to run on a cgroup-v1 machine?

I'm asking if the docker container we are running the test on is cgroup-v2, and that could explain the failure?

I don't think we should be backporting anything from jdk19. We will provide what OpenJDK 8 provides. If the test/code doesn't work, it can be excluded until OpenJDK fixes it.

babsingh commented 1 year ago

I'm asking if the docker container we are running the test on is cgroup-v2, and that could explain the failure?

The test will fail in both Docker containers:

java.lang.RuntimeException: Expected to get exit value of [0]

at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:396)
at TestDockerMemoryMetrics.testMemoryLimit(TestDockerMemoryMetrics.java:81)
at TestDockerMemoryMetrics.main(TestDockerMemoryMetrics.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:508)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.lang.Thread.run(Thread.java:825)

JavaTest Message: Test threw exception: java.lang.RuntimeException JavaTest Message: shutting down test

babsingh commented 1 year ago

I don't think we should be backporting anything from jdk19. We will provide what OpenJDK 8 provides. If the test/code doesn't work, it can be excluded until OpenJDK fixes it.

Opened https://github.com/adoptium/aqa-tests/pull/4242 to exclude the test until OpenJDK fixes the issue.

smlambert commented 1 year ago

Was an upstream (OpenJDK) bug raised for this test case issue? I do not see it referenced anywhere but see the comment

it can be excluded until OpenJDK fixes it

If it exists, we can use that issue number in the excludes PR in https://github.com/adoptium/aqa-tests/pull/4242

babsingh commented 1 year ago

https://github.com/adoptium/aqa-tests/pull/4242 has been merged. @pshipton Can this issue be closed, or will it stay open until OpenJDK fixes the issue?

re https://github.com/eclipse-openj9/openj9/issues/16460#issuecomment-1382438843: Responded in https://github.com/adoptium/aqa-tests/pull/4242#issuecomment-1382444134.

pshipton commented 1 year ago

Since the test is excluded, this issue (or another) needs to remain open to track the exclude.

JasonFengJ9 commented 1 year ago

JDK8 aarch64_linux 0.36

[2023-01-23T17:54:04.587Z] STDERR:
[2023-01-23T17:54:04.587Z]  stdout: [[kernelmem, 100m]
[2023-01-23T17:54:04.587Z] Exception in thread "main" java.lang.RuntimeException: Kernel Memory limit not equal, expected : [104857600], got : [-1]
[2023-01-23T17:54:04.587Z]  at MetricsMemoryTester.testKernelMemoryLimit(MetricsMemoryTester.java:114)
[2023-01-23T17:54:04.587Z]  at MetricsMemoryTester.main(MetricsMemoryTester.java:38)
[2023-01-23T17:54:04.587Z] ];
[2023-01-23T17:54:04.587Z]  stderr: [WARNING: Specifying a kernel memory limit is deprecated and will be removed in a future release.
[2023-01-23T17:54:04.587Z] ]
[2023-01-23T17:54:04.587Z]  exitValue = 1
[2023-01-23T17:54:04.587Z] 
[2023-01-23T17:54:04.587Z] java.lang.RuntimeException: Expected to get exit value of [0]
[2023-01-23T17:54:04.587Z] 
[2023-01-23T17:54:04.587Z]  at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:396)
[2023-01-23T17:54:04.587Z]  at TestDockerMemoryMetrics.testKernelMemoryLimit(TestDockerMemoryMetrics.java:116)
[2023-01-23T17:54:04.587Z]  at TestDockerMemoryMetrics.main(TestDockerMemoryMetrics.java:58)
[2023-01-23T17:54:04.587Z]  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2023-01-23T17:54:04.587Z]  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2023-01-23T17:54:04.587Z]  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2023-01-23T17:54:04.587Z]  at java.lang.reflect.Method.invoke(Method.java:498)
[2023-01-23T17:54:04.587Z]  at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
[2023-01-23T17:54:04.587Z]  at java.lang.Thread.run(Thread.java:826)
[2023-01-23T17:54:04.587Z] 
[2023-01-23T17:54:04.587Z] JavaTest Message: Test threw exception: java.lang.RuntimeException

It seems 0.36 test branch w/o the excludes.

babsingh commented 1 year ago

https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk8_j9_dev.openjdk_aarch64_linux/10/consoleFull

The above build uses aqa-test's v0.9.6-release branch, and this branch does not contain the changes from https://github.com/adoptium/aqa-tests/pull/4242. @llxia @JasonFengJ9 @pshipton Do we need to backport this PR to aqa-test's v0.9.6-release branch?

pshipton commented 1 year ago

It would be nice, but not critical.

babsingh commented 1 year ago

backport this PR to aqa-test's v0.9.6-release branch?

PR to backport the changes: https://github.com/adoptium/aqa-tests/pull/4275

babsingh commented 1 year ago

@pshipton @DanHeidinga The issue resides in OpenJDK8's extension repo, and it is seen in both RI and OpenJ9. Key details are mentioned from https://github.com/eclipse-openj9/openj9/issues/16460#issuecomment-1382309251 and onwards. We have disabled the test and await OpenJDK to fix the issue in JDK8.

pshipton commented 1 year ago

It seems maybe we are trying to run the test when not in the correct environment where it can pass. @llxia I've added the test label, and moving this forward.

babsingh commented 1 year ago

There are two issues https://github.com/eclipse-openj9/openj9/issues/16460#issuecomment-1382336125:

  1. On cgroup v1 machines, a functional issue will be seen related to Metrics.getKernelMemoryLimit because the OpenJDK impl in the extension repo uses the deprecated Linux kernel memory limit metric https://github.com/eclipse-openj9/openj9/issues/16460#issuecomment-1382309251. This requires the OpenJDK fix to be backported to JDK8. The fix just removes the API and test code related to the deprecated metrics.

  2. On cgroup v2 machines, the Metrics API won't work in JDK8 because there is only support for cgroup v1. cgroup v2 support was introduced in JDK11. In JDK8, tests related to the Metrics API should only be run on cgroup v1 machines. They will fail on a cgroup v2 machine due to lack of support.