eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

Test-extended.functional-JDK11-osx_x86-64_cmprssptrs testSoftMxDisclaimMemory Heap can't shrink #4090

Open pshipton opened 5 years ago

pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/20 testSoftMxDisclaimMemory_2 testSoftMxDisclaimMemory_LP4k_2

===============================================
Running test testSoftMxDisclaimMemory_2 ...
===============================================
testSoftMxDisclaimMemory_2 Start Time: Tue Dec 18 04:43:31 2018 Epoch Time (ms): 1545126211692
variation: Mode601
JVM_OPTIONS: -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:optthruput 
[IncludeExcludeTestAnnotationTransformer] [INFO] exclude file is /Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/resources/excludes/latest_exclude_11.txt
...
... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
...

[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3823534080 is less than or equal to previous memory size 3821236224
[SoftmxAdvanceTest] [WARN]  Generate Java core dump after forcing GC.
JVMJ9VM079W triggerOneOffDump(java) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(heap) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(system) requires j9dmp29
[SoftmxAdvanceTest] [ERROR]     FAIL: Heap can't shrink to the reset max heap size within 5 minutes!
[SoftmxAdvanceTest] [WARN]  Generate Java core dump after forcing GC.
JVMJ9VM079W triggerOneOffDump(java) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(heap) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(system) requires j9dmp29
[SoftmxAdvanceTest] [ERROR]     FAIL: Heap can't shrink to the reset max heap size within 5 minutes!
Exception in thread "MemoryMXBean notification dispatcher" java.lang.OutOfMemoryError: Java heap space
    at jdk.management/com.ibm.lang.management.internal.MemoryNotificationThread.processNotificationLoop(Native Method)
    at jdk.management/com.ibm.lang.management.internal.MemoryNotificationThread.run(MemoryNotificationThread.java:183)
FAILED: testDisclaimMemoryEffects(5)
java.lang.OutOfMemoryError: Java heap space
    at java.management/com.ibm.java.lang.management.internal.MemoryMXBeanImpl.getHeapMemoryUsageImpl(Native Method)
    at java.management/com.ibm.java.lang.management.internal.MemoryMXBeanImpl.getHeapMemoryUsage(MemoryMXBeanImpl.java:277)
    at j9vm.test.softmx.SoftmxAdvanceTest.runTestDisclaimMemoryEffects(SoftmxAdvanceTest.java:137)
    at j9vm.test.softmx.SoftmxAdvanceTest.testDisclaimMemoryEffects(SoftmxAdvanceTest.java:64)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
    at org.testng.TestRunner.privateRun(TestRunner.java:648)
    at org.testng.TestRunner.run(TestRunner.java:505)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
    at org.testng.SuiteRunner.run(SuiteRunner.java:364)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
    at org.testng.TestNG.runSuites(TestNG.java:1049)
    at org.testng.TestNG.run(TestNG.java:1017)
    at org.testng.TestNG.privateMain(TestNG.java:1354)
    at org.testng.TestNG.main(TestNG.java:1323)
pshipton commented 5 years ago

@dmitripivkine

pshipton commented 5 years ago

Maybe related, failed in the same run j9vm.test.softmx.SoftmxAdvanceTest.testDisclaimMemoryEffects

java.lang.OutOfMemoryError: Java heap space
at java.management/com.ibm.java.lang.management.internal.MemoryMXBeanImpl.getHeapMemoryUsageImpl(Native Method)
at java.management/com.ibm.java.lang.management.internal.MemoryMXBeanImpl.getHeapMemoryUsage(MemoryMXBeanImpl.java:277)
at j9vm.test.softmx.SoftmxAdvanceTest.runTestDisclaimMemoryEffects(SoftmxAdvanceTest.java:137)
at j9vm.test.softmx.SoftmxAdvanceTest.testDisclaimMemoryEffects(SoftmxAdvanceTest.java:64)
dmitripivkine commented 5 years ago

Failure

04:45:56 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3823534080 is less than or equal to previous memory size 3821236224

means released memory does not increase available free memory from OS (returned by ibmOSMBean.getFreePhysicalMemorySize()). This test is known for unreliability. To improve an investigation possibilities in case of failure this test suppose to generate set of cores:

        if (!isShrink) {
            logger.warn("   Generate Java core dump after forcing GC.");
            com.ibm.jvm.Dump.JavaDump();
            com.ibm.jvm.Dump.HeapDump();
            com.ibm.jvm.Dump.SystemDump();
        }

but it failed:

[SoftmxAdvanceTest] [WARN]  Generate Java core dump after forcing GC.
JVMJ9VM079W triggerOneOffDump(java) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(heap) requires j9dmp29
JVMJ9VM079W triggerOneOffDump(system) requires j9dmp29

because of running test with -Xdump:none (why?!)

Theoretically it is possible to verify correctness of -Xsoftmx functionality in GC by looking to object heap size in javacore: it should shrink below provided Softmx value.

The OOM event can be related with fact that reduced (by -Xsoftmx) heap size is too small. We observed failures like this early days of this test.

@llxia @smlambert Is there a reason why we set -Xdump:none? The generation of javacore (at least) would be useful in this case. Also as we discussed before it is very useful to generate GC verbose logs for tests like this.

llxia commented 5 years ago

Is there a reason why we set -Xdump:none?

I did not see -Xdump:none is set in this test. @dmitripivkine where did you see this option being passed in?

Found earlier conversation #1470, GC log was added in #1504 and the checks for free memory return to OS was removed in #3304. But we only changed testSoftMxDisclaimMemory_GC. There are several testSoftMxDisclaimMemory* tests in playlist.xml.

I will add the GC log for now.

dmitripivkine commented 5 years ago
04:43:32 ===============================================
04:43:32 Running test testSoftMxDisclaimMemory_2 ...
04:43:32 ===============================================
04:43:32 testSoftMxDisclaimMemory_2 Start Time: Tue Dec 18 04:43:31 2018 Epoch Time (ms): 1545126211692
04:43:32 "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
04:43:35 JVMSHRC005I No shared class caches available
04:43:39 JVMSHRC005I No shared class caches available
04:43:39 cache cleanup done
04:43:39 variation: Mode601
04:43:39 JVM_OPTIONS: -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:optthruput 
04:43:39 { mkdir -p "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15451168575163/testSoftMxDisclaimMemory_2"; \
04:43:39    cd "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15451168575163/testSoftMxDisclaimMemory_2"; \
04:43:39    export DYLD_LIBRARY_PATH="/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/../../native-test-libs/:/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/../lib/compressedrefs:/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/../lib/j9vm":; \
04:43:39    "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdkbinary/j2sdk-image/bin/java" -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:optthruput  \

04:43:39    -Xdump:none

 -Xmx1024m -Xsoftmx512m -Xmn1m -XX:+DisclaimVirtualMemory \
04:43:39    -cp "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/resources:/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/lib/testng.jar:/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/lib/jcommander.jar:/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/JLM_Tests/jlm_tests.jar" \
04:43:39    org.testng.TestNG -d "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15451168575163/testSoftMxDisclaimMemory_2" "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/JLM_Tests/testng.xml" \
04:43:39    -testnames testSoftMxDisclaimMemory \
04:43:39    -groups level.extended \
04:43:39    -excludegroups d.*.osx_x86-64_cmprssptrs,d.*.arch.x86,d.*.os.osx,d.*.bits.64,d.*.generic-all; \
04:43:39    if [ $? -eq 0 ] ; then echo ""; echo "testSoftMxDisclaimMemory_2""_PASSED"; echo ""; cd /Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../..;  else echo ""; echo "testSoftMxDisclaimMemory_2""_FAILED"; echo ""; fi; } 2>&1 | tee -a "/Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15451168575163/TestTargetResult";
04:43:40 [IncludeExcludeTestAnnotationTransformer] [INFO] exclude file is /Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/resources/excludes/latest_exclude_11.txt
04:43:40 ...
04:43:40 ... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
04:43:40 ...

Is it applicable to this test or I missed something?

llxia commented 5 years ago

ah, i see. It is hardcoded -Xdump:none in the command. I am not sure what was the reason for setting it initially. I will remove it.

pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/23

===============================================
Running test testDefaultDisclaimMemory_2 ...
===============================================
testDefaultDisclaimMemory_2 Start Time: Fri Dec 21 05:46:21 2018 Epoch Time (ms): 1545389181200
variation: Mode601
JVM_OPTIONS:  -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:optthruput 
[IncludeExcludeTestAnnotationTransformer] [INFO] exclude file is /Users/jenkins/workspace/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/resources/excludes/latest_exclude_11.txt
...
... TestNG 6.14.2 by Cédric Beust (cedric@beust.com)
...

[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3916337152 is less than or equal to previous memory size 3915718656
[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3911798784 is less than or equal to previous memory size 3915284480
[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3911876608 is less than or equal to previous memory size 3910750208
[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3910643712 is less than or equal to previous memory size 3911876608
[SoftmxAdvanceTest] [ERROR]     FAIL: Memory didn't release back to OS! Post memory size 3909095424 is less than or equal to previous memory size 3910119424
FAILED: testDisclaimMemoryEffects(5)
pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test-extended.functional-JDK11-osx_x86-64_cmprssptrs/26

dmitripivkine commented 5 years ago
screen shot 2018-12-27 at 2 47 11 pm
dmitripivkine commented 5 years ago

Output from test:

06:50:26 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3956506624 is less than or equal to previous memory size 3956842496
06:53:00 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3955212288 is less than or equal to previous memory size 3955978240
06:55:34 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3956686848 is less than or equal to previous memory size 3955216384
06:58:09 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3955699712 is less than or equal to previous memory size 3956690944
07:00:42 [SoftmxAdvanceTest] [ERROR]    FAIL: Memory didn't release back to OS! Post memory size 3955036160 is less than or equal to previous memory size 3955699712
07:00:42 FAILED: testDisclaimMemoryEffects(5)

There is obvious from picture above that heap size has been reduced to 256MB after ~2-3 min. So GC functionality works as expected. However released (decommited) memory has not been reported as free by OS.

dmitripivkine commented 5 years ago

When heap shrinks GC provides Decommit request to Port Library. Port Library for OSX allows Decommit to call madvise() with option MADV_DONTNEED (portLibrary->portGlobals->vmemAdviseOSonFree is set to 1).

dmitripivkine commented 5 years ago

From OSX manual:

     The madvise() system call allows a process that has knowledge of its memory
     behavior to describe it to the system.  The advice passed in may be
     used by the system to alter its virtual memory paging strategy.  This
     advice may improve application and system performance.
...
     MADV_DONTNEED    Indicates that the application is not expecting to
                      access this address range soon.  This is used with
                      madvise() system call.

So OSX may (or may not) use this information. I believe test behaviour that expects released memory to be added to free is not accurate.

pshipton commented 5 years ago

I wonder if the query to get free memory (ibmOSMBean.getFreePhysicalMemorySize()) is working. @babsingh

babsingh commented 5 years ago

is ibmOSMBean.getFreePhysicalMemorySize() working?

it does have an implementation: OpenJ9 getFreePhysicalMemorySizeImpl and OMR omrvmem_get_available_physical_memory. I will verify if getFreePhysicalMemorySize functions properly.

pshipton commented 5 years ago

@DanHeidinga mentioned there was a proposal to fix this test to not expect the OS memory to decrease. We would also need to ensure there is a port library test to confirm madvise APIs continue to work on linux.

In the meantime, I think we should exclude this test on osx, which fails consistently, which is the only remaining failure in the jdk11 extended.functional test suite. @babsingh @llxia

babsingh commented 5 years ago

This failure is intermittent. https://github.com/eclipse/openj9/pull/3304 is a potential fix:

An expectation of test that decommited heap memory is going to be returned to OS is not reliable. It is a source of intermittent failures. From another side ibmMemoryMBean.getHeapMemoryUsage().getCommitted() always returns true amount of commited memory. So the main goal of the test to verify -Xsoftmx functionality.

https://github.com/eclipse/openj9/pull/3304 only amends SoftmxAdvanceTest_GC_Only. SoftmxAdvanceTest and SoftmxAdvanceTest_GC_Only are similar. https://github.com/eclipse/openj9/pull/3304 doesn't amend SoftmxAdvanceTest. The fix in https://github.com/eclipse/openj9/pull/3304 needs to be applied to SoftmxAdvanceTest.

@dmitripivkine thoughts?

dmitripivkine commented 5 years ago

There is a place for discussion about this test future: https://github.com/eclipse/openj9/issues/4475