eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

JDK19: GC regression test failing with OOM #15238

Closed tajila closed 2 years ago

tajila commented 2 years ago
2:55:28  Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
02:55:28  Test start time: 2022/06/06 03:55:20 Atlantic Standard Time
02:55:28  Running command: "/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
02:55:28  Time spent starting: 4 milliseconds
02:57:35  Time spent executing: 127174 milliseconds
02:57:35  Test result: FAILED
02:57:35  Output from test:
02:57:35   [OUT] Calibrating live object array size...
02:57:35   [OUT] Set live object array size to 1000
02:57:35   [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 1000 50 -
02:57:35   [OUT] Allocating array (iteration 1)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 2)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 3)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 4)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 5)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 6)
02:57:35   [OUT] i=0
02:57:35   [OUT] Allocating array (iteration 7)
02:57:35   [OUT] i=0
02:57:35   [ERR] JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2022/06/06 03:55:32 - please wait.
02:57:35   [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/core.20220606.035532.15400.0001.dmp' in response to an event
02:57:35   [ERR] JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.15435.
02:57:35   [ERR] 
02:57:35   [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/core.20220606.035532.15400.0001.dmp
02:57:35   [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/core.20220606.035532.15400.0002.dmp' in response to an event
02:57:35   [ERR] JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.15442.
02:57:35   [ERR] 
02:57:35   [ERR] JVMDUMP010I System dump written to /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/core.20220606.035532.15400.0002.dmp
02:57:35   [ERR] JVMDUMP032I JVM requested Heap dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/heapdump.20220606.035532.15400.0003.phd' in response to an event
02:57:35   [ERR] JVMDUMP010I Heap dump written to /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/heapdump.20220606.035532.15400.0003.phd
02:57:35   [ERR] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/javacore.20220606.035532.15400.0004.txt' in response to an event
02:57:35   [ERR] JVMDUMP010I Java dump written to /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/javacore.20220606.035532.15400.0004.txt
02:57:35   [ERR] JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/Snap.20220606.035532.15400.0005.trc' in response to an event
02:57:35   [ERR] JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2022/06/06 03:56:03 - please wait.
02:57:35   [ERR] JVMDUMP010I Snap dump written to /home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/Snap.20220606.035532.15400.0005.trc
02:57:35   [ERR] JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
02:57:35   [ERR] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdknext_j9_sanity.functional_x86-64_linux_OpenJDK_testList_0/aqa-tests/TKG/output_16544977725873/cmdLineTester_GCRegressionTests_1/core.20220606.035603.15400.0006.dmp' in response to an event
02:57:35   [ERR] JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.15465.
02:57:35   [ERR] 
02:57:35   [ERR] JVMDUMP010I System dump written
tajila commented 2 years ago

This test relies on SoftReference behaviour to determine when there is memory pressure:

private static void _gatherUnloadStats()
    {
        long start = System.currentTimeMillis();
        int count = 0;
        SoftReference classes = new SoftReference(new Object[MAX_LOADERS_ATTEMPTED]);

        while (count < MAX_LOADERS_ATTEMPTED)
        {
            try
            {
                Object temp = new TestClassLoader(bytes).newInstance();
                Object[] classesArray = (Object[])classes.get();
                if (classesArray == null) {
                    break;
                }
                classesArray[count] = temp;
                classesArray = null;
            }
            catch (ClassNotFoundException e)
            {
                e.printStackTrace();
                System.exit(1);
            }
            catch (OutOfMemoryError e) {
                e.printStackTrace();
                System.exit(2);
            }
            count++;
        }

this is used to determine how many classloaders can be created.

tajila commented 2 years ago

I see the following in GC logs

<concurrent-kickoff id="2494" timestamp="2022-06-06T13:31:54.640">
  <kickoff reason="threshold reached" targetBytes="5644976" thresholdFreeBytes="778621"  remainingFree="363856" tenureFreeBytes="8704" nurseryFreeBytes="363856" />
</concurrent-kickoff>
<exclusive-start id="2495" timestamp="2022-06-06T13:31:54.640" intervalms="20.456">
  <response-info timems="0.045" idlems="0.045" threads="0" lastid="00007FD258003C00" lastname="Finalizer thread" />
</exclusive-start>
<af-start id="2496" threadId="00007FD258004610" totalBytesRequested="177496" timestamp="2022-06-06T13:31:54.640" intervalms="64.103" type="nursery" />
<percolate-collect id="2497" from="nursery" to="global" reason="insufficient remaining tenure space" timestamp="2022-06-06T13:31:54.640"/>
<concurrent-aborted id="2498" timestamp="2022-06-06T13:31:54.640">
  <reason value="insufficient progress made" />
</concurrent-aborted>
<cycle-end id="2499" type="global" contextid="2472" timestamp="2022-06-06T13:31:54.640" />
<cycle-start id="2500" type="global" contextid="0" timestamp="2022-06-06T13:31:54.640" intervalms="30.343" />
<gc-start id="2501" type="global" contextid="2500" timestamp="2022-06-06T13:31:54.640">
  <mem-info id="2502" free="364368" total="8388608" percent="4">
    <mem type="nursery" free="355664" total="2097152" percent="16">
      <mem type="allocate" free="355664" total="1376256" percent="25" />
      <mem type="survivor" free="0" total="720896" percent="0" />
    </mem>
    <mem type="tenure" free="8704" total="6291456" percent="0">
      <mem type="soa" free="8704" total="6291456" percent="0" />
      <mem type="loa" free="0" total="0" percent="0" />
    </mem>
    <pending-finalizers system="0" default="0" reference="6611" classloader="40" />
    <remembered-set count="1017" />
  </mem-info>
</gc-start>
<allocation-stats totalBytes="8312" >
  <allocated-bytes non-tlh="120" tlh="8192" />
  <largest-consumer threadName="main" threadId="00007FD2A00CC200" bytes="8312" />
</allocation-stats>
<gc-op id="2503" type="mark" timems="12.247" contextid="2500" timestamp="2022-06-06T13:31:54.653">
  <trace-info objectcount="105678" scancount="99827" scanbytes="6724032" />
  <ownableSynchronizers candidates="9" cleared="0" />
  <references type="soft" candidates="65" cleared="1" enqueued="0" dynamicThreshold="0" maxThreshold="32" />
  <references type="weak" candidates="167" cleared="1" enqueued="1" />
  <references type="phantom" candidates="15" cleared="0" enqueued="0" />
  <stringconstants candidates="2083" cleared="0"  />
  <object-monitors candidates="2" cleared="0"  />
</gc-op>
<gc-op id="2504" type="classunload" timems="0.324" contextid="2500" timestamp="2022-06-06T13:31:54.653">
  <classunload-info classloadercandidates="3220" classloadersunloaded="1" classesunloaded="2" anonymousclassesunloaded="1" quiescems="0.000" setupms="0.311" scanms="0.012" postms="0.001" />
</gc-op>
<gc-op id="2505" type="sweep" timems="0.598" contextid="2500" timestamp="2022-06-06T13:31:54.654" />
<mem-info id="2506" free="373568" total="8388608" percent="4">
  <mem type="nursery" free="364712" total="2097152" percent="17">
    <mem type="allocate" free="364712" total="1376256" percent="26" />
    <mem type="survivor" free="0" total="720896" percent="0" />
  </mem>
  <mem type="tenure" free="8856" total="6291456" percent="0" micro-fragmented="0" macro-fragmented="0">
    <mem type="soa" free="8856" total="6291456" percent="0" />
    <mem type="loa" free="0" total="0" percent="0" />
  </mem>
  <pending-finalizers system="0" default="0" reference="6612" classloader="41" />
  <remembered-set count="1014" />
</mem-info>
<gc-op id="2507" type="compact" timems="10.496" contextid="2500" timestamp="2022-06-06T13:31:54.664">
  <compact-info movecount="15865" movebytes="1070720" reason="low free space (less than 4%)" />
</gc-op>
<gc-end id="2508" type="global" contextid="2500" durationms="24.392" usertimems="68.258" systemtimems="0.392" stalltimems="4.186" timestamp="2022-06-06T13:31:54.665" activeThreads="4">
  <mem-info id="2509" free="375816" total="8388608" percent="4">
    <mem type="nursery" free="365904" total="2097152" percent="17">
      <mem type="allocate" free="365904" total="1376256" percent="26" />
      <mem type="survivor" free="0" total="720896" percent="0" />
    </mem>
    <mem type="tenure" free="9912" total="6291456" percent="0">
      <mem type="soa" free="9912" total="6291456" percent="0" />
      <mem type="loa" free="0" total="0" percent="0" />
    </mem>
    <pending-finalizers system="0" default="0" reference="6612" classloader="41" />
    <remembered-set count="1014" />
  </mem-info>
</gc-end>
dmitripivkine commented 2 years ago

I guess it is known extended object liveness issue with JIT (unnecessary store of object pointer in O-slot makes it strong root)

dmitripivkine commented 2 years ago

I guess it is known extended object liveness issue with JIT (unnecessary store of object pointer in O-slot makes it strong root)

I am not sure any more but we can try to run with -Xjit:enableAggressiveLiveness The issue I referred to is https://github.com/eclipse-openj9/openj9/issues/9969 (different test actually)

tajila commented 2 years ago

I dont think its jit related. I tried it with the following options:

"/home/tobi/openj9-openjdk-jdk/build/linux-x86_64-server-release/images/jdk/bin//../bin/java"  -Xgcpolicy:gencon  -Xnocompressedrefs  -Xint  -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/tobi/openj9-openjdk-jdk/openj9/test/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
tajila commented 2 years ago

@dmitripivkine I added the -Xtrace:print=j9vm.45 tracepoint to print if classloaders are being freed.

On jdk11 I see a bunch of

13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A8423D928)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A8423DA18)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A8423DB08)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84232E28)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84232F18)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233008)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842330F8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842331E8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842332D8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842333C8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842334B8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A842335A8)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233698)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233788)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233878)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233968)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233A58)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233B48)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A84233C38)
13:57:19.582 0x1d0c00            j9vm.45       > freeClassLoader(00007F6A8409BCC8)

On JDK 19, that trace point never shows up.

Is there a way to determine if finalization is actually happening?

dmitripivkine commented 2 years ago

Ah, I believe Finalization has been disabled for Java 18 and up:

#if JAVA_SPEC_VERSION >= 18
    if (J9_ARE_ANY_BITS_SET(javaVM->extendedRuntimeFlags2, J9_EXTENDED_RUNTIME2_DISABLE_FINALIZATION)) {
        /* Finalization is disabled */
    } else
#endif /* JAVA_SPEC_VERSION >= 18 */
tajila commented 2 years ago

Yes, I tried with --finalization=enabled as well and the results were the same.

tajila commented 2 years ago

Note: the default behaviour is still enabled

        /* JEP 421: Deprecate Finalization for Removal - Expecting values: "enabled" and "disabled".
         * The default value for now will be "enabled", i.e., finalization is enabled by default.
         * Other values will cause an error, additional values may be added in the future.
         */
        IDATA argIndex = FIND_AND_CONSUME_ARG(STARTSWITH_MATCH, VMOPT_DISABLE_FINALIZATION, NULL);
        if (argIndex >= 0) {
            PORT_ACCESS_FROM_JAVAVM(vm);
            char *optionArg = NULL;
            GET_OPTION_VALUE(argIndex, '=', &optionArg);
            if (NULL != optionArg) {
                if (strcmp(optionArg, "disabled") == 0) {
                    vm->extendedRuntimeFlags2 |= J9_EXTENDED_RUNTIME2_DISABLE_FINALIZATION;
                } else if (strcmp(optionArg, "enabled") == 0) {
                    /* do nothing as finalization is enabled by default */
pshipton commented 2 years ago

Note: the default behaviour is still enabled

As it should be. https://openjdk.java.net/jeps/421 - "Finalization remains enabled by default in JDK 18" So far, no JEPs for JDK 19 change the default behaviour.

pshipton commented 2 years ago

Fixed by https://github.com/ibmruntimes/openj9-openjdk-jdk19/pull/9 The tests are passing now, https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/1020/