eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

core files found from cmdLineTester_GCRegressionTests #18048

Open pshipton opened 1 year ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/ - ub20-390-2 cmdLineTester_GCRegressionTests_1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/functional_test_output.tar.gz

Testing: Ensure no core files have been produced by the preceding tests
Test start time: 2023/08/30 15:01:10 Coordinated Universal Time
Running command: sh -c ls
Time spent starting: 18 milliseconds
Time spent executing: 41 milliseconds
Test result: FAILED
Output from test:
 [OUT] core.20230830.145839.1792369.0001.dmp
 [OUT] core.20230830.145839.1792369.0002.dmp
 [OUT] foo.%001.log
 [OUT] foo.001.log
 [OUT] foo.002.log
 [OUT] foo.#.log
 [OUT] foo.log
 [OUT] foo.log.001
 [OUT] foo.log.002
 [OUT] foo.log.003
 [OUT] heapdump.20230830.145839.1792369.0003.phd
 [OUT] javacore.20230830.145839.1792369.0004.txt
 [OUT] Snap.20230830.145839.1792369.0005.trc
 [OUT] static_temp.txt
 [OUT] verbosegc.20230830.145454.1791981.txt
 [OUT] verbosegc.20230830.145532.1792214.txt
>> Success condition was found: [Output match: ]
>> Failure condition was found: [Output match: core.*]
dmitripivkine commented 1 year ago

The reason cores were created is OOM:

1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received

from test:

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2023/08/30 14:58:23 Coordinated Universal Time
Running command: "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 94 milliseconds
Time spent executing: 98229 milliseconds
Test result: PASSED
pshipton commented 2 months ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/264 - rh8-390-1 cmdLineTester_GCRegressionTests_1

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/264/functional_test_output.tar.gz

20:31:35  Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
20:31:35  Test start time: 2024/07/10 20:31:32 Eastern Standard Time
20:31:35  Running command: "/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
20:31:35  Time spent starting: 4 milliseconds
20:32:50  Time spent executing: 66125 milliseconds
20:32:50  Test result: PASSED
20:32:50  
20:32:50  Testing: Ensure no core files have been produced by the preceding tests
20:32:50  Test start time: 2024/07/10 20:32:38 Eastern Standard Time
20:32:50  Running command: sh -c ls
20:32:50  Time spent starting: 6 milliseconds
20:32:50  Time spent executing: 15 milliseconds
20:32:50  Test result: FAILED
20:32:50  Output from test:
20:32:50   [OUT] core.20240710.202836.1630732.0001.dmp
20:32:50   [OUT] core.20240710.202836.1630732.0002.dmp
20:32:50   [OUT] foo.%001.log
20:32:50   [OUT] foo.001.log
20:32:50   [OUT] foo.002.log
20:32:50   [OUT] foo.#.log
20:32:50   [OUT] foo.log
20:32:50   [OUT] foo.log.001
20:32:50   [OUT] foo.log.002
20:32:50   [OUT] foo.log.003
20:32:50   [OUT] heapdump.20240710.202836.1630732.0003.phd
20:32:50   [OUT] javacore.20240710.202836.1630732.0004.txt
20:32:50   [OUT] Snap.20240710.202836.1630732.0005.trc
20:32:50   [OUT] static_temp.txt
20:32:50   [OUT] verbosegc.20240710.202725.1630495.txt
20:32:50   [OUT] verbosegc.20240710.202737.1630713.txt
20:32:50  >> Success condition was found: [Output match: ]
20:32:50  >> Failure condition was found: [Output match: core.*]
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received
amicic commented 2 months ago

At the first glance, just genuinely OOM. Last Global GC expectedly did a compact, but still no help.

00:28:33.689208946  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.689618464  0x0 j9mm.474             Event       GlobalGC start: globalcount=479
00:28:33.722677114  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.722693688  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:33.723049303  0x0 j9mm.474             Event       GlobalGC start: globalcount=480
00:28:33.801943391  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:33.801966460  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.116965255  0x0 j9mm.474             Event       GlobalGC start: globalcount=482
00:28:35.802519107  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:35.802544021  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.803401100  0x0 j9mm.474             Event       GlobalGC start: globalcount=483
00:28:35.886767952  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:35.886797807  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.986958403  0x0 j9mm.474             Event       GlobalGC start: globalcount=485
00:28:36.068119797  0x0 j9mm.90              Event       GlobalGC collect complete
00:28:36.068143276  0x0 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:36.073785786  0x0 j9mm.474             Event       GlobalGC start: globalcount=486
00:28:36.167755330 *0x3FF88002400 j9mm.90              Event       GlobalGC collect complete
00:28:36.167755385 *0x0 j9mm.90              Event       GlobalGC collect complete
00:28:36.167790932  0x3FF88002400 j9mm.475             Event       GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608

But then, for Balanced GC, per design, we have to free up a whole region, to be able to proceed with allocation.

This (also from traces) indicates we might have some free memory

Tenure: 7707034 bytes, Free Tenure 0 bytes ,

(gdb) p 8388608 - 7707034
$1 = 681574

This is enough for one region (524288), but perhaps somewhat scattered/fragmented so actually not enough to free one. Parallel Compact is indeed non-perfect, and this can happened to some degree (can't tell exactly what the imperfection factor is).

This is another indication of large live set size (this might exclude leafs/primitive arrays, hence a bit smaller than the real one):

4 threads scanned  7506528 bytes.

Long story short, it could be that just the heap of 8M is occasionally insufficient.

Explicitely specified (both from console logs and from traces): -Xmx8m

pshipton commented 2 weeks ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/60 - ub20-390-6

cmdLineTester_GCRegressionTests_1

23:54:03  Testing: Ensure no core files have been produced by the preceding tests
23:54:03  Test start time: 2024/09/03 03:53:56 Coordinated Universal Time
23:54:03  Running command: sh -c ls
23:54:03  Time spent starting: 35 milliseconds
23:54:03  Time spent executing: 54 milliseconds
23:54:03  Test result: FAILED
23:54:03  Output from test:
23:54:03   [OUT] core.20240903.035256.4069621.0001.dmp
23:54:03   [OUT] core.20240903.035256.4069621.0002.dmp
23:54:03   [OUT] foo.%001.log
23:54:03   [OUT] foo.001.log
23:54:03   [OUT] foo.002.log
23:54:03   [OUT] foo.#.log
23:54:03   [OUT] foo.log
23:54:03   [OUT] foo.log.001
23:54:03   [OUT] foo.log.002
23:54:03   [OUT] foo.log.003
23:54:03   [OUT] heapdump.20240903.035256.4069621.0003.phd
23:54:03   [OUT] javacore.20240903.035256.4069621.0004.txt
23:54:03   [OUT] Snap.20240903.035256.4069621.0005.trc
23:54:03   [OUT] static_temp.txt
23:54:03   [OUT] verbosegc.20240903.034815.4069165.txt
23:54:03   [OUT] verbosegc.20240903.034914.4069415.txt

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/60/functional_test_output.tar.gz

dmitripivkine commented 2 weeks ago

I am confused with this one. Cores are thrown at OOM event:

1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received

by test with this command line:

1CICMDLINE     /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -

The console output of this test:

23:52:47  Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
23:52:47  Test start time: 2024/09/03 03:52:42 Coordinated Universal Time
23:52:47  Running command: "/home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:balanced -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
23:52:47  Time spent starting: 16 milliseconds
23:54:03  Time spent executing: 74461 milliseconds
23:54:03  Test result: PASSED

Test filters out filter=java/lang/OutOfMemoryError but files are generated regardless. The thread produced OOM is "main":

03:52:56.379613292  0x3FF9816C200 j9mm.100             Event       J9AllocateObject() returning NULL! 64 bytes requested for object of class 0x3ff98241400 from memory space 'Flat' id=0x3ff980ecd00

!stack 0x3ff9816c200    !j9vmthread 0x3ff9816c200   !j9thread 0x3ff9801e160 tid 0x3e18f6 (4069622) // (main)

@pshipton How do you think it has failed?

pshipton commented 2 weeks ago

I think the filter means you only get java/lang/OutOfMemoryError. The java callstack

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/lang/ClassLoader.defineClassImpl(Native Method)
4XESTACKTRACE                at java/lang/ClassLoader.defineClassInternal(ClassLoader.java:470(Compiled Code))
4XESTACKTRACE                at java/lang/ClassLoader.defineClass(ClassLoader.java:431(Compiled Code))
4XESTACKTRACE                at java/lang/ClassLoader.defineClass(ClassLoader.java:383(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoader.newInstance(TestClassLoader.java:37(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.classLoaderAllocate(TestClassLoaderMain.java:89(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.test(TestClassLoaderMain.java:74(Compiled Code))
4XESTACKTRACE                at com/ibm/tests/garbagecollector/TestClassLoaderMain.main(TestClassLoaderMain.java:139)

The success criteria for the sub-test is Successful test run! or Cannot load library required by: -Xjit so presumably one of these was already printed out. I guess it must be the second one because I don't see how the first could be printed yet.

pshipton commented 2 weeks ago

There is a way to always show the output from the subtest by running the cmdlinetester with -verbose, if we want to modify the test to do that. Seems this is a rare occurrence, not sure it's worth it, but it could help clear up the mystery.