Open pshipton opened 1 year ago
The reason cores were created is OOM:
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received
from test:
Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2023/08/30 14:58:23 Coordinated Universal Time
Running command: "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/openjdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 94 milliseconds
Time spent executing: 98229 milliseconds
Test result: PASSED
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/264 - rh8-390-1 cmdLineTester_GCRegressionTests_1
20:31:35 Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
20:31:35 Test start time: 2024/07/10 20:31:32 Eastern Standard Time
20:31:35 Running command: "/home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/jdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
20:31:35 Time spent starting: 4 milliseconds
20:32:50 Time spent executing: 66125 milliseconds
20:32:50 Test result: PASSED
20:32:50
20:32:50 Testing: Ensure no core files have been produced by the preceding tests
20:32:50 Test start time: 2024/07/10 20:32:38 Eastern Standard Time
20:32:50 Running command: sh -c ls
20:32:50 Time spent starting: 6 milliseconds
20:32:50 Time spent executing: 15 milliseconds
20:32:50 Test result: FAILED
20:32:50 Output from test:
20:32:50 [OUT] core.20240710.202836.1630732.0001.dmp
20:32:50 [OUT] core.20240710.202836.1630732.0002.dmp
20:32:50 [OUT] foo.%001.log
20:32:50 [OUT] foo.001.log
20:32:50 [OUT] foo.002.log
20:32:50 [OUT] foo.#.log
20:32:50 [OUT] foo.log
20:32:50 [OUT] foo.log.001
20:32:50 [OUT] foo.log.002
20:32:50 [OUT] foo.log.003
20:32:50 [OUT] heapdump.20240710.202836.1630732.0003.phd
20:32:50 [OUT] javacore.20240710.202836.1630732.0004.txt
20:32:50 [OUT] Snap.20240710.202836.1630732.0005.trc
20:32:50 [OUT] static_temp.txt
20:32:50 [OUT] verbosegc.20240710.202725.1630495.txt
20:32:50 [OUT] verbosegc.20240710.202737.1630713.txt
20:32:50 >> Success condition was found: [Output match: ]
20:32:50 >> Failure condition was found: [Output match: core.*]
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received
At the first glance, just genuinely OOM. Last Global GC expectedly did a compact, but still no help.
00:28:33.689208946 0x0 j9mm.90 Event GlobalGC collect complete
00:28:33.689618464 0x0 j9mm.474 Event GlobalGC start: globalcount=479
00:28:33.722677114 0x0 j9mm.90 Event GlobalGC collect complete
00:28:33.722693688 0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:33.723049303 0x0 j9mm.474 Event GlobalGC start: globalcount=480
00:28:33.801943391 0x0 j9mm.90 Event GlobalGC collect complete
00:28:33.801966460 0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.116965255 0x0 j9mm.474 Event GlobalGC start: globalcount=482
00:28:35.802519107 0x0 j9mm.90 Event GlobalGC collect complete
00:28:35.802544021 0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.803401100 0x0 j9mm.474 Event GlobalGC start: globalcount=483
00:28:35.886767952 0x0 j9mm.90 Event GlobalGC collect complete
00:28:35.886797807 0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:35.986958403 0x0 j9mm.474 Event GlobalGC start: globalcount=485
00:28:36.068119797 0x0 j9mm.90 Event GlobalGC collect complete
00:28:36.068143276 0x0 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
00:28:36.073785786 0x0 j9mm.474 Event GlobalGC start: globalcount=486
00:28:36.167755330 *0x3FF88002400 j9mm.90 Event GlobalGC collect complete
00:28:36.167755385 *0x0 j9mm.90 Event GlobalGC collect complete
00:28:36.167790932 0x3FF88002400 j9mm.475 Event GlobalGC end: workstackoverflow=0 overflowcount=0 memory=0/8388608
But then, for Balanced GC, per design, we have to free up a whole region, to be able to proceed with allocation.
This (also from traces) indicates we might have some free memory
Tenure: 7707034 bytes, Free Tenure 0 bytes ,
(gdb) p 8388608 - 7707034
$1 = 681574
This is enough for one region (524288), but perhaps somewhat scattered/fragmented so actually not enough to free one. Parallel Compact is indeed non-perfect, and this can happened to some degree (can't tell exactly what the imperfection factor is).
This is another indication of large live set size (this might exclude leafs/primitive arrays, hence a bit smaller than the real one):
4 threads scanned 7506528 bytes.
Long story short, it could be that just the heap of 8M is occasionally insufficient.
Explicitely specified (both from console logs and from traces): -Xmx8m
https://openj9-jenkins.osuosl.org/job/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/60 - ub20-390-6
cmdLineTester_GCRegressionTests_1
23:54:03 Testing: Ensure no core files have been produced by the preceding tests
23:54:03 Test start time: 2024/09/03 03:53:56 Coordinated Universal Time
23:54:03 Running command: sh -c ls
23:54:03 Time spent starting: 35 milliseconds
23:54:03 Time spent executing: 54 milliseconds
23:54:03 Test result: FAILED
23:54:03 Output from test:
23:54:03 [OUT] core.20240903.035256.4069621.0001.dmp
23:54:03 [OUT] core.20240903.035256.4069621.0002.dmp
23:54:03 [OUT] foo.%001.log
23:54:03 [OUT] foo.001.log
23:54:03 [OUT] foo.002.log
23:54:03 [OUT] foo.#.log
23:54:03 [OUT] foo.log
23:54:03 [OUT] foo.log.001
23:54:03 [OUT] foo.log.002
23:54:03 [OUT] foo.log.003
23:54:03 [OUT] heapdump.20240903.035256.4069621.0003.phd
23:54:03 [OUT] javacore.20240903.035256.4069621.0004.txt
23:54:03 [OUT] Snap.20240903.035256.4069621.0005.trc
23:54:03 [OUT] static_temp.txt
23:54:03 [OUT] verbosegc.20240903.034815.4069165.txt
23:54:03 [OUT] verbosegc.20240903.034914.4069415.txt
I am confused with this one. Cores are thrown at OOM event:
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received
by test with this command line:
1CICMDLINE /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
The console output of this test:
23:52:47 Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
23:52:47 Test start time: 2024/09/03 03:52:42 Coordinated Universal Time
23:52:47 Running command: "/home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/jdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /home/jenkins/workspace/Test_openjdk23_j9_sanity.functional_s390x_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
23:52:47 Time spent starting: 16 milliseconds
23:54:03 Time spent executing: 74461 milliseconds
23:54:03 Test result: PASSED
Test filters out filter=java/lang/OutOfMemoryError
but files are generated regardless.
The thread produced OOM is "main":
03:52:56.379613292 0x3FF9816C200 j9mm.100 Event J9AllocateObject() returning NULL! 64 bytes requested for object of class 0x3ff98241400 from memory space 'Flat' id=0x3ff980ecd00
!stack 0x3ff9816c200 !j9vmthread 0x3ff9816c200 !j9thread 0x3ff9801e160 tid 0x3e18f6 (4069622) // (main)
@pshipton How do you think it has failed?
I think the filter means you only get java/lang/OutOfMemoryError. The java callstack
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/lang/ClassLoader.defineClassImpl(Native Method)
4XESTACKTRACE at java/lang/ClassLoader.defineClassInternal(ClassLoader.java:470(Compiled Code))
4XESTACKTRACE at java/lang/ClassLoader.defineClass(ClassLoader.java:431(Compiled Code))
4XESTACKTRACE at java/lang/ClassLoader.defineClass(ClassLoader.java:383(Compiled Code))
4XESTACKTRACE at com/ibm/tests/garbagecollector/TestClassLoader.newInstance(TestClassLoader.java:37(Compiled Code))
4XESTACKTRACE at com/ibm/tests/garbagecollector/TestClassLoaderMain.classLoaderAllocate(TestClassLoaderMain.java:89(Compiled Code))
4XESTACKTRACE at com/ibm/tests/garbagecollector/TestClassLoaderMain.test(TestClassLoaderMain.java:74(Compiled Code))
4XESTACKTRACE at com/ibm/tests/garbagecollector/TestClassLoaderMain.main(TestClassLoaderMain.java:139)
The success criteria for the sub-test is Successful test run!
or Cannot load library required by: -Xjit
so presumably one of these was already printed out. I guess it must be the second one because I don't see how the first could be printed yet.
There is a way to always show the output from the subtest by running the cmdlinetester with -verbose
, if we want to modify the test to do that. Seems this is a rare occurrence, not sure it's worth it, but it could help clear up the mystery.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/ - ub20-390-2 cmdLineTester_GCRegressionTests_1
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_1/372/functional_test_output.tar.gz