eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

DaaLoadTest CS ArrayStoreException #12214

Open pshipton opened 3 years ago

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_ppc64_aix_Release_testList_2/1/ DaaLoadTest_all_CS_5m_0

DLT 11:51:08.872 - First failure detected by thread: load-10. Not creating dumps as no dump generation is requested for this load test
DLT 11:51:08.885 - suite.getInventory().getInventoryFileRef(): openj9.test.load/config/inventories/daa/daaAll.xml
DLT 11:51:08.886 - suite.isCreateDump() : false
DLT 11:51:08.888 - Test failed
DLT   Failure num.  = 1
DLT   Test number   = 35
DLT   Test details  = 'JUnit[net.openj9.test.binaryData.TestInteger2ByteArrayNumBytes]'
DLT   Suite number  = 0
DLT   Thread number = 10
DLT >>> Captured test output >>>
DLT testStarted : initializationError(net.openj9.test.binaryData.TestInteger2ByteArrayNumBytes)
DLT testFailure: initializationError(net.openj9.test.binaryData.TestInteger2ByteArrayNumBytes): null
DLT java.lang.ArrayStoreException
DLT     at java.util.AbstractCollection.toArray(AbstractCollection.java:196)
DLT     at sun.reflect.annotation.AnnotationParser.toArray(AnnotationParser.java:865)
DLT     at java.lang.reflect.Executable.getDeclaredAnnotations(Executable.java:588)
DLT     at java.lang.reflect.Method.getDeclaredAnnotations(Method.java:630)
DLT     at java.lang.reflect.AccessibleObject.getAnnotations(AccessibleObject.java:207)
DLT     at org.junit.runners.model.FrameworkMethod.getAnnotations(FrameworkMethod.java:187)
DLT     at org.junit.runners.model.TestClass.addToAnnotationLists(TestClass.java:84)
DLT     at org.junit.runners.model.TestClass.scanAnnotatedMembers(TestClass.java:66)
DLT     at org.junit.runners.model.TestClass.<init>(TestClass.java:57)
DLT     at org.junit.runners.ParentRunner.createTestClass(ParentRunner.java:88)
DLT     at org.junit.runners.ParentRunner.<init>(ParentRunner.java:83)
DLT     at org.junit.runners.BlockJUnit4ClassRunner.<init>(BlockJUnit4ClassRunner.java:65)
DLT     at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
DLT     at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
DLT     at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
DLT     at org.junit.runner.Computer.getRunner(Computer.java:40)
DLT     at org.junit.runner.Computer$1.runnerForClass(Computer.java:31)
DLT     at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
DLT     at org.junit.runners.model.RunnerBuilder.runners(RunnerBuilder.java:101)
DLT     at org.junit.runners.model.RunnerBuilder.runners(RunnerBuilder.java:87)
DLT     at org.junit.runners.Suite.<init>(Suite.java:81)
DLT     at org.junit.runner.Computer.getSuite(Computer.java:28)
DLT     at org.junit.runner.Request.classes(Request.java:75)
DLT     at org.junit.runner.Request.classes(Request.java:91)
DLT     at net.adoptopenjdk.loadTest.adaptors.JUnitAdaptor.executeTest(JUnitAdaptor.java:129)
DLT     at net.adoptopenjdk.loadTest.LoadTestRunner$2.run(LoadTestRunner.java:182)
DLT     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
DLT     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
DLT     at java.lang.Thread.run(Thread.java:823)
DLT testFinished: initializationError(net.openj9.test.binaryData.TestInteger2ByteArrayNumBytes)
DLT 
DLT JUnit Test Results for: net.openj9.test.binaryData.TestInteger2ByteArrayNumBytes
DLT   Ran    : 1
DLT   Passed : 0
DLT   Failed : 1
DLT   Ignored: 0
DLT   Result : FAILED
DLT <<<
pshipton commented 3 years ago

@amicic @bragaigor not sure if this is a GC problem? I'll run some GCs later to see if it can be recreated. Note the failure is in the v0.26.0-release branch.

bragaigor commented 3 years ago

I'm trying to reproduce it as well

pshipton commented 3 years ago

5 x 5x grinders https://ci.eclipse.org/openj9/view/Test/job/Grinder/1585 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1586 - passed https://ci.eclipse.org/openj9/view/Test/job/Grinder/1587 - passed https://ci.eclipse.org/openj9/view/Test/job/Grinder/1588 - passed https://ci.eclipse.org/openj9/view/Test/job/Grinder/1589 - passed

Happened to spot the following in the first one

13:15:56  DLT stderr 17:15:56.520 0x30464a00    j9mm.107    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK8_ppc64_aix_Release/omr/gc/base/MemoryPoolSplitAddressOrderedListBase.cpp:499: ((false && (((0) == nextFreeEntry) || (addrTop <= nextFreeEntry))))
bragaigor commented 3 years ago

Was the failure observed in other platforms or just in AIX PPC64 with JDK 8?

pshipton commented 3 years ago

I only saw it on AIX, but if it's intermittent it may occur on other platforms but just hasn't shown up yet. I can grind another platform, do you have a preference?

bragaigor commented 3 years ago

That makes sense, and I would say either x86 linux 64 or PPC linux 64. (I'm also trying to reproduce in these platforms locally)

pshipton commented 3 years ago

5 x 10x on xlinux https://ci.eclipse.org/openj9/view/Test/job/Grinder/1590 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1591 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1592 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1593 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1594

5 x 10x on plinux https://ci.eclipse.org/openj9/view/Test/job/Grinder/1595 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1596 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1597 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1598 https://ci.eclipse.org/openj9/view/Test/job/Grinder/1599

These all passed.

amicic commented 3 years ago

Early to give a proper diagnosis, but so far, this looks way more likely a JIT related problem (corruption in heap, for example overextending object access etc) than a GC problem.

amicic commented 3 years ago

FYI @dmitripivkine

bragaigor commented 3 years ago

From the tests above it looks like this failure is only happening in AIX PPC64 with JDK 8 after all

dmitripivkine commented 3 years ago

I can not find any system core stored, even from grinder https://ci.eclipse.org/openj9/view/Test/job/Grinder/1585 with assertion in GC. Running grinders seems useless until we update them to generate cores. What is the right way to get core stored?

pshipton commented 3 years ago

From the tests above it looks like this failure is only happening in AIX PPC64 with JDK 8 after all

Hard to tell given the low reproduction rate. The original problem wasn't reproduced even on AIX.

pshipton commented 3 years ago

@Mesbah-Alam the grinder https://ci.eclipse.org/openj9/view/Test/job/Grinder/1585 asserted so it should have created a core file, but there are no messages about creating diagnostic files, and no diagnostics in the artifacts. Are we running this test with -Xdump:none or similar options that disable diagnostics? I do see -dumpRequested false, not sure if that is related. @llxia fyi

1.DLT.stderr contains

17:15:56.520 0x30464a00    j9mm.107    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK8_ppc64_aix_Release/omr/gc/base/MemoryPoolSplitAddressOrderedListBase.cpp:499: ((false && (((0) == nextFreeEntry) || (addrTop <= nextFreeEntry))))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/03/15 13:15:56 - please wait.
JVMDUMP039I Processing dump event "user", detail "" at 2021/03/15 13:50:55 - please wait.
bragaigor commented 3 years ago

Hello, I’ve been trying to reproduce the failure in Grinder and locally for AIX PPC64 JDK 8. So I let the test run for all day yesterday and out of 130 iterations the only suspect that I got was one test run that output the following:

DLT 17:16:51.084 - Starting thread. Suite=0 thread=5
gmake[4]: *** [autoGen.mk:30: DaaLoadTest_all_CS_5m_0] Hangup
gmake[3]: *** [/home/jenkins/temp/openn/openjdk-tests/TKG/../TKG/settings.mk:250: DaaLoadTest_all_CS_5m_0-daaLoadTest] Hangup
gmake[2]: *** [/home/jenkins/temp/openn/openjdk-tests/TKG/../TKG/settings.mk:250: DaaLoadTest_all_CS_5m_0-system] Hangup
gmake[1]: *** [settings.mk:250: DaaLoadTest_all_CS_5m_0-..] Hangup
gmake: *** [makefile:48: _DaaLoadTest_all_CS_5m_0] Hangup

But I’m not sure if that’s related to the issue at hand (this looks more like a machine problem?).

pshipton commented 3 years ago

But I’m not sure if that’s related to the issue at hand (this looks more like a machine problem?).

Agreed.

Mesbah-Alam commented 3 years ago

I do see -dumpRequested false, not sure if that is related.

Hi @pshipton , this is not related. When there is an assertion or a crash, dumps should be created normally; -dumpRequested flag will not prevent that.

(The -dumpRequested flag is only used to control the STF feature that creates a set of heap, java and system dumps during a test failure. This was causing unnecessary dumps being created so we added the -dumpRequested flag.)

Mesbah-Alam commented 3 years ago
STF 13:52:55.053 - Sending SIGXCPU (kill -24) to the java process to generate an OS dump
STF 13:52:55.054 - Running command: kill -24 18940050
STF 13:52:55.054 - Redirecting stderr to /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16158285507977/DaaLoadTest_all_CS_5m_0/20210315-131551-DaaLoadTest/results/1.DLT.kill_24.stderr
STF 13:52:55.054 - Redirecting stdout to /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16158285507977/DaaLoadTest_all_CS_5m_0/20210315-131551-DaaLoadTest/results/1.DLT.kill_24.stdout
DLT stderr JVMDUMP039I Processing dump event "user", detail "" at 2021/03/15 13:50:55 - please wait.
STF 13:52:55.063 - Monitoring Report Summary:
STF 13:52:55.063 -   o Process DLT has timed out
STF 13:52:55.064 - Killing processes: DLT
STF 13:52:55.064 -   o Process DLT pid 18940050 stop()
STF 13:52:56.065 -   o Process DLT pid 18940050 killed
**FAILED** at step 1 (Run daa load test). Expected return value=0 Actual=1 at /home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16158285507977/DaaLoadTest_all_CS_5m_0/20210315-131551-DaaLoadTest/execute.pl line 94.

Looking at the failure output, it may be a case of STF killing the sub-process due to timeout, causing the dumps being not generated.

STF 13:52:55.063 - o Process DLT has timed out

The test sub-process named DLT gets launched by STF with a timeout of 35m. When assertion happens, an attempt is made to generate the dump by issuing SIGXCPU, but it appears that the DLT sub-process reaches its timeout while the dump generation is taking place. So, STF just kills the whole thing - as it's designed to do.

pshipton commented 3 years ago

The assert occurred at 13:15, well before the timeout occurred.

DLT 13:15:56.065 - Starting thread. Suite=0 thread=13
DLT stderr 17:15:56.520 0x30464a00    j9mm.107    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK8_ppc64_aix_Release/omr/gc/base/MemoryPoolSplitAddressOrderedListBase.cpp:499: ((false && (((0) == nextFreeEntry) || (addrTop <= nextFreeEntry))))
DLT stderr JVMDUMP039I Processing dump event "traceassert", detail "" at 2021/03/15 13:15:56 - please wait.
STF 13:20:54.142 - Heartbeat: Process DLT is still running

I assume there is a sub-process being run? It would be useful to (always) print the command lines for any sub-processes.

Mesbah-Alam commented 3 years ago

The assert occurred at 13:15, well before the timeout occurred.

True, but the dump collection started at 13:50:55.015 and the timeout occurs at 13:52:55.063 - in between these two events, we see a whole bunch of output suggesting the framework is trying to generate dump. That's why I was guessing if the timeout killed the dump generation.

I assume there is a sub-process being run?

By subprocess I meant the sub-process of the test, e.g. DLT - whose command line does get printed by STF:

STF 13:15:54.612 - Running command: /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -Xgcpolicy:gencon -Xgc:concurrentScavenge -classpath /home/jenkins/workspace/Grinder/jvmtest/system/stf/stf.load/bin:/home/jenkins/workspace/Grinder/jvmtest/system/stf/stf.core/bin:/home/jenkins/workspace/Grinder/jvmtest/system/systemtest_prereqs/log4j/log4j-api.jar:/home/jenkins/workspace/Grinder/jvmtest/system/systemtest_prereqs/log4j/log4j-core.jar:/home/jenkins/workspace/Grinder/jvmtest/system/systemtest_prereqs/junit/junit.jar:/home/jenkins/workspace/Grinder/jvmtest/system/systemtest_prereqs/junit/hamcrest-core.jar:/home/jenkins/workspace/Grinder/jvmtest/system/openj9-systemtest/openj9.test.daa/bin net.adoptopenjdk.loadTest.LoadTest -resultsDir /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16158285507977/DaaLoadTest_all_CS_5m_0/20210315-131551-DaaLoadTest/results -resultsPrefix 1.DLT. -timeLimit 5m -dumpRequested false -reportFailureLimit 1 -abortAtFailureLimit 10 -maxTotalLogFileSpace 200M -maxSingleLogSize 1/25 -suite.daa.threadCount 14 -suite.daa.inventoryFile /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16158285507977/DaaLoadTest_all_CS_5m_0/20210315-131551-DaaLoadTest/results/1.DLT.inventory/openj9.test.load/config/inventories/daa/daaAll.xml -suite.daa.inventoryExcludeFile none -suite.daa.selection random -suite.daa.seed -1 -suite.daa.repeatCount 1 -suite.daa.thinkingTime 0ms..0ms