Closed pshipton closed 5 years ago
@0dvictor do we have a bad sign extension or something happening?
I'm still working on it. I could not reproduce it locally, and hence am trying grinders now.
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64_cmprssptrs/604
FAILED: testArrayGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testArrayGetAndOpLong(TestUnsafeGetAndOp.java:141)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
at org.testng.TestRunner.privateRun(TestRunner.java:648)
at org.testng.TestRunner.run(TestRunner.java:505)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
at org.testng.SuiteRunner.run(SuiteRunner.java:364)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
at org.testng.TestNG.runSuites(TestNG.java:1049)
at org.testng.TestNG.run(TestNG.java:1017)
at org.testng.TestNG.privateMain(TestNG.java:1354)
at org.testng.TestNG.main(TestNG.java:1323)
FAILED: testInstanceGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testInstanceGetAndOpLong(TestUnsafeGetAndOp.java:86)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
at org.testng.TestRunner.privateRun(TestRunner.java:648)
at org.testng.TestRunner.run(TestRunner.java:505)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
at org.testng.SuiteRunner.run(SuiteRunner.java:364)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
at org.testng.TestNG.runSuites(TestNG.java:1049)
at org.testng.TestNG.run(TestNG.java:1017)
at org.testng.TestNG.privateMain(TestNG.java:1354)
at org.testng.TestNG.main(TestNG.java:1323)
FAILED: testStaticGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testStaticGetAndOpLong(TestUnsafeGetAndOp.java:196)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:580)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:716)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:988)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
at org.testng.TestRunner.privateRun(TestRunner.java:648)
at org.testng.TestRunner.run(TestRunner.java:505)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
at org.testng.SuiteRunner.run(SuiteRunner.java:364)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
at org.testng.TestNG.runSuites(TestNG.java:1049)
at org.testng.TestNG.run(TestNG.java:1017)
at org.testng.TestNG.privateMain(TestNG.java:1354)
at org.testng.TestNG.main(TestNG.java:1323)
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64_cmprssptrs/608
FAILED: testStaticGetAndOpLong (Regular)
java.lang.AssertionError: expected:<-1> but was:<-72057593937236925>
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64/575 https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64/576
FAILED: testArrayGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testArrayGetAndOpLong(TestUnsafeGetAndOp.java:141)
FAILED: testInstanceGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testInstanceGetAndOpLong(TestUnsafeGetAndOp.java:86)
FAILED: testStaticGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testStaticGetAndOpLong(TestUnsafeGetAndOp.java:196)
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK12-linux_x86-64/26
FAILED: testArrayGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testArrayGetAndOpLong(TestUnsafeGetAndOp.java:141)
FAILED: testInstanceGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testInstanceGetAndOpLong(TestUnsafeGetAndOp.java:86)
FAILED: testStaticGetAndOpLong (Regular)
java.lang.NullPointerException
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testStaticGetAndOpLong(TestUnsafeGetAndOp.java:196)
@0dvictor do you have any update on this one - it is targeted to 0.14 and will need a fix soon if it is to make the release.
Having spoken to @0dvictor the issue was trying to get some kind of diagnostics around the point of the failure. He tried running the test repeatedly in a grinder but was not able to provoke the problem and logging and tracing of seemingly related methods in the passing runs showed no obvious problem. The issue with these tests is that when the JUnit harness sees the exception it is caught and squashed with just the notation that it has failed. So I think we need some help adapting the test so that when this kind of problem happens we actually either a) crash the VM or b) capture a system dump at the time of the exception. It would be good to run -Xrs and have an -Xdump system on uncaught exception or something like that. @smlambert any suggestions on how we might get more info from these failing runs since it wasn't seen with a grinder and seems infrequent.
You could add a variation in the playlist to add the -Xrs / -Xdump commandline options you mentioned to run the test regularly with those options https://github.com/eclipse/openj9/blob/master/test/functional/UnsafeTest/playlist.xml#L54
And/or in a Grinder run, append those commandline options using EXTRA_OPTIONS parameter?
@smlambert is there a way to stop JUnit from catching that exception?
Hmm, in testNG I believe one could write a custom listener that would intercept the exception, to print stacktrace before framework swallows it. Its a bit heavy-handed, and not sure how well it works so will have to investigate.
For this case, has someone run it once locally, and then just taken the command line that the framework runs and run it directly (in a loop inside a shell script for repeated runs) in the local environment? That would likely be easiest way to reproduce this.
We print out the command that is used near the top of the test output:
06:47:29 =============================================== 06:47:29 Running test UnsafeTests_0 ... 06:47:29 =============================================== 06:47:29 UnsafeTests_0 Start Time: Sun Mar 17 07:47:28 2019 Epoch Time (ms): 1552819648683 06:47:29 ... 06:47:29 "/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdkbinary/j2sdk-image/bin/java" -Xnocompressedrefs -DScenario=Regular --add-opens java.base/jdk.internal.org.objectweb.asm=ALL-UNNAMED --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -cp "/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/resources:/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/lib/testng.jar:/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/TestConfig/lib/jcommander.jar:/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/UnsafeTest/OpenJ9Unsafe.jar:." org.testng.TestNG -d "/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15528126759026/UnsafeTests_0" "/home/jenkins/workspace/Test-sanity.functional-JDK12-linux_x86-64/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/UnsafeTest/testng.xml" -testnames UnsafeTests -groups level.sanity -excludegroups d..linux_x86-64,d..arch.x86,d..os.linux,d..bits.64,d.*.generic-all;
Alternatively in a branch someone could try catch the particular exception in the test case itself and print more info at the point of the exception, then run Grinders from the branch where the temporary try/catch code was added.
@0dvictor Any luck reproducing this failure outside of the nightly builds?
Sorry that I haven't got it reproduced outside of the nightly builds yet. I'll try more/different grinders.
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64/636/
FAILED: testArrayGetAndOpLong (Regular)
java.lang.AssertionError: expected:<-2> but was:<0>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:88)
at org.openj9.test.unsafe.UnsafeTestBase.checkSameAt(UnsafeTestBase.java:547)
at org.openj9.test.unsafe.UnsafeTestBase.testGetLong(UnsafeTestBase.java:1686)
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testArrayGetAndOpLong(TestUnsafeGetAndOp.java:141)
FAILED: testInstanceGetAndOpLong (Regular)
java.lang.AssertionError: expected:<-2> but was:<0>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:88)
at org.openj9.test.unsafe.UnsafeTestBase.checkSameAt(UnsafeTestBase.java:547)
at org.openj9.test.unsafe.UnsafeTestBase.testGetLong(UnsafeTestBase.java:1686)
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testInstanceGetAndOpLong(TestUnsafeGetAndOp.java:86)
FAILED: testStaticGetAndOpLong (Regular)
java.lang.AssertionError: expected:<-1> but was:<432346112300683565>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:88)
at org.openj9.test.unsafe.UnsafeTestBase.checkSameAt(UnsafeTestBase.java:547)
at org.openj9.test.unsafe.UnsafeTestBase.testGetLong(UnsafeTestBase.java:1669)
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testStaticGetAndOpLong(TestUnsafeGetAndOp.java:196)
Trying 200 iterations on the external machines https://ci.eclipse.org/openj9/view/Test/job/Test-Grinder/368/
Failed 7/200
@0dvictor @andrewcraik any chance of getting this resolved before we finalize the 0.14 release, scheduled for tomorrow (April 16th). Otherwise we'll have to move it out.
Moving to 0.15
https://ci.eclipse.org/openj9/job/Test_openjdk12_j9_sanity.functional_x86-64_linux_xl_Nightly/11
FAILED: testArrayGetAndOpLong (Regular)
java.lang.AssertionError: expected:<-1> but was:<140231421543432>
at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
at org.testng.AssertJUnit.failNotEquals(AssertJUnit.java:364)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:80)
at org.testng.AssertJUnit.assertEquals(AssertJUnit.java:88)
at org.openj9.test.unsafe.UnsafeTestBase.checkSameAt(UnsafeTestBase.java:547)
at org.openj9.test.unsafe.UnsafeTestBase.testGetLong(UnsafeTestBase.java:1669)
at org.openj9.test.unsafe.TestUnsafeGetAndOp.testArrayGetAndOpLong(TestUnsafeGetAndOp.java:141)
...
This was reported in multiple PR builds in #5626. It would be good to get some attention on resolving this.
fyi @0dvictor @andrewcraik @0xdaryl
@kobinau can you try to reproduce this failure locally and find out the failing method?
Ran machine to generate an intermittancy of 1/50. Testing on optLevel=noOpt to determine if issue is optimization bug or not.
Given how close the 0.15.0 release date is and we don't have a fix in hand, moving this to 0.16.0
Looks like our culprit is: org/openj9/test/unsafe/UnsafeTestBase.testGetLong(Ljava/lang/Object;Ljava/lang/String;) excluding the method from optimization, results in no intermittant error from what I've found. Narrowing down optimization level culprit.
@kobinau Were you able to track this further after you identified the problematic method?
Kobi and I having been working on this and it seems that register allocation spilled the register holding the index at a wrong instruction. The test failure is because the expected value is read with a wrong array index. Still digging through the code around register allocation.
Reference count was messed up, and nodes were being referenced across basic blocks instead of limited to referencing within blocks.
Testing a fix now.
Spotted in OMR Acceptance testing, but I don't think it is related to the OMR changes being tested. https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-linux_x86-64_cmprssptrs/505