Closed pshipton closed 3 years ago
The test, which failed converting some UTF8 bytes to a String.
public class ConsCharset
implements Testlet
{
public void test (TestHarness h)
{
try
{
byte[] cp437Bytes = asByteArray(new int[] { 224, 226, 227, 228, 156 });
checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
"\u03b1\u0393\u03c0\u03a3\u00a3");
}
catch (UnsupportedCharsetException e)
{
// Skip tests as CP437 is not required by the spec.
}
byte[] utf8Bytes = asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
"\u00DF\u2153\u20AF\uD800\uDD40");
byte[] isoBytes = asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
"Hello World!");
}
private void checkString(TestHarness h, String result, String expected)
{
for (int a = 0; a < result.length(); ++a)
{
h.check(result.charAt(a), expected.charAt(a));
}
h.check(result, expected);
}
private byte[] asByteArray(int[] ints)
{
byte[] bytes = new byte[ints.length];
for (int a = 0; a < ints.length; ++a)
bytes[a] = Integer.valueOf(ints[a]).byteValue();
return bytes;
}
}
Run some grinders https://ci.eclipse.org/openj9/job/Grinder/1174/ Failed 2 / 24
@gita-omr can someone take a look pls.
@gita-omr this is failing enough to be a blocker, but it's running in a non-production mode. We need to evaluate if the problem is specific to the non-production options.
Yes, we will take a look. @mnalam-p
I tried running in grinder and test locally, I am not able to reproduce the issue with same jdk build (openj9/omr/jcl hash). @pshipton @gita-omr Any suggestion on what else should I try?
Trying another grinder on the OpenJ9 machines, using the latest build. https://ci.eclipse.org/openj9/job/Grinder/1205/ - failed 6/48
@mnalam-p the grinder isn't done, but there are still failures https://ci.eclipse.org/openj9/job/Grinder_iteration_2/20/
btw, I suggested to Nazmul he try grinding using NodesByIterations
and NUM_MACHINES to get coverage across a number of machines in order to recreate it.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1843
As this is being investigated and is known to fail frequently in grinders, I'll stop adding new comments for each failure in the nighty builds.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly/1063 https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1848
Started 5 iterations here: https://ci.eclipse.org/openj9/job/Grinder/1226/
Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/
Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/
@jdekonin still did not manage to make it run. Could you please help?
@pshipton NodesByIteration was not able to reproduce the issue. Two config tested -
ITERATIONS=5
and NUM_MACHINES=4
ITERATIONS=20
and NUM_MACHINES=10
Any other suggestions?
One question, what is the difference between production / non-production build?what is the difference between production / non-production build?
Where do you see that, I might want to change my answer. Something we build nightly or in a personal build is non-production. A build created for a release, with the correct properties, version numbers, and perhaps even building on special machines with restricted access is a production build. There would be no difference in functionality if built from the same SHAs, it's more about the -version
output.
Nightly build: openjdk version "1.8.0_282-internal" OpenJDK Runtime Environment (build 1.8.0_282-internal-jenkins_2020_11_26_18_17-b00) Eclipse OpenJ9 VM (build master-2f4ea25a94c, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20201126_579 (JIT enabled, AOT enabled)
Release build: openjdk version "1.8.0_272" OpenJDK Runtime Environment (build 1.8.0_272-b10) Eclipse OpenJ9 VM (build openj9-0.23.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20201022_822 (JIT enabled, AOT enabled)
Any other suggestions?
Request access to an OpenJ9 machine where the problem occurs.
Had to replace IP address in the CUSTOMIZED_SDK_URL...
@gita-omr CUSTOMIZED_SDK_URL has a few links that need modifying you only got the first which was the SDK. The second was the test-images.tar.gz. I relaunched with the correction https://ci.eclipse.org/openj9/job/Grinder/1230/console
Side note: @rajdeepsingh1 increased the artifactory space, so anything after November 19th won't matter it should be in both locations. I realize that doesn't help this issue though.
@pshipton I ran 20 iteration of test.MauveMultiThreadLoadTest on an OpenJ9 machine (twice). All test passed and did not see any test failure.
I can grind it again if you like. There were more failures last night. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1848
Did you choose a machine where we've seen failures before?
5 iterations passed: https://ci.eclipse.org/openj9/job/Grinder/1230/console
I think the above only ran MauveMultiThreadLoadTest_0
Started 1 iteration of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1235/ Passed.
Started 25 iterations of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1236/
7 out of 100 failed (each iteration has 4 runs).
Just wanted to mention that @mnalam-p was able to reproduce on openj9 machine manually and that the failure rate with count=0 seems to be pretty high. Narrowing down the method now.
I have reduced the test cases to only the failing one (ConsCharset) and thus now can cycle test faster. Here is what I have found so far -
Any suggestion would be helpful on what to do next.
With the AutoSIMD fix, still I was able to generate the failure locally. It is still kind of hit and miss to generate the failure. One observation I can see from the actual vs expected output failure is that the first three values are off by one byte.
For 0xC3,0x9F,0xE2
vs 00DF
we get 65503 (FFDF) and our expected was 0x00DF
For 0x85,0x93,0xE2
vs 2153
we get 57683 (E153) and our expected was 0x2153
Based on this observation I tried to look into asByteArray
method.
Also, I modified the suite to enable dump of failure and got a javacore and heapdump as well. I will try to see if I can get some clue from them or not. I am still trying to pinpoint the actual failure to consistently reproduce it.
FYI a hang in the same test https://github.com/eclipse/openj9/issues/11400
After reducing the test case, I still see the test failure. I am suspecting the problem is in new String(byte [], Charset)
. Following is the test case which was able to generate the test failure -
import java.nio.charset.UnsupportedCharsetException;
public class ConsCharset
implements Testlet
{
public void test (TestHarness h)
{
try
{
byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 }; //asByteArray(new int[] { 224, 226, 227, 228, 156 });
checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
"\u03b1\u0393\u03c0\u03a3\u00a3");
}
catch (UnsupportedCharsetException e)
{
// Skip tests as CP437 is not required by the spec.
}
byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
"\u00DF\u2153\u20AF\uD800\uDD40");
byte[] isoBytes = new byte[] {(byte)0x48,(byte)0x65,(byte)0x6C,(byte)0x6C,(byte)0x6F,(byte)0x20,(byte)0x57,(byte)0x6F,(byte)0x72,(byte)0x6C,(byte)0x64,(byte)0x21}; //asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
"Hello World!");
}
private void checkString(TestHarness h, String result, String expected)
{
for (int a = 0; a < result.length(); ++a)
{
h.check(result.charAt(a), expected.charAt(a));
}
h.check(result, expected);
}
}
Started here: https://ci.eclipse.org/openj9/job/Grinder/1274/
This time with -Xjit=limit={*} since we suspect that even that option can change the behaviour.
I was able to reproduce this issue with the oldest available nightly build from adoptopenjdk (17 April 2020).
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
Eclipse OpenJ9 VM (build openj9-0.20.0, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20200416_571 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d361
OMR - d4365f371
JCL - cfa47e66cd5f based on jdk8u252-b09)
LT stderr
LT 13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT 13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT 13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT 13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT 13:07:20.528 - suite.isCreateDump() : true
LT 13:07:20.528 - suite.isCreateDump() : true
LT 13:07:20.528 - suite.isCreateDump() : true
LT 13:07:20.528 - suite.isCreateDump() : true
LT 13:07:20.528 - Test failed. Details recorded in execution log.
LT 13:07:20.528 - Test failed. Details recorded in execution log.
LT 13:07:20.528 - Test failed. Details recorded in execution log.
LT 13:07:20.529 - Test failed
LT Failure num. = 1
LT Test number = 0
LT Test details = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT Suite number = 0
LT Thread number = 34
LT >>> Captured test output >>>
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT got 65503 but expected 223
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT got 57683 but expected 8531
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT got 57519 but expected 8367
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT got 65533 but expected 55296
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT got 65533 but expected 56640
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 11)
LT got 㿟⅓₯㿽㿽 but expected ��
While it would be nice to figure this out, obviously it doesn't need to be resolved for the 0.24 release, moving it forward.
We've discovered a way to reliably reproduce this failure nearly 100% of the time.
Use the following simplified test:
import java.nio.charset.UnsupportedCharsetException;
import java.nio.charset.Charset;
public class ConsCharset
{
public static void main(String[] args)
{
System.out.println("Starting Test");
for (int i = 0; i < 100000; i++)
{
if (!runTest())
{
System.out.println("Test Failed");
return;
}
}
System.out.println("Test Passed");
}
public static boolean runTest()
{
try
{
byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 }; //asByteArray(new int[] { 224, 226, 227, 228, 156 });
checkString(new String(cp437Bytes, Charset.forName("CP437")),
"\u03b1\u0393\u03c0\u03a3\u00a3");
}
catch (UnsupportedCharsetException e)
{
System.out.println("Skipped Tests");
// Skip tests as CP437 is not required by the spec.
}
byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
boolean res = checkString(new String(utf8Bytes, Charset.forName("UTF8")),
"\u00DF\u2153\u20AF\uD800\uDD40");
if (!res)
{
System.out.println("Character Match Failed");
return false;
}
return true;
}
private static boolean checkString(String result, String expected)
{
boolean res = true;
for (int a = 0; a < result.length(); ++a)
{
if (result.charAt(a) != expected.charAt(a))
{
res = false;
System.out.println("Failed Character Match");
break;
}
}
if (!result.equals(expected))
{
System.out.println("Failed String.equals Match");
res = false;
}
if (!res)
{
System.out.println("Expected:" + expected + " But Got:" + result);
return false;
}
return true;
}
}
Invoke using
./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset
-Xshareclasses:none
Needs to added along with Mode555, because the AOT compiled method doesn't produce the failure.
The issue occurs only when sun/nio/cs/UTF_8$Decoder.decode([BII[C)I
is compiled.
The optimization that causes this to start failing is partialRedundancyElimination
.
Invoking the following will produce a trace file reliably:
./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile,limit='{sun/nio/cs/UTF_8$Decoder.decode([BII[C)I}',disableAsyncCompilation,disableInlining,traceFull,log=jit.trc -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset
Notably, disableAsyncCompilation
needs to be added or the failure is never reproduced when tracing.
Specifically, this starts failing at transformation 19.
[ 2689] 96.19 O^O RECOMPILATION COUNTERS: VALUE PROFILER: Add JProfiling trees to track the value of node 000075A568374330 near tree 000075A5683743D0, commonNode 1
O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708DE0
O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708BB0
(Invalidating alias info)
Observed a different failure (StringIndexOutOfBoundsException
) at same test suite at an internal build Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1749
:
LT 05:05:49.192 - 4615 Mauve[gnu.testlet.javax.xml.xpath.XPath] Weighting=1
LT 05:05:49.284 - Starting thread. Suite=0 thread=0
LT 05:05:56.217 - First failure detected by thread: load-0. Not creating dumps as no dump generation is requested for this load test
LT 05:05:56.230 - Test failed
LT Failure num. = 1
LT Test number = 3180
LT Test details = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT Suite number = 0
LT Thread number = 0
LT >>> Captured test output >>>
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT got 195 but expected 223
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT got 159 but expected 8531
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT got 226 but expected 8367
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT got 133 but expected 55296
LT FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT got 147 but expected 56640
LT Test failed:
LT java.lang.StringIndexOutOfBoundsException
LT at java.lang.String.charAt(String.java:1397)
LT at gnu.testlet.java.lang.String.ConsCharset.checkString(ConsCharset.java:59)
LT at gnu.testlet.java.lang.String.ConsCharset.test(ConsCharset.java:47)
LT at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
LT at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
LT at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
LT at java.lang.reflect.Method.invoke(Method.java:498)
LT at net.adoptopenjdk.loadTest.adaptors.MauveAdaptor.executeTest(MauveAdaptor.java:74)
LT at net.adoptopenjdk.loadTest.LoadTestRunner$2.run(LoadTestRunner.java:182)
LT at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
LT at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
LT at java.lang.Thread.run(Thread.java:823)
LT <<<
I was looking into this with @AlenBadel as he mentioned in https://github.com/eclipse/openj9/issues/11192#issuecomment-745382213 where PRE decides to profile the array length and inserts placeholder JProfiling Call. With a small hack to make further narrow down the issue, we used a driver that limits the profiling values and used binary search to find out which was the last value for which it adds profiling trees and it starts failing. Following are key observations,
I will continue to help Alen and also take a look myself to find out the cause of the failures.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1868 MauveMultiThrdLoad_special_16
@pshipton we only seen it on Power? Maybe we can set the label?
Thanks a lot @AlenBadel and @r30shah for narrowing the test case down and the investigation. Just to update: since the test case fails practically every time now, we decided that @mnalam-p will try to step through it in the debugger. Whatever trees JPorifiler is inserting they should not be affecting the user data so I am also suspecting some codegen issue. But @r30shah if you could take another look at the inserted tree it would be very helpful.
Sure, set any labels you like.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_testList_2/30 MauveMultiThrdLoad_special_16
Further debugging reveals that the underlying array of the utf-8 encoded string was changed from 5 to 12. 12 being the initial byte array length. In fact, the String's underlying char has the same byte buffer as if it was encoded in ASCII instead of UTF-8. I am currently narrowing down on this issue (why it is not UTF-8 encoded).
Based on gdb debugging and JCL source, it is clear that the string is wrongly encoded as 1-byte utf-8 for the bad case, thus resulting the same byte and length as source byte array. For the input buffer 0xC3 0x9F 0xE2 0x85 0x93 0xE2 0x82 0xAF 0xF0 0x90 0x85 0x80
the utf-8 encoding should be 2-bytes, 3-bytes, 3-bytes, 4-bytes
.
For the 0xC3 0x9F
bytes, the code should take this branch. The compare value b1
is an integer which loads the source byte in this line.
The generated IL for the byte to integer load is -
istore
b2i
==>bloadi
The byte value 0xC3
is a negative value, but in the bad case when it is loaded as integer it becomes a positive value of 0x000000C3
. This is the main reason that causes the first branch to be taken. And thus wrongly encodes the whole source byte as 1-byte utf-8. This is caused by the missing extsb
opcode in the generated code for the bad case.
When the jProfiler is active, the generated IL is changed to -
istore
b2i
==>bRegLoad
This IL does not generates the extsb
instruction and thus leads us to the root cause for this issue.
In the b2iEvaluator, the code is set to generate extsb
for bloadi
but not for bRegLoad
.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/177 MauveMultiThreadLoadTest_special_16 variation: Mode555 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet cent7-ppcle-4