MauveMultiThreadLoadTest_special_16 String.ConsCharset got 65503 but expected 223

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/177 MauveMultiThreadLoadTest_special_16 variation: Mode555 JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet cent7-ppcle-4

LT  00:32:27.176 - First failure detected by thread: load-1. Not creating dumps as no dump generation is requested for this load test
LT  stderr JVMDUMP039I Processing dump event "throw", detail "net/adoptopenjdk/loadTest/reporting/MauveTestFailureException" at 2020/11/15 00:32:27 - please wait.
LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/openjdk-tests/TKG/test_output_1605414111127/MauveMultiThreadLoadTest_special_16/20201115-003214-MauveMultiThreadTestTrc/results/core.20201115.003227.22927.0001.dmp' in response to an event
STF 00:32:38.975 - Found dump at: /home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/openjdk-tests/TKG/test_output_1605414111127/MauveMultiThreadLoadTest_special_16/20201115-003214-MauveMultiThreadTestTrc/results/core.20201115.003227.22927.0001.dmp
LT  00:32:39.073 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  00:32:39.088 - suite.isCreateDump() : false
LT  00:32:39.091 - Test failed
LT    Failure num.  = 1
LT    Test number   = 2649
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 1
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 65503 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 57683 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 57519 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 65533 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 65533 but expected 56640
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 11)
LT  got 㿟⅓₯㿽㿽 but expected ￟��
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 12)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 13)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 14)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 15)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 16)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 17)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 18)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 19)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 20)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 21)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 22)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 23)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 24)
LT  <<<

pshipton commented 3 years ago

The test, which failed converting some UTF8 bytes to a String.

public class ConsCharset
  implements Testlet
{
  public void test (TestHarness h)
  {
    try
      {
        byte[] cp437Bytes = asByteArray(new int[] { 224, 226, 227, 228, 156 });
        checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
                                  "\u03b1\u0393\u03c0\u03a3\u00a3");
      }
    catch (UnsupportedCharsetException e)
      {
        // Skip tests as CP437 is not required by the spec.
      }

    byte[] utf8Bytes = asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
    checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
                "\u00DF\u2153\u20AF\uD800\uDD40");

    byte[] isoBytes = asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
    checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
                "Hello World!");
  }

  private void checkString(TestHarness h, String result, String expected)
  {
    for (int a = 0; a < result.length(); ++a)
      {
        h.check(result.charAt(a), expected.charAt(a));
      }
    h.check(result, expected);
  }

  private byte[] asByteArray(int[] ints)
  {
    byte[] bytes = new byte[ints.length];
    for (int a = 0; a < ints.length; ++a)
      bytes[a] = Integer.valueOf(ints[a]).byteValue();
    return bytes;
  }

}

pshipton commented 3 years ago

Run some grinders https://ci.eclipse.org/openj9/job/Grinder/1174/ Failed 2 / 24

pshipton commented 3 years ago

@gita-omr can someone take a look pls.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1837 - ub18-ppcle-2

pshipton commented 3 years ago

@gita-omr this is failing enough to be a blocker, but it's running in a non-production mode. We need to evaluate if the problem is specific to the non-production options.

gita-omr commented 3 years ago

Yes, we will take a look. @mnalam-p

mnalam-p commented 3 years ago

I tried running in grinder and test locally, I am not able to reproduce the issue with same jdk build (openj9/omr/jcl hash). @pshipton @gita-omr Any suggestion on what else should I try?

pshipton commented 3 years ago

Trying another grinder on the OpenJ9 machines, using the latest build. https://ci.eclipse.org/openj9/job/Grinder/1205/ - failed 6/48

pshipton commented 3 years ago

@mnalam-p the grinder isn't done, but there are still failures https://ci.eclipse.org/openj9/job/Grinder_iteration_2/20/

pshipton commented 3 years ago

btw, I suggested to Nazmul he try grinding using NodesByIterations and NUM_MACHINES to get coverage across a number of machines in order to recreate it.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1843

As this is being investigated and is known to fail frequently in grinders, I'll stop adding new comments for each failure in the nighty builds.

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly/1063 https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1848

gita-omr commented 3 years ago

Started 5 iterations here: https://ci.eclipse.org/openj9/job/Grinder/1226/

gita-omr commented 3 years ago

Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/

gita-omr commented 3 years ago

Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/

@jdekonin still did not manage to make it run. Could you please help?

mnalam-p commented 3 years ago

@pshipton NodesByIteration was not able to reproduce the issue. Two config tested -

ITERATIONS=5 and NUM_MACHINES=4
ITERATIONS=20 and NUM_MACHINES=10 Any other suggestions? One question, what is the difference between production / non-production build?

pshipton commented 3 years ago

what is the difference between production / non-production build?

Where do you see that, I might want to change my answer. Something we build nightly or in a personal build is non-production. A build created for a release, with the correct properties, version numbers, and perhaps even building on special machines with restricted access is a production build. There would be no difference in functionality if built from the same SHAs, it's more about the -version output.

Nightly build: openjdk version "1.8.0_282-internal" OpenJDK Runtime Environment (build 1.8.0_282-internal-jenkins_2020_11_26_18_17-b00) Eclipse OpenJ9 VM (build master-2f4ea25a94c, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20201126_579 (JIT enabled, AOT enabled)

Release build: openjdk version "1.8.0_272" OpenJDK Runtime Environment (build 1.8.0_272-b10) Eclipse OpenJ9 VM (build openj9-0.23.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20201022_822 (JIT enabled, AOT enabled)

pshipton commented 3 years ago

Any other suggestions?

Request access to an OpenJ9 machine where the problem occurs.

jdekonin commented 3 years ago

Had to replace IP address in the CUSTOMIZED_SDK_URL...

@gita-omr CUSTOMIZED_SDK_URL has a few links that need modifying you only got the first which was the SDK. The second was the test-images.tar.gz. I relaunched with the correction https://ci.eclipse.org/openj9/job/Grinder/1230/console

Side note: @rajdeepsingh1 increased the artifactory space, so anything after November 19th won't matter it should be in both locations. I realize that doesn't help this issue though.

mnalam-p commented 3 years ago

@pshipton I ran 20 iteration of test.MauveMultiThreadLoadTest on an OpenJ9 machine (twice). All test passed and did not see any test failure.

pshipton commented 3 years ago

I can grind it again if you like. There were more failures last night. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1848

Did you choose a machine where we've seen failures before?

gita-omr commented 3 years ago

5 iterations passed: https://ci.eclipse.org/openj9/job/Grinder/1230/console

gita-omr commented 3 years ago

I think the above only ran MauveMultiThreadLoadTest_0

Started 1 iteration of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1235/ Passed.

gita-omr commented 3 years ago

Started 25 iterations of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1236/

gita-omr commented 3 years ago

7 out of 100 failed (each iteration has 4 runs).

gita-omr commented 3 years ago

Just wanted to mention that @mnalam-p was able to reproduce on openj9 machine manually and that the failure rate with count=0 seems to be pretty high. Narrowing down the method now.

mnalam-p commented 3 years ago

I have reduced the test cases to only the failing one (ConsCharset) and thus now can cycle test faster. Here is what I have found so far -

Test failure is still random, but with single test case running it is faster to cycle through
Giving any cmdline options to test vm that effects timing (e,g - traceFile, verbose, count etc) affects the reproducibility
@gita-omr ran grinder with disableAutoSIMD and it did not reproduce the test failure I am still trying to reproduce the test failure more consistently. I did created a PR to fix an autoSIMD. As a side test, I am going to also try reproduce the test failure with and without the fix.

Any suggestion would be helpful on what to do next.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1854

mnalam-p commented 3 years ago

With the AutoSIMD fix, still I was able to generate the failure locally. It is still kind of hit and miss to generate the failure. One observation I can see from the actual vs expected output failure is that the first three values are off by one byte. For 0xC3,0x9F,0xE2 vs 00DF we get 65503 (FFDF) and our expected was 0x00DF For 0x85,0x93,0xE2 vs 2153 we get 57683 (E153) and our expected was 0x2153 Based on this observation I tried to look into asByteArray method.

Also, I modified the suite to enable dump of failure and got a javacore and heapdump as well. I will try to see if I can get some clue from them or not. I am still trying to pinpoint the actual failure to consistently reproduce it.

pshipton commented 3 years ago

FYI a hang in the same test https://github.com/eclipse/openj9/issues/11400

mnalam-p commented 3 years ago

After reducing the test case, I still see the test failure. I am suspecting the problem is in new String(byte [], Charset). Following is the test case which was able to generate the test failure -


import java.nio.charset.UnsupportedCharsetException;

public class ConsCharset
  implements Testlet
{
  public void test (TestHarness h)
  {
    try
      {
        byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 };  //asByteArray(new int[] { 224, 226, 227, 228, 156 });
        checkString(h, new String(cp437Bytes, Charset.forName("CP437")),
                                  "\u03b1\u0393\u03c0\u03a3\u00a3");
      }
    catch (UnsupportedCharsetException e)
      {
        // Skip tests as CP437 is not required by the spec.
      }

    byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
    checkString(h, new String(utf8Bytes, Charset.forName("UTF8")),
                "\u00DF\u2153\u20AF\uD800\uDD40");

    byte[] isoBytes = new byte[] {(byte)0x48,(byte)0x65,(byte)0x6C,(byte)0x6C,(byte)0x6F,(byte)0x20,(byte)0x57,(byte)0x6F,(byte)0x72,(byte)0x6C,(byte)0x64,(byte)0x21}; //asByteArray(new int[] {0x48,0x65,0x6C,0x6C,0x6F,0x20,0x57,0x6F,0x72,0x6C,0x64,0x21});
    checkString(h, new String(isoBytes, Charset.forName("ISO-8859-1")),
                "Hello World!");
  }

  private void checkString(TestHarness h, String result, String expected)
  {
    for (int a = 0; a < result.length(); ++a)
      {
        h.check(result.charAt(a), expected.charAt(a));
      }
    h.check(result, expected);
  }
}

gita-omr commented 3 years ago

Started here: https://ci.eclipse.org/openj9/job/Grinder/1274/

This time with -Xjit=limit={*} since we suspect that even that option can change the behaviour.

mnalam-p commented 3 years ago

I was able to reproduce this issue with the oldest available nightly build from adoptopenjdk (17 April 2020).

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-b09)
Eclipse OpenJ9 VM (build openj9-0.20.0, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20200416_571 (JIT enabled, AOT enabled)
OpenJ9   - 05fa2d361
OMR      - d4365f371
JCL      - cfa47e66cd5f based on jdk8u252-b09)

LT  stderr 
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.getInventory().getInventoryFileRef(): openjdk.test.load/config/inventories/mauve/mauve_multiThread.xml
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - suite.isCreateDump() : true
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.528 - Test failed. Details recorded in execution log.
LT  13:07:20.529 - Test failed
LT    Failure num.  = 1
LT    Test number   = 0
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 34
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 65503 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 57683 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 57519 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 65533 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 65533 but expected 56640
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 11)
LT  got 㿟⅓₯㿽㿽 but expected ￟��

pshipton commented 3 years ago

While it would be nice to figure this out, obviously it doesn't need to be resolved for the 0.24 release, moving it forward.

AlenBadel commented 3 years ago

We've discovered a way to reliably reproduce this failure nearly 100% of the time.

Use the following simplified test:

import java.nio.charset.UnsupportedCharsetException;
import java.nio.charset.Charset;

public class ConsCharset
    {
    public static void main(String[] args)
        {
        System.out.println("Starting Test");
        for (int i = 0; i < 100000; i++)
            {
                if (!runTest())
                {
                        System.out.println("Test Failed");
                        return;
                }
            }
        System.out.println("Test Passed");
        }

    public static boolean runTest()
        {
        try
            {
            byte[] cp437Bytes = new byte[] { (byte)224, (byte)226, (byte)227, (byte)228, (byte)156 };  //asByteArray(new int[] { 224, 226, 227, 228, 156 });
            checkString(new String(cp437Bytes, Charset.forName("CP437")),
                                        "\u03b1\u0393\u03c0\u03a3\u00a3");
            }
        catch (UnsupportedCharsetException e)
            {
            System.out.println("Skipped Tests");
            // Skip tests as CP437 is not required by the spec.
            }

        byte[] utf8Bytes = new byte[] { (byte)0xC3,(byte)0x9F,(byte)0xE2,(byte)0x85,(byte)0x93,(byte)0xE2,(byte)0x82,(byte)0xAF,(byte)0xF0,(byte)0x90,(byte)0x85,(byte)0x80 }; //asByteArray(new int[] { 0xC3,0x9F,0xE2,0x85,0x93,0xE2,0x82,0xAF,0xF0,0x90,0x85,0x80 });
        boolean  res = checkString(new String(utf8Bytes, Charset.forName("UTF8")),
                    "\u00DF\u2153\u20AF\uD800\uDD40");
        if (!res) 
            {
            System.out.println("Character Match Failed");
            return false;
            }
        return true;
        }

    private static boolean checkString(String result, String expected)
        {
        boolean res = true;
        for (int a = 0; a < result.length(); ++a)
            {
                if (result.charAt(a) != expected.charAt(a))
                    {
                    res = false;
                    System.out.println("Failed Character Match");
                    break;
                    }
            }

        if (!result.equals(expected))
            {
            System.out.println("Failed String.equals Match");
            res = false;
            }

        if (!res)
            {
            System.out.println("Expected:" + expected + " But Got:" + result);
            return false;
            }

        return true;
        }
}

Invoke using

./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset

-Xshareclasses:none Needs to added along with Mode555, because the AOT compiled method doesn't produce the failure.

The issue occurs only when sun/nio/cs/UTF_8$Decoder.decode([BII[C)I is compiled.

The optimization that causes this to start failing is partialRedundancyElimination.

Invoking the following will produce a trace file reliably:

./jdk/jre/bin/java -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts="- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000",gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile,limit='{sun/nio/cs/UTF_8$Decoder.decode([BII[C)I}',disableAsyncCompilation,disableInlining,traceFull,log=jit.trc -Xcheck:gc:vmthreads:all:quiet -Xshareclasses:none ConsCharset

Notably, disableAsyncCompilation needs to be added or the failure is never reproduced when tracing.

AlenBadel commented 3 years ago

Specifically, this starts failing at transformation 19.

[  2689]  96.19   O^O RECOMPILATION COUNTERS:  VALUE PROFILER: Add JProfiling trees to track the value of node 000075A568374330 near tree 000075A5683743D0, commonNode 1
         O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708DE0
         O^O PARTIAL REDUNDANCY ELIMINATION: Eliminating redundant computation (store) : 000075A568708BB0
              (Invalidating alias info)

JasonFengJ9 commented 3 years ago

Observed a different failure (StringIndexOutOfBoundsException) at same test suite at an internal build Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1749:

LT  05:05:49.192 -   4615 Mauve[gnu.testlet.javax.xml.xpath.XPath]  Weighting=1 
LT  05:05:49.284 - Starting thread. Suite=0 thread=0
LT  05:05:56.217 - First failure detected by thread: load-0. Not creating dumps as no dump generation is requested for this load test
LT  05:05:56.230 - Test failed
LT    Failure num.  = 1
LT    Test number   = 3180
LT    Test details  = 'Mauve[gnu.testlet.java.lang.String.ConsCharset]'
LT    Suite number  = 0
LT    Thread number = 0
LT  >>> Captured test output >>>
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 0)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 1)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 2)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 3)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 4)
LT  PASS: gnu.testlet.java.lang.String.ConsCharset (number 5)
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 6)
LT  got 195 but expected 223
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 7)
LT  got 159 but expected 8531
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 8)
LT  got 226 but expected 8367
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 9)
LT  got 133 but expected 55296
LT  FAIL: gnu.testlet.java.lang.String.ConsCharset (number 10)
LT  got 147 but expected 56640
LT  Test failed:
LT  java.lang.StringIndexOutOfBoundsException
LT      at java.lang.String.charAt(String.java:1397)
LT      at gnu.testlet.java.lang.String.ConsCharset.checkString(ConsCharset.java:59)
LT      at gnu.testlet.java.lang.String.ConsCharset.test(ConsCharset.java:47)
LT      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
LT      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
LT      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
LT      at java.lang.reflect.Method.invoke(Method.java:498)
LT      at net.adoptopenjdk.loadTest.adaptors.MauveAdaptor.executeTest(MauveAdaptor.java:74)
LT      at net.adoptopenjdk.loadTest.LoadTestRunner$2.run(LoadTestRunner.java:182)
LT      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
LT      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
LT      at java.lang.Thread.run(Thread.java:823)
LT  <<<

r30shah commented 3 years ago

I was looking into this with @AlenBadel as he mentioned in https://github.com/eclipse/openj9/issues/11192#issuecomment-745382213 where PRE decides to profile the array length and inserts placeholder JProfiling Call. With a small hack to make further narrow down the issue, we used a driver that limits the profiling values and used binary search to find out which was the last value for which it adds profiling trees and it starts failing. Following are key observations,

When Last Value to profile is the same array length (mentioned in https://github.com/eclipse/openj9/issues/11192#issuecomment-745382213) it fails .
Power is using the BitShiftHash where for value hash is calculated by series of shift and extract operations compared to BitIndexHash (Needs BitPermute to be enabled) on Z and x86. Going to open up issue to summarize the operation carried out by bit permute evaluator and if seems beneficial, it will work as guide to implement the evaluator.
As the failures happens when we profile that particular array length and Power is using different code to calculate the hash, we had suspicion that it could be the trees for BitShiftIndex which are causing the failure so to verify the doubt, we disabled the Bit Permute and forced to generate code for BitShiftIndex on Z and x86. Both passes and on Z, I have verified, we do profile the same array length and cfg looks similar. It might be possible that the code generated on Power is incorrect so right now looking into the generated code to see if it is the case.
Another thing that does not make any sense is that, no where in the profiling code, we write the array length. It only reads it. Do not know how this could alter the functionality of the Java Method.

I will continue to help Alen and also take a look myself to find out the cause of the failures.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1868 MauveMultiThrdLoad_special_16

gita-omr commented 3 years ago

@pshipton we only seen it on Power? Maybe we can set the label?

gita-omr commented 3 years ago

Thanks a lot @AlenBadel and @r30shah for narrowing the test case down and the investigation. Just to update: since the test case fails practically every time now, we decided that @mnalam-p will try to step through it in the debugger. Whatever trees JPorifiler is inserting they should not be affecting the user data so I am also suspecting some codegen issue. But @r30shah if you could take another look at the inserted tree it would be very helpful.

pshipton commented 3 years ago

Sure, set any labels you like.

pshipton commented 3 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_testList_2/30 MauveMultiThrdLoad_special_16

mnalam-p commented 3 years ago

Further debugging reveals that the underlying array of the utf-8 encoded string was changed from 5 to 12. 12 being the initial byte array length. In fact, the String's underlying char has the same byte buffer as if it was encoded in ASCII instead of UTF-8. I am currently narrowing down on this issue (why it is not UTF-8 encoded).

mnalam-p commented 3 years ago

Based on gdb debugging and JCL source, it is clear that the string is wrongly encoded as 1-byte utf-8 for the bad case, thus resulting the same byte and length as source byte array. For the input buffer 0xC3 0x9F 0xE2 0x85 0x93 0xE2 0x82 0xAF 0xF0 0x90 0x85 0x80 the utf-8 encoding should be 2-bytes, 3-bytes, 3-bytes, 4-bytes.

For the 0xC3 0x9F bytes, the code should take this branch. The compare value b1 is an integer which loads the source byte in this line.

The generated IL for the byte to integer load is -

istore
   b2i
      ==>bloadi

The byte value 0xC3 is a negative value, but in the bad case when it is loaded as integer it becomes a positive value of 0x000000C3. This is the main reason that causes the first branch to be taken. And thus wrongly encodes the whole source byte as 1-byte utf-8. This is caused by the missing extsb opcode in the generated code for the bad case.

When the jProfiler is active, the generated IL is changed to -

istore
   b2i
      ==>bRegLoad

This IL does not generates the extsb instruction and thus leads us to the root cause for this issue. In the b2iEvaluator, the code is set to generate extsb for bloadi but not for bRegLoad.

eclipse-openj9 / openj9

MauveMultiThreadLoadTest_special_16 String.ConsCharset got 65503 but expected 223 #11192