eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

CompactStrings Unicode handling is broken #7639

Closed patric42 closed 4 years ago

patric42 commented 4 years ago

Java -version output

openjdk version "1.8.0_232" OpenJDK Runtime Environment (build 1.8.0_232-b09) Eclipse OpenJ9 VM (build openj9-0.17.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20191017_442 (JIT enabled, AOT enabled) OpenJ9 - 77c1cf708 OMR - 20db4fbc JCL - 97b5ec8f383 based on jdk8u232-b09)

Summary of problem

Executing the supplied code runs with just

java CompactStringBug

works, but if executed with

java -XX:+CompactStrings CompactStringBug

it fails.

11.0.5 (and before) works btw.

Code (as it's not a supported file type...)

import java.io.BufferedReader;
import java.io.StringReader;

public class CompactStringBug {

   /**
    * Doesn't work if -XX:+CompactStrings command line is set
    *
    * OpenJ9 8u212/8u232 64bit Linux
    */

   private static int lc = 0;

   public static void main(String[] args) {
      final String s = "\u201cContains Unicode\u201d"; // unicode double quotation marks
      System.out.println("Using String: "+s);

      String combined = s + "\n" + s;
      System.out.println("Combined String is: >>>\n"+combined+"\n<<<");

      System.out.println("\nUsing stringreader...");
      BufferedReader reader = new BufferedReader(new StringReader(combined));
      reader.lines().forEach(line -> {
         System.out.println("Line "+ (lc++) + ": '" + line + "'");
         if (!line.startsWith(s)) {
            System.out.println("Difference in index " + lc + ": " + line);
            System.out.println("... fail.\n");
            throw new IllegalStateException("Text suddenly differs");
         }
      });
      System.out.println("...successful.");
   }
}
DanHeidinga commented 4 years ago

FYI @fjeremic

PascalSchumacher commented 4 years ago

I looks like we are also affected by this bug.

After adding -XX:+CompactStrings Strings containing the euro sign are corrupted.

We are using:

openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.5+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.17.0, JRE 11 Linux amd64-64-Bit Compressed References 20191016_358 (JIT enabled, AOT enabled)
OpenJ9   - 77c1cf708
OMR      - 20db4fbc
JCL      - 2a7af5674b based on jdk-11.0.5+10)

The test from @patric42 passes.

Sadly I have not been able to create a simple test case to reproduce the isse (it occurs in XML created with https://github.com/jOOQ/jOOX) yet.

fjeremic commented 4 years ago

I'll be taking a look at this today/tomorrow. Will post updates as I progress.

pshipton commented 4 years ago

@fjeremic any updates?

fjeremic commented 4 years ago

Just checked on JDK11 things are passing, but JDK8 seems to have an issue. Digging into it deeper now. It fails with -Xint so it is not a JIT related issue.

fjeremic commented 4 years ago

Here is a minimal reproducible testcase:

public class CompactStringBug {

   /**
    * Doesn't work if -XX:+CompactStrings command line is set
    *
    * OpenJ9 8u212/8u232 64bit Linux
    */
   public static void main(String[] args) throws java.io.IOException {
      char[] cb = new char[7];
      cb[0] = '\u201c';
      cb[1] = 'X';
      cb[2] = '\u201d';
      cb[3] = '\n';
      cb[4] = '\u201c';
      cb[5] = 'X';
      cb[6] = '\u201d';

      StringBuilder sb = new StringBuilder();
      sb.append(cb, 4, 3);
      System.out.println(sb);
   }
}

The issue happens because of a bug in String.compressible: https://github.com/eclipse/openj9/blob/154c39d2fab4708e107d2d2a19738dfbf601edde/jcl/src/java.base/share/classes/java/lang/String.java#L256-L261

The API is incorrectly implemented, as the loop termination should be start + length, not just length. In the above test the starting index is 4 but length is 3 so we end up returning true in String.compressible thinking the string is a candidate for compression. We the subsequently copy only the first byte of the character array and we get the wrong value. I'll have a fix out shortly.

The reason things pass on JDK11 is because we use the OpenJDK StringBuffer/StringBuilder classes there so we don't end up going down the same path of calling String.compressible. However there is a use case of String.compressible with a non-zero start value within String itself that should affect JDK11 also. I'll fix this as well.

PascalSchumacher commented 4 years ago

Thank you very much for fixing this!