dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Dart String operation much slower than Java #29131

Open jumperchen opened 7 years ago

jumperchen commented 7 years ago

Scenario

We have a large zip file containing a huge XML file that needs to be parsed. Java ships standard unzip and XML parser classes that are quite fast, and we want to do the same thing on the Dart VM, but Dart doesn't provide such an API for us to use.

After some testing, we found that Dart String operations could be the problem.

Here is a simple example that demonstrates the issue. (Note: Java's XML parser doesn't need to operate on String values, so it can be even faster; see the extra info at the bottom.)

import 'dart:io';

// DartLang : Simulate a XML parsing.
main() {
  var start = new DateTime.now().millisecondsSinceEpoch;
  String data = new File('large.xml')
      .readAsStringSync();
  print('File Length: ${data.length}');
  int i = 0;
  List<int> units = data.codeUnits;
  var foo;
  var value;
  for (int j = 0; j < units.length; j++) {
    if (j % 6 == 0) {
      value = [];
    }
    value.add(units[j]);
    if (j % 6 == 5) {
      i++;
      foo = new Foo(new String.fromCharCodes(value));
      value = null;
    }
  }
  print('Parsing time: ${new DateTime.now().millisecondsSinceEpoch - start}ms');
  print('Object creation times: ${i}');
}

class Foo {
  String value;
  Foo(this.value);
}

Note: StringBuffer in Dart is much slower than this codeUnits implementation.

// Java
public static void main(String[] args) throws IOException {
    long start = System.nanoTime();
    String f = FileUtils.readFileToString(new File("large.xml"), Charsets.UTF_8);
    Foo foo = null;
    System.out.println(f.length());
    int i = 0;
    char[] chars = f.toCharArray();
    StringBuilder value = null;
    for (int j = 0, k = chars.length; j < k; j++) {
        if (j % 6 == 0) {
            value = new StringBuilder();
        }
        value.append(chars[j]);
        if (j % 6 == 5) {
            i++;
            foo = new Foo(value.toString());
            value = null;
        }
    }
    System.out.println("Parsing time: " + TimeUnit.MILLISECONDS
            .convert(System.nanoTime() - start, TimeUnit.NANOSECONDS) + "ms");
    System.out.println(i);
}
static class Foo {
    String value;
    Foo(String value) {
        this.value = value;
    }
}

The result

| File size | File length | Object creations | Java parsing time | Dart parsing time |
| --- | --- | --- | --- | --- |
| 20 MB | 20375011 | 3395835 | 572 ms | 938 ms |
| 163 MB | 163000011 | 27166668 | 4060 ms | 6223 ms |
| 326 MB | 326000011 | 54333335 | 7202 ms | 11871 ms |
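
To put the results above in relative terms, here is a minimal Java sketch that recomputes the Dart/Java time ratio from the reported figures and checks that each object count equals the file length divided by 6 (integer division), as the benchmark loop implies. The class and method names (`SlowdownRatios`, `ratio`, `expectedObjects`) are illustrative and not part of the original report, and the Java and Dart columns are assigned in the order given by the table header.

```java
import java.util.Locale;

public class SlowdownRatios {
    // Dart time divided by Java time; a value above 1 means Dart was slower.
    static double ratio(long dartMs, long javaMs) {
        return (double) dartMs / javaMs;
    }

    // The benchmark creates one Foo per complete group of 6 code units.
    static long expectedObjects(long fileLength) {
        return fileLength / 6;
    }

    public static void main(String[] args) {
        long[][] rows = {
            // fileLength, javaMs, dartMs, reported object count
            {20375011L, 572, 938, 3395835L},
            {163000011L, 4060, 6223, 27166668L},
            {326000011L, 7202, 11871, 54333335L},
        };
        for (long[] r : rows) {
            System.out.printf(Locale.ROOT, "length=%d ratio=%.2f objectsMatch=%b%n",
                    r[0], ratio(r[2], r[1]), expectedObjects(r[0]) == r[3]);
        }
    }
}
```

On these numbers Dart takes roughly 1.5 to 1.65 times as long as Java at every file size, so the gap does not appear to widen as the input grows.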

Extra Info

Here is a Java XML parser example that parses the 326 MB file.

    long start = System.nanoTime();
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader xmlStreamReader = factory
            .createXMLStreamReader(new FileInputStream(new File("large.xml")));
    int i = 0;
    String tagName;

    while(xmlStreamReader.hasNext()) {
        if (xmlStreamReader.next() == XMLStreamConstants.START_ELEMENT) {
            tagName = xmlStreamReader.getName().getLocalPart();
            i++;
        }
    }
    System.out.println("Parsing time: " + TimeUnit.MILLISECONDS
            .convert(System.nanoTime() - start, TimeUnit.NANOSECONDS) + "ms");
    System.out.println(i);  

It takes only 4658 ms to parse.

Note: the 326 MB large.xml file is attached as large.xml.zip.

floitschG commented 7 years ago

I had a short look but couldn't find anything obvious. The VM developers will hopefully have more information. By the way, there is a Stopwatch class that is designed for these kinds of measurements.

Below is the same example using a StringBuffer. Normally it should be faster (or at least as fast), but currently it's ~50% slower.

Also interesting: replacing the j % 6 with the similar (and in theory faster) j.remainder(6) actually made things much worse.

import 'dart:io';

// DartLang : Simulate a XML parsing.
main() {
  var sw = new Stopwatch()..start();
  String data = new File('large.xml')
      .readAsStringSync();
  print('File Length: ${data.length}');
  int i = 0;
  List<int> units = data.codeUnits;
  var foo;
  var buffer = new StringBuffer();
  for (int j = 0; j < units.length; j++) {
    buffer.writeCharCode(units[j]);
    if (j % 6 == 5) {
      i++;
      foo = new Foo(buffer.toString());
      buffer.clear();
    }
  }
  print('Parsing time: ${sw.elapsedMilliseconds}ms');
  print('Object creation times: ${i}');
}

class Foo {
  String value;
  Foo(this.value);
}
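
On the `j % 6` versus `j.remainder(6)` remark: in Dart, `%` on integers is a Euclidean modulo (the result is never negative), while `remainder` truncates toward zero, so the two only agree for non-negative operands such as the loop index here. Java's `%` truncates like Dart's `remainder`, and `Math.floorMod` matches Dart's `%` when the divisor is positive. A minimal Java sketch of the semantic difference, offered as an analogy only (the class and method names are hypothetical, and it says nothing about the performance difference observed above):

```java
public class ModVsRemainder {
    // Truncating remainder: the sign follows the dividend (like Dart's int.remainder).
    static int truncated(int value, int divisor) {
        return value % divisor;
    }

    // Floored modulo: non-negative for a positive divisor (like Dart's % in that case).
    static int floored(int value, int divisor) {
        return Math.floorMod(value, divisor);
    }

    public static void main(String[] args) {
        int divisor = 6;
        for (int j : new int[] {7, -7}) {
            System.out.println(j + ": truncated=" + truncated(j, divisor)
                    + " floored=" + floored(j, divisor));
        }
    }
}
```

For `7` both operations give `1`; for `-7` the truncating form gives `-1` and the floored form gives `5`. The extra sign correction is why the truncating form is cheaper in theory.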