dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Improve String handling in Dart for native platforms #55913

Open eugenechyrski opened 3 months ago

eugenechyrski commented 3 months ago

The current implementation of string handling in Dart is suboptimal: StringBaseSubstringMatches could be improved to use SIMD instructions. I created a simple test using the StringZilla library for OneByteStrings, and the results show that the indexOf operation can be made up to 10 times faster.

dart version 3.5.0-edge.a479f91e80875dd6661b12108c9b81bdaeb2af65

What has been done:

I added another method to the String class: String indexOfStrinzilla(String other);

3. string_patch.dart has been modified as well:

    @pragma("vm:external-name", "String_indexOfStrinzilla")
    external String indexOfStrinzilla(String other);
4. In the C++ part, string.cc:

    DEFINE_NATIVE_ENTRY(String_indexOfStrinzilla, 0, 2) {
      const String& receiver =
          String::CheckedHandle(zone, arguments->NativeArgAt(0));
      ASSERT(!receiver.IsNull());
      GET_NON_NULL_NATIVE_ARGUMENT(String, b, arguments->NativeArgAt(1));
      return String::StringZillaTest(receiver, b);
    }
5. In bootstrap_natives.h:

    V(String_indexOfStrinzilla, 2)

6. In object.cc:

    StringPtr String::StringZillaTest(const String& str,
                                      const String& str2,
                                      Heap::Space space) {
      if (str.IsOneByteString()) {
        // No explicit length is passed, so each view relies on the character
        // data being NUL-terminated.
        sz::string_view source = sz::string_view(
            reinterpret_cast<const char*>(OneByteString::CharAddr(str, 0)));
        sz::string_view target = sz::string_view(
            reinterpret_cast<const char*>(OneByteString::CharAddr(str2, 0)));
        // The result is discarded; only the search time matters for this test.
        source.find_last_of(target);
      } else {
        std::cout << "called to StringZillaTest two byte string" << std::endl;
      }
      // Return the receiver unchanged.
      return str.ptr();
    }
7. Test Dart script:

    void main(List<String> args) async {
      String testStr = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. ";

      double totalDart = 0;
      double totalSimd = 0;

      for (int j = 1; j < testStr.length; j++) {
        String search = "/" * 80;
        String longStr = testStr.substring(0, j) + search + testStr.substring(j);
        var stopwatch = Stopwatch();

        stopwatch.start();
        for (int i = 0; i < 100000; i++) {
          longStr.indexOfStrinzilla(search);
        }
        double stringzilla = stopwatch.elapsedMilliseconds / 1;

        totalSimd += stringzilla;
        stopwatch = Stopwatch();

        stopwatch.start();
        for (int i = 0; i < 100000; i++) {
          longStr.lastIndexOf(search);
        }
        double dart = stopwatch.elapsedMilliseconds / 1;
        totalDart += dart;
        print('$j $stringzilla $dart');
      }
      print('total $totalSimd $totalDart');
    }
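One detail of the object.cc step is worth spelling out, since it affects how the benchmark should be read. Assuming `sz::string_view` follows `std::string_view` semantics for these calls, `find_last_of` treats its argument as a set of characters, while `rfind` searches for the whole substring, which is the behaviour of Dart's `lastIndexOf`. The sketch below uses `std::string_view` as a stand-in for the StringZilla type; the helper names `ViewOf`, `LastIndexOfSubstring`, and `LastIndexOfAnyChar` are hypothetical and not part of the Dart VM or StringZilla.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: wrap a raw one-byte buffer with an explicit length,
// so the view does not depend on a trailing NUL byte.
inline std::string_view ViewOf(const uint8_t* data, std::size_t length) {
  return std::string_view(reinterpret_cast<const char*>(data), length);
}

// Position of the last occurrence of `needle` as a whole substring,
// or std::string_view::npos if absent (comparable to Dart's lastIndexOf).
inline std::size_t LastIndexOfSubstring(std::string_view haystack,
                                        std::string_view needle) {
  return haystack.rfind(needle);
}

// Position of the last character that appears anywhere in `chars`.
// This is a character-set search, not a substring search.
inline std::size_t LastIndexOfAnyChar(std::string_view haystack,
                                      std::string_view chars) {
  return haystack.find_last_of(chars);
}
```

For a haystack such as "ab//cd/e" and the argument "//", the two helpers return different positions, so a fair comparison against `lastIndexOf` would probably want the substring variant.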



Results:

total 3695.0 34174.0
It looks like the closer the search string is to the end of the source string, the more efficient Dart's lastIndexOf is, whereas the performance of the SIMD version depends only on the length of the search string.
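The position dependence can be modelled with a naive back-to-front scan. This is only a sketch of the general idea, not the Dart VM's actual implementation; `ScanResult` and `NaiveLastIndexOf` are hypothetical names. It counts how many candidate start positions are examined before a match is found.

```cpp
#include <cstddef>
#include <string_view>

struct ScanResult {
  std::ptrdiff_t index;       // match position, or -1 if not found
  std::size_t windows_tried;  // candidate start positions examined
};

// Naive lastIndexOf: try each candidate start position from the end of
// `haystack` toward the beginning and stop at the first full match.
inline ScanResult NaiveLastIndexOf(std::string_view haystack,
                                   std::string_view needle) {
  ScanResult result{-1, 0};
  if (needle.size() > haystack.size()) return result;
  // `start` runs from haystack.size() - needle.size() down to 0.
  for (std::size_t start = haystack.size() - needle.size() + 1; start-- > 0;) {
    ++result.windows_tried;
    if (haystack.compare(start, needle.size(), needle) == 0) {
      result.index = static_cast<std::ptrdiff_t>(start);
      return result;
    }
  }
  return result;
}
```

With the needle at the very end of the haystack, a single window is examined; with the needle at the start, every window is examined, which matches the shape of the per-position timings printed by the test script.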

Conversion between string encodings can also be improved. Dart could have native Latin-1/UTF-8/UTF-16/UTF-32 encoders and decoders based on [simdutf](https://simdutf.github.io/simdutf/), which would significantly improve the performance of these operations (see the benchmarks in the simdutf documentation).
Refactoring [vm/unicode.cc](https://github.com/dart-lang/sdk/blob/main/runtime/vm/unicode.cc) and [platform/unicode.cc](https://github.com/dart-lang/sdk/blob/main/runtime/platform/unicode.cc) makes the code up to 3 times faster in my tests.
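For reference, this is the kind of byte-at-a-time loop that a SIMD transcoder replaces with wide loads and stores. It is a minimal scalar sketch of Latin-1 to UTF-8 conversion under my own naming (`Latin1ToUtf8` is hypothetical); it is neither simdutf's API nor the code in unicode.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar Latin-1 -> UTF-8 conversion: bytes below 0x80 are copied as-is,
// bytes 0x80..0xFF become two-byte sequences.
inline std::string Latin1ToUtf8(const uint8_t* data, std::size_t length) {
  std::string out;
  out.reserve(length * 2);  // worst case: every byte expands to two
  for (std::size_t i = 0; i < length; ++i) {
    const uint8_t byte = data[i];
    if (byte < 0x80) {
      out.push_back(static_cast<char>(byte));
    } else {
      // Lead byte carries the top two bits, continuation byte the low six.
      out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
      out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
    }
  }
  return out;
}
```

The branch per byte is what limits the scalar version; a vectorized transcoder can typically check a whole block for the all-ASCII case at once and copy it through unchanged.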

kyrill007 commented 3 months ago

These are impressive performance gains that could be achieved with such a small change. It would be good to add this fix and not ignore it.