Open scheglov opened 8 years ago
@stereotype441
@scheglov Thanks for pointing out that this is not a VM performance problem, but an underlying library issue.
Doesn't codeUnitAt return the UTF-16 encoding?
Yes, it does. But if the string is ASCII-only, then it is the same as UTF-8, isn't it?
The first version is actually not that naive - it knows, without checking anything, the size of the resulting output. That's one of the reasons UTF-8 encoding is not as fast. To avoid over-allocating it runs through the string once to find the result length, then runs through it again to do the actual encoding.
I'm not sure we have an "ASCII" special-case. That might be worth it - if the input length equals the output length, then you can just write the string into the result Uint8List directly.
True, it knows the length.
We could speculatively allocate Uint8List
and fill it as we check.
When ASCII: 84 ms
vs. 424 ms
- 5 times faster.
When not ASCII: 589 ms
vs. 459 ms
- about 30% slower.
import 'dart:convert';
import 'dart:typed_data';
main() {
// String str = 'aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbccccЙ';
String str = 'aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbcccc';
{
Stopwatch sw = new Stopwatch()..start();
for (int i = 0; i < 1000 * 1000; i++) {
toUTF8(str);
}
print('Optimized time: ${sw.elapsedMilliseconds}');
}
{
Stopwatch sw = new Stopwatch()..start();
for (int i = 0; i < 1000 * 1000; i++) {
UTF8.encode(str);
}
print('UTF8 time: ${sw.elapsedMilliseconds}');
}
}
List<int> toUTF8(String str) {
int length = str.length;
Uint8List bytes = new Uint8List(length);
for (int i = 0; i < length; i++) {
int unit = str.codeUnitAt(i);
if (unit >= 128) {
return UTF8.encode(str);
}
bytes[i] = unit;
}
return bytes;
}
@askeksa-google has worked on fast UTF-8 decoding earlier this year, and is probably going to take a look at faster UTF-8 encoding in Q4.
The following program encodes that same ASCII string using a naive approach and using actual
UTF8.encode()
. The naive approach is about3 times
faster. Could UTF8 be optimized to provide better performance when possible?