Open a14n opened 3 years ago
It's actually around 6 times slower when I measure it, if you allow the JVM to warm up by calling it a couple of times. The JVM settles at around 100 ms after the first iteration:
```
0 took 333 ms
1 took 106 ms
2 took 108 ms
3 took 98 ms
4 took 71 ms
5 took 70 ms
6 took 71 ms
7 took 107 ms
8 took 111 ms
9 took 108 ms
```
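For context, numbers like these come from a loop that repeats the work so the JIT can warm up. A minimal sketch of such a harness, with a hypothetical `parseOnce` standing in for the actual protobuf decode, could look like:

```dart
import 'dart:typed_data';

/// Placeholder for the real workload; the actual benchmark parses a
/// PrimitiveBlock from the attached OSM PBF buffer.
int parseOnce(Uint8List bytes) {
  var sum = 0;
  for (final b in bytes) {
    sum += b;
  }
  return sum;
}

void main() {
  final bytes = Uint8List(8497305); // same size as the attached buffer
  // Repeat the work so the JIT can warm up: the first iteration pays
  // for compilation, later ones show steady-state performance.
  for (var i = 0; i < 10; i++) {
    final sw = Stopwatch()..start();
    parseOnce(bytes);
    sw.stop();
    print('$i took ${sw.elapsedMilliseconds} ms');
  }
}
```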
Dart pays quite a price for the `Int64` wrapper, and also for dynamic function calls, which are used in the protobuf implementation for no good reason; but even after changing these things there is still around a 3x difference. It seems we additionally pay the price for the excessive generality of some code, which we achieve through closures. We should look at forcing enough inlining to eliminate any closure calls on the hot path (currently I can't achieve that purely through `@pragma('vm:prefer-inline')`, so an alternative might be needed).
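As a rough illustration (not the actual protobuf-generated code), the generality-through-closures pattern looks something like the following, where the closure stored in a field is an indirect call the inliner often cannot see through, while the specialized version is a direct call:

```dart
// Generic reader: the field decoder is a closure stored in a field.
// The call through `readValue` is an indirect call that the VM's
// inliner often cannot eliminate, even with prefer-inline pragmas.
class GenericReader {
  final int Function(List<int> buf, int pos) readValue;
  GenericReader(this.readValue);

  int sum(List<int> buf) {
    var total = 0;
    for (var i = 0; i < buf.length; i++) {
      total += readValue(buf, i);
    }
    return total;
  }
}

// Specialized reader: the same logic as a direct instance method,
// which the compiler can inline into the hot loop.
class Int32Reader {
  int readValue(List<int> buf, int pos) => buf[pos];

  int sum(List<int> buf) {
    var total = 0;
    for (var i = 0; i < buf.length; i++) {
      total += readValue(buf, i);
    }
    return total;
  }
}
```

Declosurizing like this trades generality for code duplication, which is the tension discussed below.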
/cc @mkustermann
Firstly thanks a lot for the report!
Let me transfer this issue over to dart-lang/sdk and mark it as a VM issue for now. We'll investigate the performance issue.
@askeksa-google Could you investigate this?
Per @askeksa-google's request I have put the benchmark code I had on GitHub. The Dart and Java benchmarks are in the `osm_benchmark` branch of the `mraleph/protobuf` repo. I have also put some of the changes I played with (e.g. removing `Int64` wrappers and experimenting with avoiding closures in the reader) in the `some-performance-tweaks` branch on the same repo, though these changes are not validated by tests.
@mraleph Maybe you can upstream those changes you already made? (e.g. specialise the internal int64 (de)coding logic for VM vs non-VM)
I don't see a way to upstream the `Int64` changes (which yield most of the improvement), as those change APIs. Changes related to declosurizing reading can probably be upstreamed, but they introduce a lot of code duplication, and IIRC they did not actually lead to a huge improvement; hence my original comment that additional research is needed.
Couldn't we at least change the `Int64` class to allow it to be as tight as the `_Mint` box? Something like:
```dart
// identical(0, 0.0) is true on the web (where all numbers are
// doubles) and false on the VM.
const bool isVm = !identical(0, 0.0);

abstract class Int64 {
  static Int64 parseInt(String s) =>
      isVm ? VmInt64.parse(s) : WebInt64.parse(s);
}

class VmInt64 implements Int64 {
  final int _value; // <- Ensure AOT can unbox this field.
  VmInt64(this._value);
  factory VmInt64.parse(String s) => VmInt64(int.parse(s));
  ...
}

class WebInt64 implements Int64 {
  final int _l;
  final int _m;
  final int _h;
  ...
}
```
and then make the protobuf decoder still use `Int64`, but specialize it for `isVm`?
@askeksa-google @mraleph We should consider adding this benchmark to golem.
Interesting idea. It would still box Smis, which would otherwise be just tagged (e.g. in `PbList`), but it might shave off some of the performance difference. It will also require boxing in arguments and return values.
One thing I thought about is that we could add support for passing the `VmInt64` box around as a value type. We could add a special annotation to it, `@pragma('vm:value-type')`, telling the VM that its identity does not matter. Then in code like this:
```dart
VmInt64 foo(VmInt64 v) {
  return v + v;
}
```
the AOT compiler could completely unbox things. The box would only need to be created when crossing type boundaries (e.g. from `VmInt64` to `T` in a `List<T>`). I think this is very promising and might prepare us for things like records as well.
When parsing PBF files from OpenStreetMap, it looks like reading a PrimitiveBlock is 2 times slower in Dart compared to Java.
I attached a primitiveBlock.zip file containing a buffer (8497305 bytes) corresponding to a PrimitiveBlock.
Dart parses the bytes in 430 ms.
Java parses the bytes in 220 ms.
Versions:
protobuf-1.0.1