jakemac53 opened 2 weeks ago
Hmmm I've actually been thinking the reverse, that extension types might be a win for performance in other places too.
This is mostly due to `pkgs/dart_model/benchmark`, which appears to show that it's faster to use a `JsonBufferBuilder` than SDK maps even if you are converting to text in the end:
SdkMapsJsonWireBenchmark: 1622ms, 7177227 bytes
BuilderMapsJsonWireBenchmark: 1345ms, 7177227 bytes
and that it's faster to iterate over the `JsonBufferBuilder` maps than SDK maps, too:
ProcessSdkMapsJsonWireBenchmark: 417ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark: 339ms, hash 23186292
Worth noting, though, that in both cases it's faster still to use the data directly without storing it anywhere in between:
LazyMapsJsonWireBenchmark: 1014ms, 7177227 bytes
ProcessLazyMapsBufferWireBenchmark: 274ms, hash 23186292
The equivalent for computing hash codes would be that we have a mode for building a Model which doesn't store anything and just hashes. We could then rerun and actually keep the model data only if the hash changed.
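To make that concrete, here's a minimal sketch of what a hash-only build mode could look like; names like `ModelSink`, `HashOnlySink` and `buildIfChanged` are made up for illustration, not the real dart_model API:

```dart
// Hypothetical sketch of a "hash-only" build mode: the first pass only folds
// field writes into a running hash; data is built and kept only if the hash
// differs from the previous run.
abstract class ModelSink {
  void addField(String name, Object? value);
}

/// Accumulates a hash and stores nothing.
class HashOnlySink implements ModelSink {
  int hash = 0;

  @override
  void addField(String name, Object? value) {
    hash = Object.hash(hash, name, value);
  }
}

/// Stores the data in an ordinary map.
class MapSink implements ModelSink {
  final map = <String, Object?>{};

  @override
  void addField(String name, Object? value) => map[name] = value;
}

/// Re-runs [buildModel] for real only when the hash changed since last time;
/// returns null if the previous model can be kept.
Map<String, Object?>? buildIfChanged(
    void Function(ModelSink) buildModel, int? previousHash) {
  final hasher = HashOnlySink();
  buildModel(hasher);
  if (hasher.hash == previousHash) return null;
  final sink = MapSink();
  buildModel(sink);
  return sink.map;
}
```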
An entirely different avenue for exploration is the overlap between models; it may be that we can arrange for duplicate pieces to be built and hashed only once: something like, we build one host-side model with hashes and pull parts out of it to answer queries. If we do that we can easily compute the hash first, before going back to also grab the data if needed.
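Roughly, the host-side piece of that could be a cache keyed by whatever the dedup unit ends up being, with hashes computed eagerly and data only when a query actually needs it. This is just an illustrative shape with made-up names, not a design:

```dart
// Hypothetical sketch of a host-side cache: each model piece is hashed once,
// and its data is only built (and reused) when a query pulls it out.
class HostModelCache {
  final _hashes = <String, int>{};
  final _pieces = <String, Map<String, Object?>>{};

  /// Cached hash for the piece identified by [id]; computed once.
  int hashFor(String id, int Function() computeHash) =>
      _hashes.putIfAbsent(id, computeHash);

  /// Data for the piece identified by [id]; built only on first use.
  Map<String, Object?> pieceFor(
          String id, Map<String, Object?> Function() build) =>
      _pieces.putIfAbsent(id, build);
}
```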
I could be wrong about any/all this, good thing it's measurable :)
> faster to use a `JsonBufferBuilder` than SDK maps even if you are converting to text in the end
I am not suggesting using SDK maps. I am suggesting using real Dart classes with fields, and separate logic to serialize/deserialize them to whatever format. Essentially, what the current macro implementation does.
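For concreteness, something like this (a sketch only; the type, fields, and serialization format here are placeholders):

```dart
// A plain Dart class with fields, with the wire-format conversion kept in
// separate functions rather than baked into the representation.
class QualifiedName {
  final String uri;
  final String name;

  QualifiedName({required this.uri, required this.name});
}

// Separate serialization logic for whatever format ends up being used.
Map<String, Object?> qualifiedNameToJson(QualifiedName q) =>
    {'uri': q.uri, 'name': q.name};

QualifiedName qualifiedNameFromJson(Map<String, Object?> json) => QualifiedName(
      uri: json['uri'] as String,
      name: json['name'] as String,
    );
```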
Yes, it's not directly comparable.
Real classes might be worth trying.
But neither extension types nor real classes can be as fast as computing the hash code without allocating anything, or re-using hash codes calculated earlier. So it might not be the first thing to try, depending on how much work they all are :)
So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?
> and that it's faster to iterate over the `JsonBufferBuilder` maps than SDK maps, too
Fwiw I noticed this is using `entries` to iterate, which is cheating a bit, so it might be misleading. For example, if we merge two models together, that ends up doing a lookup by key for all entries in both maps.
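As an illustration, a naive merge has to go through per-key lookups, which is exactly the access pattern that `entries`-based iteration avoids (sketch only; the real merge logic is more involved):

```dart
// Merging two JSON-like maps by iterating one map's keys and looking each key
// up: every `result[key]` and `b[key]` is a separate lookup.
Map<String, Object?> merge(Map<String, Object?> a, Map<String, Object?> b) {
  final result = <String, Object?>{...a};
  for (final key in b.keys) {
    final existing = result[key]; // lookup by key
    final incoming = b[key]; // another lookup by key
    if (existing is Map<String, Object?> && incoming is Map<String, Object?>) {
      result[key] = merge(existing, incoming);
    } else {
      result[key] = incoming;
    }
  }
  return result;
}
```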
> So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?
Something like that, yes.
We could also generate an alternative version of the whole model that stores nothing, and have a copy+paste of all the processing code that imports that instead :)
Or we could structure the model creation code in a pluggable way as discussed previously w.r.t. ways to create the data with only properties that have been queried for.
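One pluggable-ish way to do it without forking the processing code would be to back the model with a map type whose writes only feed a hash. A rough sketch, not real dart_model code:

```dart
import 'dart:collection';

/// A write-only map: writes fold the key and value into a running hash,
/// nothing is stored, and reads/iteration are unsupported.
class HashingMap extends MapBase<String, Object?> {
  int hash = 0;

  @override
  void operator []=(String key, Object? value) {
    // Nested pieces would themselves be HashingMaps; fold in their hash.
    final v = value is HashingMap ? value.hash : value;
    hash = Object.hash(hash, key, v);
  }

  @override
  Object? operator [](Object? key) => throw UnsupportedError('write-only');

  @override
  Iterable<String> get keys => throw UnsupportedError('write-only');

  @override
  Object? remove(Object? key) => throw UnsupportedError('write-only');

  @override
  void clear() => hash = 0;
}
```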
> Fwiw I noticed this is using `entries` to iterate, which is cheating a bit, so it might be misleading. For example, if we merge two models together, that ends up doing a lookup by key for all entries in both maps.
Yes, it's a "best case" benchmark, not necessarily realistic :)
> Yes, it's a "best case" benchmark, not necessarily realistic :)
Note that `entries` for SDK maps is also the slowest way to iterate them. Just for fun I edited the benchmark to iterate the keys and look them up, which shows just how big the cliff can be here:
ProcessSdkMapsJsonWireBenchmark: 355ms, hash 23186292
ProcessLazyMapsBufferWireBenchmark: 36195ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark: 57194ms, hash 23186292
I believe this benchmark is also an extreme worst case, with many entries, but still 🤣.
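Concretely, the difference is between these two iteration styles; on buffer-backed maps the second one pays a lookup cost per key (illustrative helpers, not the benchmark code itself):

```dart
// `entries`-based iteration: one pass over the underlying storage.
int hashViaEntries(Map<String, Object?> map) {
  var hash = 0;
  for (final entry in map.entries) {
    hash = Object.hash(hash, entry.key, entry.value);
  }
  return hash;
}

// keys + lookup: every `map[key]` is a separate lookup, which is cheap for
// SDK hash maps but can be very expensive for buffer-backed maps.
int hashViaKeyLookup(Map<String, Object?> map) {
  var hash = 0;
  for (final key in map.keys) {
    hash = Object.hash(hash, key, map[key]);
  }
  return hash;
}
```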
With the new query invalidation approach, the vast majority of objects are actually never sent over the wire. Yet we are paying a large cost during creation to make them "free" to serialize and deserialize. All accesses to all fields are also relatively slow compared to just accessing regular fields on Dart objects.
I suspect that we actually should abandon the extension types and just generate real classes, so that we can generate hash codes much more cheaply. We can still use the binary JSON format though.
A side benefit of this is that it would likely make it easier for us to control a centralized implementation of the query and filtering logic. We can make these types implementable via lazy getters, and only read the fields that were asked for in the query during serialization.
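A rough sketch of what a generated class could look like under that approach; the type, fields, and query mechanism here are placeholders, not the real model, and the getters could also be made lazy:

```dart
// A plain class with fields: hashCode/== come from ordinary field access, and
// serialization only touches fields that a query actually asked for.
class InterfaceDeclaration {
  final String name;
  final List<String> memberNames;

  InterfaceDeclaration({required this.name, required this.memberNames});

  @override
  int get hashCode => Object.hash(name, Object.hashAll(memberNames));

  @override
  bool operator ==(Object other) =>
      other is InterfaceDeclaration &&
      other.name == name &&
      _listEquals(other.memberNames, memberNames);

  /// Serializes only the fields named in [queriedFields].
  Map<String, Object?> toJson(Set<String> queriedFields) => {
        if (queriedFields.contains('name')) 'name': name,
        if (queriedFields.contains('memberNames')) 'memberNames': memberNames,
      };
}

bool _listEquals(List<String> a, List<String> b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}
```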
cc @davidmorgan