dart-lang / macros

A Dart mono-repo for macro development.
BSD 3-Clause "New" or "Revised" License
90 stars 5 forks source link

Evaluate generating models as regular dart classes and not extension types #136

Open jakemac53 opened 2 weeks ago

jakemac53 commented 2 weeks ago

With the new query invalidation approach, the vast majority of objects are actually never sent over the wire. Yet, we are paying a large cost during creation to make them "free" to serialize and deserialize. All accesses to all fields are also relatively slow compared to just accessing regular fields on dart objects.

I suspect that we actually should abandon the extension types and just generate real classes, so that we can generate hash codes much more cheaply. We can still use the binary JSON format though.

A side benefit of this is it would likely make it easier for us to control a centralized implementation of the query and filtering logic. We can make these types implementable via lazy getters, and only read the fields that were asked for in the query during serialization.

cc @davidmorgan

davidmorgan commented 2 weeks ago

Hmmm I've actually been thinking the reverse, that extension types might be a win for performance in other places too.

This is mostly due to pkgs/dart_model/benchmark which appears to show that it's faster to use a JsonBufferBuilder than SDK maps even if you are converting to text in the end

       SdkMapsJsonWireBenchmark: 1622ms, 7177227 bytes
   BuilderMapsJsonWireBenchmark: 1345ms, 7177227 bytes

and that it's faster to iterate over the JsonBufferBuilder maps than SDK maps, too

       ProcessSdkMapsJsonWireBenchmark:  417ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark:  339ms, hash 23186292

Worth noting though that in both cases it's faster still to use the data directly without storing it anywhere in between

           LazyMapsJsonWireBenchmark: 1014ms, 7177227 bytes
  ProcessLazyMapsBufferWireBenchmark:  274ms, hash 23186292

--the equivalent for computing hash codes would be that we have a mode for building a Model which doesn't store anything, it just hashes. We could then rerun and actually keep the model data only if the hash changed.

An entirely different avenue for exploration is the overlap between models, it may be that we can arrange that duplicate pieces are only built and hashed once: something like, we build one host-side model with hashes and pull parts out of it to answer queries. If we do that we can easily just compute the hash first before going back to also grab the data if needed.

I could be wrong about any/all this, good thing it's measurable :)

jakemac53 commented 2 weeks ago

faster to use a JsonBufferBuilder than SDK maps even if you are converting to text in the end

I am not suggesting using SDK maps. I am suggesting using real Dart classes with fields, and separate logic to serialize/deserialize them to whatever format. Essentially, what the current macro implementation does.

davidmorgan commented 2 weeks ago

Yes, it's not directly comparable.

Real classes might be worth trying.

But neither extension types nor real classes can be as fast as computing the hash code without allocating anything, or re-using hash codes calculated earlier. So it might not be the first thing to try, depending on how much work they all are :)

jakemac53 commented 2 weeks ago

So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?

jakemac53 commented 2 weeks ago

and that it's faster to iterate over the JsonBufferBuilder maps than SDK maps, too

Fwiw I noticed this is using entries to iterate which is cheating a bit so it might be misleading. For example if we merge two models together, that ends up doing a lookup by key for all entries in both maps.

davidmorgan commented 2 weeks ago

So would the idea be to basically have a write-only map implementation that we use to back the model when doing query invalidation? And it doesn't actually write to a map, just accumulates a hash?

Something like that, yes.

We could also generate an alternative version of all the model that stores nothing, and have a copy+paste of all the processing code that imports that instead :)

Or we could structure the model creation code in a pluggable way as discussed previously w.r.t. ways to create the data with only properties that have been queried for.

Fwiw I noticed this is using entries to iterate which is cheating a bit so it might be misleading. For example if we merge two models together, that ends up doing a lookup by key for all entries in both maps.

Yes, it's a "best case" benchmark, not necessarily realistic :)

jakemac53 commented 2 weeks ago

Yes, it's a "best case" benchmark, not necessarily realistic :)

Note that entries for SDK maps is also the slowest way to iterate them. Just for fun I edited the benchmark to iterate the keys and look them up, which shows just how big the cliff can be here:

ProcessSdkMapsJsonWireBenchmark:  355ms, hash 23186292
ProcessLazyMapsBufferWireBenchmark: 36195ms, hash 23186292
ProcessBuilderMapsBuilderWireBenchmark: 57194ms, hash 23186292

I believe this benchmark is also an extreme worst case, with many entries, but still 🤣 .