dart-lang / language

Design of the Dart language

Quantify the cost of merging augmentations into the analyzer and CFE data models #3868

Open · davidmorgan opened 3 months ago

davidmorgan commented 3 months ago

Thanks to @scheglov for documenting how the analyzer hosts macros; this makes the discussion much more concrete :)

For performance, the critical piece we can dig into from here seems to be the cost of merging.

By that I mean: when macros emit augmentations, how much do we pay for merging them into the analyzer/CFE data model so that the next macros can introspect on them (in cases when they are allowed to)?

What I am expecting is that the cost is very different for different types of augmentations, roughly matching the split into phases: for example, new types are cheap in phase 1 because they are emitted before most of the analysis happens, and phase 3 is cheap because it only changes definitions.

Phase 2 is likely the difficult one :)
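To make the phase distinction concrete, here is a rough sketch (using the experimental Dart augmentation syntax; the names `Foo` and `FooHelper` are hypothetical, and the exact syntax may differ from what ships) of the kinds of output a macro might emit in each phase:

```dart
// Phase 1: declaring a brand-new type. Cheap to merge, because it is
// emitted before most resolution and analysis happens.
class FooHelper {}

// Phase 2: adding a declaration to an existing type. This is the case
// where the analyzer/CFE must merge the new member into its data model
// so that later macros can introspect on Foo's updated shape.
augment class Foo {
  Map<String, Object?> toJson() => {};
}

// Phase 3: filling in the body of an already-declared member. Cheap to
// merge, because the signature (and thus the resolved model) is unchanged.
augment class Foo {
  augment Map<String, Object?> toJson() => {'x': 1};
}
```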

What I would like to discover is whether there is any difference between types of augmentation, and how this plays into how we allow macros to interact: for example, whether top-level declarations are more or less expensive than field declarations. Two specific questions should come out of this: are there types of interaction between macros that we currently allow that do not scale well with program size, number of macros, or number of applications? And are there additional types of interaction between macros that we should allow because they are cheap?

@jakemac53 @scheglov @johnniwinther

jakemac53 commented 3 months ago

Fwiw, today even the phase 1 interactions cause problems in unexpected ways for the large library cycle cases. This is because today, when writing out the actual augmentation file after phase 1 runs, we ask the analyzer/CFE to resolve identifiers. Re-running any phase 1 macro might affect how identifiers are resolved, meaning we also have to re-run macros on any library that imports the affected library, and the same for anything that imports those libraries. If every library in a cycle runs macros, we thus always have to re-run all macros across the entire cycle. This is one of the root causes of the difficulty with doing more efficient invalidation for library cycles today.

jakemac53 commented 2 months ago

Based on discussions in https://github.com/dart-lang/sdk/issues/55784 (for example, this comment), it seems that the cost is not really phase dependent. It depends just on the amount of code being generated, and mostly goes to the parsing and handling of that code. This is actually taking longer than running the macros themselves, if I understand things correctly (for the JSON macro, at least).

I am not sure exactly what the takeaway is from that though.

davidmorgan commented 2 months ago

Yeah, if there is additional cost just from the code being in augmentations, and additional cost related to merging macro output augmentations into a single file, this is something we should understand.

It makes "merge to source" more interesting for performance, I guess. Maybe we end up supporting that even for full macros, with the guidance "to scale to huge codebases (...library cycles...) you can optionally check in (part of?) the output".

jakemac53 commented 2 months ago

> It makes "merge to source" more interesting for performance, I guess. Maybe we end up supporting that even for full macros, with the guidance "to scale to huge codebases (...library cycles...) you can optionally check in (part of?) the output".

It could make the situation worse too: most of the signature will be reproduced in the augmentation anyway.