dart-lang / language

Design of the Dart language

Consider generalizing macros over applications and annotations #3873

Open davidmorgan opened 4 months ago

davidmorgan commented 4 months ago

A difference that cropped up naturally in the dart_model exploration, and seems worth considering independently of that work.

Generalize over applications:

Consider having one macro "runtime instance" handle multiple applications.

There isn't any particular advantage to macro code executing in the context of one application: rather, all applications of a macro Foo are runs of the same code that in the worst case can certainly manage not to interfere with itself, and in the best case can benefit from sharing work.

For example, suppose that the Foo macro must introspect and make decisions about all the types used as a return type in the class Foo is applied to.

Then, there is a high chance this work will be duplicated across Foo applications, because different classes using Foo are very likely to have some of the same return types. Sharing the work across applications can lead to significantly better scalability.

@Foo()
class Bar1 {
  final Baz baz;
}
@Foo()
class Bar2 {
  final Baz baz; // Work related to `Baz` done by the `Foo` application to `Bar1` can be reused.
}

This can also lead to opportunities to batch requests and responses for better serialization throughput.
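The work sharing described above could look roughly like the following. This is only a sketch under stated assumptions: `TypeInfo`, `SharedFooMacro`, `apply` and `analysisCount` are hypothetical names invented for illustration and are not part of any macro API; the "analysis" is a stand-in for whatever introspection `Foo` really needs.

```dart
// Hypothetical sketch: none of these names come from the macro API.

/// Result of the (assumed expensive) analysis of one type, e.g. `Baz`.
class TypeInfo {
  final String name;
  TypeInfo(this.name);
}

/// One "runtime instance" of a hypothetical `Foo` macro that serves many
/// applications and remembers which types it has already analyzed.
class SharedFooMacro {
  final Map<String, TypeInfo> _cache = {};

  /// Names of the classes this instance has been applied to so far.
  final List<String> appliedTo = [];

  /// Counts how many times the analysis actually ran.
  int analysisCount = 0;

  TypeInfo _analyze(String typeName) {
    analysisCount++;
    return TypeInfo(typeName);
  }

  /// Handles one application. [usedTypes] are the types used by the
  /// annotated class; types seen in earlier applications hit the cache.
  List<TypeInfo> apply(String className, List<String> usedTypes) {
    appliedTo.add(className);
    return [
      for (final type in usedTypes)
        _cache.putIfAbsent(type, () => _analyze(type)),
    ];
  }
}

/// Applies the macro to `Bar1` and `Bar2` and returns the analysis count.
int sharedAnalysisCount() {
  final macro = SharedFooMacro();
  macro.apply('Bar1', ['Baz']);
  macro.apply('Bar2', ['Baz']); // Reuses the cached `Baz` analysis.
  return macro.analysisCount;
}
```

With one instance per application there would be one analysis of `Baz` per class; with the shared instance in this sketch, `Baz` is analyzed once for both.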

Generalize over annotations:

Consider having one macro "instance" handle, if so configured, multiple annotations.

There is also no particular advantage to tying macros to a single annotation. One macro package might offer annotations Foo and Bar that care about each other: different aspects of the same generated code. Then, there is no particular reason to force every Foo to redo the work done by Bar, when it could simply be shared. So: support using the same macro code for both Foo and Bar and running it on all Foo and all Bar.

@Foo()
class Bar1 {
  final Baz baz;
}
@Bar()
class Bar2 {
  final Baz baz; // Work related to `Baz` done by the `Foo` application to `Bar1` can be reused here too.
}

Merging Foo and Bar into a single macro in this way also simplifies the job of the host, because they are no longer separate macro applications each of which might produce output visible to the other; they share information internally and must produce one unified set of augmentations, leaving no more work for the host.

Generalize fully?

Probably not. To generalize fully in this direction would mean:

  1. Given a program, a macro arbitrarily decides whether it wants to run;
  2. It consumes any part of the program it likes as input;
  3. It generates augmentations anywhere in the program as output.

And this seems too much: #1 and #3 are too surprising.

A reasonable compromise seems to be to restrict #1 and #2 back to focusing on annotations:

  1. A macro is associated with one or more annotations #3728;
  2. A macro runs if any of the annotations it is associated with is "applied" as defined in #3728;
  3. It consumes any part of the program reachable from those annotation applications as input;
  4. It generates augmentations in any library containing those annotation applications as output.

What about modular builds?

The host should communicate which applications are "read only": i.e. the augmentation has already been written in an earlier build step; the macro does not have to produce the augmentation again but is free to introspect that application if it wants.

jakemac53 commented 4 months ago

A macro can sort of do this already on a single library... although you end up needing an extra annotation. A macro can run on an entire library, and then introspect to find the annotations it wants to generate code for, share code/work as desired, and alter anything within the library.

> There isn't any particular advantage to macro code executing in the context of one application

Within a single library, probably it would be fine. I wouldn't want to generalize this to all libraries in an app though, because then there is the (I would argue significant) advantage that it encourages writing code which is deterministic and not reliant upon the specifics of how a given compiler works, or whole world versus incremental versus modular compiles etc.

As a concrete example, consider this issue which was recently filed. A macro author with access to the entire world at once is likely to want to write macros like this, which require global knowledge, and either just won't work in certain modes (hot reload) or will be slow, non-deterministic, etc.

The current model where each application is treated in isolation helps macro authors stay on the rails and write macros that will work consistently, as well as avoiding anti-patterns such as global queries.

davidmorgan commented 4 months ago

It's easy to offer libraries on top that make it look like execution is in isolation:

https://github.com/dart-lang/language/blob/main/working/macros/dart_model/macro_client/lib/class_generator_macro.dart

And these can then do optimization behind the scenes, leaving the option to do something more complex only for those who understand it / want it.

I think that then combines the advantages of both approaches :)

rrousselGit commented 4 months ago

I do believe https://github.com/dart-lang/language/issues/3854 is a very important scenario. I myself have a similar need on many occasions.

For instance, I want to generate DB bindings for a model, which includes a schema migration. I want to generate a single function that migrates all data at once, yet there's no easy way to do so.

Generating one function per model/library would be unrealistic, because it'd be very error prone (easy to forget to invoke one library's function).

I think that's fine if the generated code is often invalidated and slower to generate. It's important to be able to do it at all.

davidmorgan commented 4 months ago

Thanks Remi.

Are you saying you want one macro application to do all that work, or is it okay to have one macro application per data model class?

I think the latter can work nicely: each model class application can output a function for that model class, then a whole program application can output code that aggregates--calls them all. That won't be super fast but you only need one in your program so rerunning it all the time should be fast enough.
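As a rough illustration of that split, the generated code might have the following shape. Everything here is hypothetical: the model names `User` and `Order`, the function names, and the `log` parameter (which stands in for real migration work) are assumptions for the sake of the sketch, not output of any existing macro.

```dart
// Hypothetical generated code; all names are illustrative assumptions.

/// What a per-model application might emit for a model class `User`.
void migrateUser(List<String> log) => log.add('User');

/// What a per-model application might emit for a model class `Order`.
void migrateOrder(List<String> log) => log.add('Order');

/// What the single whole-program application might emit: a function that
/// simply calls every per-model function it discovered.
List<String> migrateAll() {
  final log = <String>[];
  migrateUser(log);
  migrateOrder(log);
  return log;
}
```

Only `migrateAll` depends on the full set of models, so it is the only piece that needs regenerating when a model is added or removed.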

rrousselGit commented 4 months ago

I don't care too much about the exact details. Personally, I'd have one annotation per model ... and maybe one on the "main" (that would look up every file in the package). But if a different pattern enables implementing this, I don't mind.


On that topic, is there a certain library order when dealing with generation over multiple libraries?

For instance, given:

// lib/main.dart
import 'src/model.dart';

@macro
void main() {}

// lib/src/model.dart
@anotherMacro
class Foo {}

Would the macro on Foo generate before the one in main? Or maybe they generate in parallel (based on phases)?

Since macros can use IO a bit, I was wondering if /lib/src/foo.dart could output to .dart_tool/lib/src/foo.my_macro.part. Then lib/main.dart could look up all .dart_tool/**/*.my_macro.part files and generate something based on them.

davidmorgan commented 4 months ago

Execution order mostly depends on imports: as long as there are no imports from model.dart onto main.dart, model.dart macros can execute and complete before main.dart macros run at all. If there is a cycle of imports (a "library cycle") they have to run "in parallel" in some sense.
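The import-driven ordering can be pictured as a dependency-first traversal of the import graph. The sketch below is an illustration only, not how any Dart tool is implemented: `macroExecutionOrder` and `exampleOrder` are hypothetical names, and libraries in an import cycle are not grouped here, so their relative order in the result carries no meaning.

```dart
/// Returns libraries in an order where each library comes after the
/// libraries it imports. [imports] maps a library to what it imports.
///
/// Hypothetical helper: libraries that form an import cycle end up in an
/// arbitrary relative order, since they would have to run "in parallel".
List<String> macroExecutionOrder(Map<String, List<String>> imports) {
  final order = <String>[];
  final visited = <String>{};

  void visit(String library) {
    if (!visited.add(library)) return; // Already handled, or part of a cycle.
    for (final dependency in imports[library] ?? const <String>[]) {
      visit(dependency);
    }
    order.add(library); // Post-order: dependencies were added first.
  }

  imports.keys.forEach(visit);
  return order;
}

/// The two-library example from the question above.
List<String> exampleOrder() => macroExecutionOrder({
      'lib/main.dart': ['lib/src/model.dart'],
      'lib/src/model.dart': [],
    });
```

For the example, `lib/src/model.dart` comes before `lib/main.dart`, matching the statement that the macro on `Foo` can complete before the one on `main` runs.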

rrousselGit commented 4 months ago

Any chance we could have a mechanism that gives macros a fixed order, even when libraries are not related through an import?
I'd like to avoid asking folks to import their entire project by hand. It's super painful that whenever you add a new Dart file, you have to list it in your main.

One way could be to order files by path depth, so that /lib/a.dart runs after /lib/src/b.dart (leaves first).

So lib/main.dart could have access to anything that's inside the lib/src folder safely.

davidmorgan commented 4 months ago

Dart requires everything to be findable via imports from an entrypoint, so that would require a change independent of macros.

rrousselGit commented 4 months ago

It would be findable via imports from the entrypoint. I'm talking about applying a macro on the main, and having the generated code add extra imports:

@macro
void main() {}

// ...
import 'src/model.dart' as prefix0;
import 'src/model2.dart' as prefix1;

augment void main() {
  prefix0.Model.migrate();
  prefix1.Model2.migrate();
  // ...
}

davidmorgan commented 4 months ago

I don't think that's possible. But anyway it's unrelated to this issue, which is for a specific discussion about the work in progress.

davidmorgan commented 3 months ago

Moved to the "breaking changes" milestone with the expectation that we close this discussion as part of finalizing the host<->macro protocol, not that we do it.

jakemac53 commented 3 months ago

I do think we shouldn't try to do this right now, but we could investigate whether it could be done later on. So whether we just close it or move it into a different bucket, either is fine with me.

davidmorgan commented 3 months ago

I think this will naturally come with the switch to dart_model.

A macro's initial query, whether explicitly or implicitly made, probably includes the class the annotation is applied to, then everything reachable from that in a particular way (e.g. types mentioned in fields).

It's not obvious why a macro would want to limit that to just the one application, because that would split the same work over more requests+responses, and in some cases introduce duplicate work.

Instead it might ask for: all the applications in one file; all applications that are being applied right now; all applications in the program.

I think it's not so much about implementing new stuff as about skipping the limitation that one macro application has to match up with one round trip on the wire. It's easy to get that simplicity again if you want it, as in the example code that splits the data and delegates for each application.
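A minimal sketch of such an adapter, assuming a batch of applications can be represented as a list of annotated class names: `ApplicationHandler`, `BatchHandler` and `perApplication` are hypothetical names made up for this illustration and do not come from dart_model or the linked example.

```dart
/// Hypothetical: handles one application and returns its augmentation.
typedef ApplicationHandler = String Function(String annotatedClass);

/// Hypothetical: handles every application delivered in one round trip.
typedef BatchHandler = List<String> Function(List<String> annotatedClasses);

/// Wraps a per-application handler so it can be used where a batch handler
/// is expected: the batch is split and each application handled in turn.
BatchHandler perApplication(ApplicationHandler handle) =>
    (batch) => [for (final annotatedClass in batch) handle(annotatedClass)];

/// Example use: a trivial handler applied to a batch of two classes.
List<String> exampleBatch() {
  final handler = perApplication((name) => 'augment class $name {}');
  return handler(['Bar1', 'Bar2']);
}
```

A macro author who wants the isolated model writes only the `ApplicationHandler`; one who wants to share work across the batch writes a `BatchHandler` directly.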

jakemac53 commented 3 months ago

I think that this is actually quite a complicated feature. One problem for example is ordering, and introspection loops.

If a macro can be applied at different "layers" - let's say a top level function or a method - those don't run at the same time. The method should fully run first.

As far as introspection loops, it greatly increases the chances of one happening if the granularity of macro applications is larger. You could have a DAG in terms of the actual dependencies of two classes, but since macro applications on each are now grouped together, you get a cycle where one didn't exist previously.
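The cycle problem can be reproduced with a small graph experiment. This is a sketch with invented names (`hasCycle`, `collapse`, and the `Foo@A1` style labels are all hypothetical) that models "application X introspects something produced by application Y" as a directed edge; it says nothing about how the real implementation tracks dependencies.

```dart
/// Returns true if the directed graph [edges] contains a cycle.
bool hasCycle(Map<String, Set<String>> edges) {
  final done = <String>{};
  final inProgress = <String>{};

  bool visit(String node) {
    if (done.contains(node)) return false;
    if (!inProgress.add(node)) return true; // Back edge: cycle found.
    for (final next in edges[node] ?? const <String>{}) {
      if (visit(next)) return true;
    }
    inProgress.remove(node);
    done.add(node);
    return false;
  }

  return edges.keys.any(visit);
}

/// Collapses each node into its group, keeping only edges between
/// different groups (edges inside one group are dropped).
Map<String, Set<String>> collapse(
    Map<String, Set<String>> edges, Map<String, String> groupOf) {
  final grouped = <String, Set<String>>{};
  edges.forEach((from, targets) {
    for (final to in targets) {
      final a = groupOf[from]!, b = groupOf[to]!;
      if (a != b) grouped.putIfAbsent(a, () => <String>{}).add(b);
    }
  });
  return grouped;
}

/// Foo on A1 depends on Bar on B, which depends on Foo on A2: a DAG.
final perApplicationGraph = <String, Set<String>>{
  'Foo@A1': {'Bar@B'},
  'Bar@B': {'Foo@A2'},
  'Foo@A2': <String>{},
};

/// Grouping all applications of the same macro together.
final groupOfApplication = <String, String>{
  'Foo@A1': 'Foo',
  'Foo@A2': 'Foo',
  'Bar@B': 'Bar',
};
```

Here `hasCycle(perApplicationGraph)` is false, while `hasCycle(collapse(perApplicationGraph, groupOfApplication))` is true: once both `Foo` applications are treated as one unit, `Foo` depends on `Bar` and `Bar` depends on `Foo`.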

davidmorgan commented 3 months ago

Yes, that's fair; maybe it only makes sense in combination with "rounds".

Which I think also come under "only do this if it turns out to be easier" :)