dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.24k stars 1.57k forks source link

Recent analyzer version broke the behavior of code-generators when relying on "invalid types" #52455

Open rrousselGit opened 1 year ago

rrousselGit commented 1 year ago

Hello!

Before the recent InvalidType changes, when a code-generator encountered an invalid type, they generated "dynamic". But now, the same generator will generate InvalidType, which is not a valid symbol.

This breaks apps that use possibly two code-generators.
As an example, consider an app which uses two code-generators:

Then, a user would write:

@CopyWith()
class Example with _$Example { // mixin used to add the copyWith method
  Example(this.value);
  List<CodeGeneratedClass> value;
}

In that case, before the recent InvalidType changes, the generated copyWith would be:

mixin _$Example on Example {
  Example copyWith({
    // "value" is typed as dynamic because at the time of generation,
    // the class "CodeGeneratedClass" does not exist. Therefore dynamic used to be returned.
    List<dynamic> value, 
  }) => ...;
}

That behavior was reasonable. It wasn't perfectly typed, but the generated code worked. So someone could reasonably write:

void main() {
  Example example = Example([]);
  example = example.coyWith(value: [CodeGeneratedClass()]);
}

But with the recent changes, the generated copyWith is now instead:

mixin _$Example on Example {
  Example copyWith({
    List<InvalidType> value, 
  }) => ...;
}

The problem is that this code is no longer valid. The InvalidType type does not exist, and even if it did, CodeGeneratedClass would not be assignable to it.

As such, the application does not compile anymore

lrhn commented 1 year ago

Would it be better to have an UnimplementedType to represent a type whose declaration isn't found. (In case there are more ways to be invalid, otherwise it'd just be a rename.) That way, the analyzer model can explicitly represent a partial program, which might be useful for other code-generation tasks. (Even better if the model can be updated by adding more files.)

rrousselGit commented 1 year ago

The problem isn't the analyzer model for me. For me, the existence of the InvalidType object makes sense.

It's about how "InvalidType.getDisplayString/toString" likely should either return dynamic or the user-defined type.

It returning "InvalidType" imo makes no sense because that's never going to be something we want in the generated code.

srawlins commented 1 year ago

CC @scheglov

I don't think setting InvalidType.toString should be dynamic, nor getDisplayString (meant for messages, not for code).

I think the author of code-generating code should check for InvalidType, and change the behavior accordingly.

scheglov commented 1 year ago

DartType.getDisplayString() and toString() are... for displaying only. Not for code generation. So, I think this works as expected. Ideally, there should be a separate function that converts types into code, and checks specifically for every type it supports.

As for tracking the reasons why we get InvalidType, and specific subtypes like UnresolvedTypeNameIdentifierInvalidType or InvocationOfGetterOnExpressionOfInvalidType, etc - these are possible theoretically, but I'd like to see really useful cases before implementing this.

rrousselGit commented 1 year ago

Writing a DartType in the generated code is a really common case.

All code generators need to do this. I wrote propably a dozen of code generators, and every single one of them had to do this. So if toString/gerDisplayString can't be used for this, we need an official function for it. Ideally one which preserves what the user wrote, so preserves import aliases/typedefs & co.

In a way, we want a DartType.toSource

scheglov commented 1 year ago

Yeah, I know that code generators need it.

We have one that we use for quick assists, https://github.com/dart-lang/sdk/blob/0fae0f373ae442e3d021e1009b02acb9bd24fce0/pkg/analyzer_plugin/lib/src/utilities/change_builder/change_builder_dart.dart#L1218

It does support type aliases, and records which imports should be added, and whether type parameters are available. As you can see, it is quite complicated. I'm not sure when / if we will decide to make something like this an API.

bwilkerson commented 1 year ago

That method is already used by a public API. The analyzer_plugin package is published, and there are public APIs that call this implementation method. But I don't believe that any code generators use the ChangeBuilder to write code because they're typically writing a whole file rather than making changes to portions of a file.

srawlins commented 1 year ago

Writing a DartType in the generated code is a really common case.

All code generators need to do this. I wrote propably a dozen of code generators, and every single one of them had to do this. So if toString/gerDisplayString can't be used for this, we need an official function for it. Ideally one which preserves what the user wrote, so preserves import aliases/typedefs & co.

I don't think it is a given that analyzer APIs are directly responsible for providing this functionality. I agree that it is important (crucial, really) for any code generator. But I don't think DartType.toString() was at all doing things like tracking import prefixes.

In the few code generators I've worked on, I've used code_builder and found it to be very elegant, expressive, and correct. It is well-maintained, and used by many code generators. @rrousselGit have you looked into code_builder? I'd be very curious to know if you have and chose not to use it for some reason. But it'd be great if it could be the canonical API for generating code with properly prefixed types.

(There used to be a hang up with default values: if you were generating a class A that subclassed B and overrode a method with a parameter with a default value, then you had to properly recreate the exact text for that default value. source_gen has an API for doing this called Reviver which is not perfect. However, as of null safety (I think?) you no longer need to re-type default values in method overrides.)

rrousselGit commented 1 year ago

In the few code generators I've worked on, I've used code_builder and found it to be very elegant, expressive, and correct. It is well-maintained, and used by many code generators. @rrousselGit have you looked into code_builder? I'd be very curious to know if you have and chose not to use it for some reason. But it'd be great if it could be the canonical API for generating code with properly prefixed types.

My first generator was made this way.

I really dislike it. In my opinion, string templates with interpolation are significantly more readable and flexible than the custom language code_builder offers.

I would hate it if I was forced to use code_builder. I'm voluntarily avoiding it.

I'd very much prefer if the Element or DartType had the unresolved user-defined string like how the Ast has it. The Ast already has exactly what I want. It's just that generators don't have access to it. They only have the Elements.

scheglov commented 1 year ago

I mostly prefer composing Dart code using strings in quick assists and refactoring myself. Sometimes however it is convenient to use a helper to write a DartType, or an element reference, and this helper should manage imports.

I think keeping pieces of AST in elements and types is not viable. These AST pieces to not very useful without a context, and keeping context means much more significant heap usage. And most importantly, not all DartTypes have a unique source node. Some types are produced as LUB of two types, e.g. typeOf(flag ? 1 : 2.3) == 'num', some using other type algorithms. And then there is type inference.

So, we have to keep DartType independent from AST.

rrousselGit commented 1 year ago

But what about storing the type's source in a few specific elements? We don't need it for all dart types/elements I think. The main ones are cases where there is user source code.

Like ParameterElement or variable declarations.

I don't think there's a real use-case for other sources of DartTypes. In the end, generators are trying to copy paste user written code. It shouldn't rely on any form of inference or other logic.

scheglov commented 1 year ago

If you need AST, there is a way to ask for resolved AST and work with it. I'm not sure about specific generators, and their semantics, but https://github.com/dart-lang/sdk/blob/44b5ca76a9f9b0f2d1ed381ae722521c9984ba3b/pkg/analyzer/lib/dart/analysis/session.dart#L69

which is used for example in

https://github.com/realm/realm-dart/blob/707edcc8aaf13ff5c320f27d744ae7d185dd692a/generator/lib/src/realm_object_generator.dart#L41

There are a few more examples on GitHub.

rrousselGit commented 1 year ago

I'm aware of this function, but it being async has significant ramifications. It's a common source of InconsistentAnalysisExceptions.

And the performance of this is unlikely to be ideal when considering that generators may have to call these hundreds of times per library, with different elements. Finding the associated AST for a given Element is not very efficient

scheglov commented 1 year ago

I agree with you about performance, internally we use AnalysisSessionHelper that caches resolved libraries.

https://github.com/dart-lang/sdk/blob/7d138719ecc890067af21cce5578c260f3fd8c6a/pkg/analyzer/lib/src/dart/analysis/session_helper.dart#L14

But probably a better idea would be that you don't ask element by element, and instead request the whole resolved library right at a entry point of the generator, and then work with it, iterate over AST, find interesting resolved methods, etc.

InconsistentAnalysisExceptions is not an issue of the analyzer, but the client that runs it. Nobody forces the client to invoke AnalysisContext.changeFile() while there are generators running. Do one thing at a time, as a good Dart isolate.

rrousselGit commented 1 year ago

Part of the issue is that build_runner natively doesn't expose the CompilationUnit or anything of the sort.

So pretty much all code generators end-up using the Element tree, since it's the only thing we're given. When writing my latest generator, I made sure to voluntarily go out of my way to get the Ast instead of Elements. But that's not reasonable for existing codegens.

My bet is that there are probably hundreds of generators out there relying on DartType.toString/getDisplayString and on the Element tree instead of Ast.

It's to the point where import aliases are very rarely supported, and generators depending on each other knowingly generates a "dynamic" (well, now an InvalidType). And the expected solution involves an almost complete rewrite of the generator or a performance hit (+ the ramifications of making your code async in various places).

The amount of work needed by the community to deal with this is non negligible.

aarajput commented 1 year ago

I am facing exact same issue. AppLocalizations is being considered as InvalidType. Now my code generator is causing issues because of that.

TekExplorer commented 11 months ago

For us who want to make generators, we use what we have access to that works. In my case, i dont see how i can do it correctly without getDisplayString or toString. There's no documentation for the "right" way of doing it, and i actually am using code_builder, but how do i even know what i should do when i cant find useful documentation?

bwilkerson commented 11 months ago

@jakemac53 @natebosch Assuming you're still maintaining it, is this support that could be added to build_runner?

jakemac53 commented 11 months ago

It is maintained, but we don't really do much new feature development. What is actually being asked for here?

If the goal is some sort of functionality to convert a DartType into a code string, I would not want to add that into build_runner etc. It could live in a codegen utility package such as source_gen, but would probably need to be community contributed.

bwilkerson commented 11 months ago

Yes, the ask is for a way to convert a DartType to valid source code. It appears, from the conversation above, that code generators are currently using either toDisplayString, which isn't guaranteed to produce valid Dart code, or toString which isn't guaranteed to have any consistent or meaningful semantics at all.

But it doesn't feel like the analyzer package is the right place to start adding code generation support. This would be the only such API in the package. There's support in the analyzer_plugins package, but that's really targeted at the IDE use case, so it isn't an ideal fit either.

We either need a more appropriate place to add this kind of support or we probably need to ask users to write their own support. One other suggestion was possibly adding it to code_builder, but that package appears to not be used by all interested parties.

rrousselGit commented 11 months ago

IMO the ideal solution for most use-cases is still to have a plain "toSource" rather than a DartType.toString

I've migrated a few of my generators to simply use the raw AST for displaying types, and it offers what I need. Maybe build_runner should expose that directly instead of hiding it from users.

jakemac53 commented 11 months ago

build_runner does have astNodeFor and compilationUnitFor

rrousselGit commented 11 months ago

Yes but they are async, and not so efficient. That's been the leading cause of InconsistentAnalysisExceptions for me.

LibraryElement is synchronously available from the get-go. IMO we should have AST synchronously available right next to the element. I assume it's already known after-all. Afaik we can't obtain a LibraryElement without obtaining the Ast first, right?

jakemac53 commented 11 months ago

AFAIK the AST nodes are discarded by the analyzer intentionally once the element model is created, because otherwise they occupy a large amount of memory (and can be fairly cheaply re-created).

bwilkerson commented 11 months ago

I've migrated a few of my generators to simply use the raw AST for displaying types, and it offers what I need.

First, be aware that the element model contains inferred type information that might not be explicit in the source code (and hence not in the AST) but which might be necessary for code generation.

Second, if you're using AstNode.toSource, please take the docs seriously:

  /// Return a textual description of this node in a form approximating valid
  /// source.
  ///
  /// The returned string will not be valid source primarily in the case where
  /// the node itself is not well-formed.

There is no promise that the result will be valid code, and invalid input source is only one possible reason for this. The method is not intended for use by code generators, and we don't use it for any of our code generation needs in the analysis server.

Afaik we can't obtain a LibraryElement without obtaining the Ast first, right?

That is not the case. The analysis server caches the element model on disk in the form of "summary" files. When we want to load an element model for analysis purposes we read it off disk (if it's in the cache) and never create an AST for it.

And Jake is right, even when the element models are created from an AST the ASTs are too large to cache in memory. That's the primary reason why you can't get from the element model back to the AST structure from which it was built.

rrousselGit commented 11 months ago

First, be aware that the element model contains inferred type information that might not be explicit in the source code (and hence not in the AST) but which might be necessary for code generation.

Yes, but the AST has access to the element model. The issue is the reserve case: The element does not have access to the AST, yet the AST contains some information that is not available in the Element tree.

Take Freezed for instance. It relies on users defining redirecting factories:

factory MyClass(int a) = WhateverTheUserWants;

And generates a WhateverTheUserWants class based on the constructor.
At the same time, it does not generate anything for non-redirecting factories (factory MyClass(int a) {...}).

Yet the element tree is unable to do this:

On the flip side, all of this is available in the AST.

I regularly had folks complaining about code-generation issues due to them combining generators with each-others or using import aliases or funky typedef.
And from my experience, all of those issues are naturally solved when the generator stops relying on DartType and instead TypeAnnotation.

rrousselGit commented 11 months ago

Ultimately it boils down to: All of my code-generators will over time end-up using the AST to solve the various code-generation issues people commonly face.

At which point, people will start running multiple code-generators that are are requesting for a resolved AST tree. Wouldn't the solution end-up worse than exposing the AST directly?

IMO most code-generators would benefit from having direct access to the AST. For example, most could throw an InvalidSourceGenerationException if a class is missing its extends _$MyClass, to help stop mistakes.
One of the reasons generators don't do it today is because they don't naturally have the info.

bwilkerson commented 11 months ago

I appear to have not been very clear, and I apologize for any confusion that caused.

I was absolutely not suggesting that the AST should not be used by code generators. We use the AST all the time in the analysis server. I find it to be absolutely essential.

But I would also never say that the AST has everything you might need. The element model can be equally critical. It all depends on what kind of problem you're trying to solve (or code you're trying to generate).

Take something as seemingly simple as trying to get the type of a parameter. If there's a type annotation on the parameter, then taking the type annotation from the source might be the right solution (though you do need to be careful of type parameters). But if there is no type annotation, and the type of the parameter is inferred from an overridden method, and if you can't omit the type in the generated code, then you'll need to go to the element model to get the information you need.

If your case allows you to safely assume that the type annotation will always be available, that's great. Not knowing what your case required I just wanted to point out a potential pit-fall to save you some pain.

But let me reiterate: none of the methods on DartType are intended to be used by code generators, nor are we going to change them to be usable that way because doing so would break the use cases they are intended for.

natebosch commented 11 months ago

Yes but they are async, and not so efficient.

I'd expect eagerly holding the entire AST so it can be read synchronously is likely to be much less resource efficient than the current approach.

That's been the leading cause of InconsistentAnalysisExceptions for me.

The astNodeFor API was written specifically to avoid InconsistentAnalysisException cases. If you are getting that exception with that API you should file an issue.

TekExplorer commented 11 months ago

I think this is a fallacy. The problem is new users of the generator that may want to do something simple wont even be aware of the AST, and will only be aware of DartType and Elements

If it's not supposed to be used for code generation, then you should add an api that does.

If there exists one, then DartType etc should document that you should use \ for it.

In my own case, i had issues with inconsistent generation due to using DartType. I'm well aware it's not for code gen, but frankly speaking there's no documentation or way for me to know what i should use

Meaning either the api, the documentation, or both needs to be improved.

jakemac53 commented 10 months ago

Keep in mind that the analyzer as a whole was never designed for code generation. It is being used for that task to be sure, but that isn't the core use case (analyzing code, not generating it). It is not their job to document or implement codegen specific utilities, especially if the logic can reasonably live in a different package.

The main question becomes, what package? I don't think code_builder is appropriate for reasons given above. It very well might need a custom implementation though to fit better with its model. I don't think it should live in build_runner either because it is meant to be a lower level package, and only integrates explicitly with the analyzer to the minimum extent necessary to facilitate other packages.

That is what leads me to source_gen as the best package for this logic, if the Dart team is going to own it. It already has lots of special logic, which is tailored to use of the analyzer and code generation.

The next question is around should implement it, and who should maintain it. If it lives in source_gen then the Dart team will effectively own the maintenance of it, but anybody could provide the implementation. I personally don't want to write the implementation (in particular, the tests, as it would need extensive tests). It probably needs to come from the community. But maybe some other Dart team member would be interested, I don't know.

rrousselGit commented 10 months ago

I'll admit that I find the current situation a bit odd.

I'd expect 99% if not 100% of code-generators to need this. And we've had code-generation for years. How were Google projects dealing with this until now?

Although I understand the pushback with implementing it today, considering macros are on the way. Still, it's such a crucial functionality. I wonder how it has gone unnoticed for so long 😛

TekExplorer commented 10 months ago

I wonder how it has gone unnoticed for so long

Probably because current tools, despite not being designed for it, do work mostly well enough for code gen.

jakemac53 commented 10 months ago

Probably because current tools, despite not being designed for it, do work mostly well enough for code gen.

☝️ the current API has happened to work until now, even though it was documented saying not to use it for that purpose. It isn't surprising it was indeed used in that way, because of the difficulty in rolling your own version and the fact that it mostly worked.

I'd expect 99% if not 100% of code-generators to need this. And we've had code-generation for years. How were Google projects dealing with this until now?

Fwiw, I don't know of any Google owned code generators that require this, internally or externally, or at least haven't seen any issues surrounding it filed. It is possible they just tend to avoid relying on the "dynamic" fallback too.

normidar commented 9 months ago

I am trying to create a generator, I am also facing this problem, I would like to know if it will be supported or not, if not I will use AST.

srawlins commented 9 months ago

I am trying to create a generator, I am also facing this problem, I would like to know if it will be supported or not

Not any time soon.

rrousselGit commented 6 months ago

Coming back to this: It appears that analyzer 6.5.0 makes this even harder for code-generators.

Now, the argument withNullability on getDisplayString is deprecated. Meaning that the output string will always contain an unexpected character. A generator using type.getDisplayString(withNullability: false) today to output List<int> will have no easy way to upgrade, as now the output would now be List<int*>