dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.11k stars 1.57k forks source link

Analyzer API 5.0.0 doesn't visit certain identifiers #51043

Open nex3 opened 1 year ago

nex3 commented 1 year ago

Reproduction:

import 'dart:io';

import 'package:analyzer/dart/analysis/features.dart';
import 'package:analyzer/dart/analysis/utilities.dart';
import 'package:analyzer/dart/ast/ast.dart';
import 'package:analyzer/dart/ast/visitor.dart';

void main() {
  File("analyzer-test.dart").writeAsStringSync("Null functionName() => null;");
  parseFile(path: "analyzer-test.dart", featureSet: FeatureSet.latestLanguageVersion())
    .unit.accept(_Visitor());
}

class _Visitor extends RecursiveAstVisitor<void>  {
  void visitSimpleIdentifier(SimpleIdentifier node) {
    print(node);
  }

  void visitLibraryIdentifier(LibraryIdentifier node) {
    print(node);
  }

  void visitPrefixedIdentifier(PrefixedIdentifier node) {
    print(node);
  }
}

With analyzer 4.7.0, this printed Null and functionName. Analyzer 5.0.0 and later only prints Null. This is preventing Dart Sass from upgrading its analyzer version.

srawlins commented 1 year ago

cc @scheglov

bwilkerson commented 1 year ago

That behavior is expected. We changed the AST structure so that we no longer create instances of SimpleIdentifier in places where the identifier is being declared.

The identifier is still accessible as a token from the FunctionDeclaration (using the getter name).

The same is true for other declarations.

This is preventing Dart Sass from upgrading its analyzer version.

Why is this preventing you from upgrading? It looks like the only required action is to also visit the nodes representing declarations, but I might be missing something.

scheglov commented 1 year ago

Yes, it is not a drop-in replacement, but this is why it is a different major version, there are changes to the API. Using SimpleIdentifier for a name of a declared entity is not fully correct. The name is not an expression, it does not have a type, it does not reference anything. So, it is just a Token, not a SimpleIdentifier.

nex3 commented 1 year ago

That behavior is expected. We changed the AST structure so that we no longer create instances of SimpleIdentifier in places where the identifier is being declared.

It would be really helpful if you'd include intentional breaking changes in the changelog. I checked the 5.0.0 changelog when I was trying to debug this (which took a considerable amount of time!) and again before filling this issue to make sure that this wasn't intentional, and all it says is "Removed deprecated ..." which strongly implies that the only breaking changes involve removing members that were already producing deprecation messages. Not having this information makes it very hard to know how to upgrade across versions.

Why is this preventing you from upgrading? It looks like the only required action is to also visit the nodes representing declarations, but I might be missing something.

It was blocking because we didn't know the right way to upgrade. The RecursiveAstVisitor class has 163 methods each referring to a different sparsely-documented AST class, and I don't know offhand which of those can represent structures that don't have simple identifiers. visitSimpleIdentifier was very valuable because it meant we didn't have to explicitly enumerate all of them, we could just rely on the sub-unit that showed up for every user-authored name in the library.

If the semantics for SimpleIdentifier don't work here, could we at least have a visitToken() or visitUserDefinedName() utility method that could serve the same purpose?

bwilkerson commented 1 year ago

It would be really helpful if you'd include intentional breaking changes in the changelog.

We'll try to do better in the future. Sorry for the pain this caused.

... I don't know offhand which of those can represent structures that don't have simple identifiers.

I don't either. My understanding is that it's only the nodes that declare a name, but I'm not certain that we won't / haven't already removed such nodes in other places.

... could we at least have a visitToken() or visitUserDefinedName() utility method ...

I don't know what the solution is, but we'll try to keep this use case in mind.

If tokens would be useful, then I'll note that you could ask the compilation unit for its beginToken and iterate through the list (using next) until you reach the end of the token stream.

scheglov commented 1 year ago

Version 4.4.0 change log says:

* Deprecated 'XyzDeclaration.name' in AST, use `name2` and `declaredElement` instead.
nex3 commented 1 year ago

If tokens would be useful, then I'll note that you could ask the compilation unit for its beginToken and iterate through the list (using next) until you reach the end of the token stream.

The problem is that we also need to make more semantic decisions based on the structure of the library while building up a textual buffer that matches the current position, so we'd need to visit the tokens only after visiting the AST nodes that contain them rather as a separate pass. If you're curious, this is the code in question.

Version 4.4.0 change log says:

* Deprecated 'XyzDeclaration.name' in AST, use `name2` and `declaredElement` instead.

I hope you can understand why this didn't immediately communicate "the behavior of RecursiveAstVisitor.visitSimpleIdentifier will have a breaking change" to me. :sweat_smile:

scheglov commented 1 year ago

OK, I agree that the relationship with RecursiveAstVisitor.visitSimpleIdentifier was not obvious.

nex3 commented 1 year ago

In general, it would be helpful if changelog messages like that also included a description of why the change is being made—what are the semantic differences between name, name2, and declaredElement? Why are the latter two preferable over the former? That would at least give readers a fighting chance to infer how they should change their code beyond just doing a search-and-replace for identifiers.

nex3 commented 1 year ago

Can we increase the priority of this? It's blocking Dart Sass from upgrading to analyzer 5.x.

bwilkerson commented 1 year ago

I don't think we'll be able to prioritize any work on this until after the Dart 3.0 work is completed.

Even after the 3.0 work is done it isn't clear to me that we want to make any changes to the analyzer APIs to accommodate this use case. There has been no discussion, so I'm not saying that we won't make any changes, just that I can't say that there will be any.

While I agree that the API change is making it harder for you to upgrade to analyzer 5.x, I don't believe that it's blocking the upgrade. Whatever code you'd like us to add to the analyzer could also be added outside the analyzer. For example, you could easily write a visitor class that will visit declaration nodes and invoke a visitToken method that takes the required token. You current visitor could then subclass that class and would only need to override the visitToken method to have the same behavior it has today.

nex3 commented 1 year ago

How can I write that in a way that will stay up-to-date when tokens are added at new locations in the AST?

bwilkerson commented 1 year ago

You can't. Because (as I understand it) it isn't all tokens you're interested in, it's only a subset of the tokens.

Every time there's a breaking change to the API, you're going to have to review the changes and decide how your code needs to be updated in response.

I know that that's not an easy thing to do. I know that the current state isn't ideal from the perspective of a client of the analyzer package. And I respect that you're in a difficult position. It would be nice if it were easier for every client of the analyzer package to keep up with the changes to the API. Unfortunately, that probably isn't practical. It's just part of the cost of using the analyzer package.

nex3 commented 1 year ago

I'm fine with updating for api changes, as long as the basic functionality remains in place. My problem is that I need the package to abstract away language changes. The entire point of using the analyzer package is that it understands the changing structure of Dart code so I don't need to handle that explicitly myself, but with this change you're telling me that I now need to manually enumerate every place in the language that a user-authored token can appear.

That's why I say this is blocking me. Of course I could rewrite this functionality outside the package, in the same way I could write my own Dart parser. But encapsulating the structure of the language is the core feature I need from the analyzer, and that's gone in 5.x, so I can't use 5.x.

Maybe a better way to approach this is: why not use SimpleIdentifiers to represent name declarations? What value did this change provide the analyzer?

bwilkerson commented 1 year ago

I'm fine with updating for api changes, as long as the basic functionality remains in place.

Which hinges on the definition of "basic functionality". The basic functionality of the visitSimpleIdentifier method is to visit SimpleIdentifier nodes in an AST structure, and that hasn't changed. But I'm wondering whether you might have thought that the basic functionality of visitSimpleIdentifier was to visit every <identifier> in the grammar. If so, that would explain our differing points of view.

My problem is that I need the package to abstract away language changes.

I'm not sure what kind of changes you're talking about, but I don't think that's ever been a goal for the AST structure. The AST structure models the syntactic structure of the language, so it's fairly common for the AST structure to change when the language changes.

That said, the change from storing a SimpleIdentifier node to storing the token that the SimpleIdentifier used to store was not motivated by a language change.

... why not use SimpleIdentifiers to represent name declarations? What value did this change provide the analyzer?

As Konstantin mentioned above, a SimpleIdentifier node represents an <identifier> in an expression. As a result, the class has a large number of methods that don't apply to identifiers that aren't being used in an expression. The API is better now that it no longer gives the false appearance that the name of a class declaration (for example) can meaningfully be asked for its precedence, or asked whether it is inConstantContext.

nex3 commented 1 year ago

I'm not sure what kind of changes you're talking about, but I don't think that's ever been a goal for the AST structure. The AST structure models the syntactic structure of the language, so it's fairly common for the AST structure to change when the language changes.

The value of using a pre-existing visitor class is that it allows me to avoid manually traversing every different type of node in the AST, and thus tightly coupling my transformer code to the changes in the language. When Dart adds pattern matching, I don't need to add a bunch of logic to find identifiers in it because the visitor traversal handles that for me. That's what I mean by "abstracting away language changes." With analyzer 5.x, the addition of new types of declarations are no longer abstracted away; I now need to explicitly handle each one, or my transformer will break.

As Konstantin mentioned above, a SimpleIdentifier node represents an <identifier> in an expression. As a result, the class has a large number of methods that don't apply to identifiers that aren't being used in an expression. The API is better now that it no longer gives the false appearance that the name of a class declaration (for example) can meaningfully be asked for its precedence, or asked whether it is inConstantContext.

When you say the API is "better", were there actual concrete problems caused by the old API? Or is this a philosophical decision based on API cleanliness? Because I'm trying to tell you that this decision is causing very real problems for Sass right now.

And if you really think it's not that burdensome to explicitly enumerate all the declarations, why not support the same functionality in a more principled way by adding a visitNameToken() helper that you call for each <identifier> in the grammar?

bwilkerson commented 1 year ago

... the addition of new types of declarations are no longer abstracted away ...

The previous "abstraction" was not an intentional part of the design. It was a convenient coincidence that the identifiers being represented by SimpleIdentifier were exactly the ones you wanted to visit. But the semantics you want (in terms of which identifiers are visited) isn't something that we can commit to always encoding directly in our representation. If we ever support your use case it will not be by restoring the older representation.

Because I'm trying to tell you that this decision is causing very real problems for Sass right now.

You have succeeded in doing so. I understand that this has created work for you, both in the short term and in the future as we continue to evolve the AST representation, and I'm sorry that it has. I am not in any way attempting to minimize the pain you're feeling nor am I trying to say that it will be trivial for you to deal with.

But the reality is that we can't promise to not make breaking changes, especially when the break is related to unintentional behavior. The best I can promise is that we will continue to not make breaking changes lightly.

And if you really think it's not that burdensome to explicitly enumerate all the declarations, why not support the same functionality in a more principled way ...

If we were going to support this functionality (and no decision has been made as to whether we will) we would definitely do so in a more principled way. The reality is that we don't have the resources to support that effort right now, nor can I give you any guarantee that we will be able to support it at any point in the future.

I know that this answer is disappointing, but it's the best answer I can offer.