natebosch opened 1 year ago
It probably would be feasible tbh... the language version part is pretty easy (although I do worry about divergence in the edge cases), and everything we care about reading has to come before any other directives, so we only need to support basically the library/import/export/part/augmentation keywords? As soon as we see one of those we just stop. Although honestly we could maybe be even more general and just stop on the first non-annotation keyword of any kind? Would that break at all?
> honestly we could maybe be even more general and just stop on the first non-annotation keyword of any kind? Would that break at all?
I can't think of any way it would break. The only things we should need to be able to parse are comments, language version token, and annotations. Within the annotations we only need to be able to read a few specific annotations, and we can skip reading any we don't recognize.
I think we'd only need to support
@scheglov @bwilkerson - do you think this would be an easy parser to write separately from the analyzer?
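To make that concrete, here is a rough sketch of what such a prefix scanner could look like (all names here are illustrative and annotation argument lists are elided — this is not a real implementation):

```dart
/// Rough sketch only — `FilePrefix` and `scanPrefix` are illustrative
/// names, and annotation argument lists are not handled.
class FilePrefix {
  String? languageVersion;
  final List<String> annotations = [];
}

FilePrefix scanPrefix(String source) {
  final prefix = FilePrefix();
  var i = 0;
  while (i < source.length) {
    final c = source[i];
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
      i++; // Whitespace.
    } else if (source.startsWith('//', i)) {
      // Line comment; check for a `// @dart = 2.12` language version.
      var end = source.indexOf('\n', i);
      if (end == -1) end = source.length;
      final version = RegExp(r'^//\s*@dart\s*=\s*(\d+\.\d+)')
          .firstMatch(source.substring(i, end));
      if (version != null) prefix.languageVersion ??= version.group(1);
      i = end;
    } else if (source.startsWith('/*', i)) {
      i = _skipBlockComment(source, i);
      if (i == -1) break; // Unterminated comment; give up.
    } else if (c == '@') {
      final name =
          RegExp(r'[A-Za-z_$][A-Za-z0-9_$]*').matchAsPrefix(source, i + 1);
      if (name == null) break;
      prefix.annotations.add(name[0]!);
      i = name.end;
      // Real metadata can carry an argument list (`@Timeout(...)`);
      // skipping it needs balanced parentheses plus the string handling
      // discussed below, so it's elided here.
    } else {
      // `library`, `import`, `(`, or anything else: stop scanning.
      break;
    }
  }
  return prefix;
}

/// Skips a (possibly nested) block comment starting at `/*`; returns the
/// index just past the closing `*/`, or -1 if unterminated.
int _skipBlockComment(String source, int start) {
  var depth = 0;
  var i = start;
  while (i + 1 < source.length) {
    if (source.startsWith('/*', i)) {
      depth++;
      i += 2;
    } else if (source.startsWith('*/', i)) {
      depth--;
      i += 2;
      if (depth == 0) return i;
    } else {
      i++;
    }
  }
  return -1;
}
```

Nothing in it depends on the full grammar: it only ever looks at whitespace, comments, and `@`-prefixed identifiers, and stops at the first token it doesn't recognize.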
It sounds like there are enough simplifying assumptions that you could write a fairly simple scanner/parser to get the information you need. You probably don't even need to worry about recovery.
The only question is how likely it is for the language to change in a way that would break those assumptions (such as using `@` for something other than the start of an annotation). My guess is that it's fairly unlikely and hence fairly safe.
I think it is ideologically wrong to write copies of the parser, even simplified ones.
> I think it is ideologically wrong to write copies of the parser, even simplified ones.
I get this, but also from a pragmatic perspective the analyzer is an extremely big hammer to use here, and it also has by far the largest churn of any of our dependencies, which causes additional maintenance burden beyond that of a simple parser imo.
It is non-trivial to correctly implement the second requirement (from scratch) because block comments can nest and therefore are not regular, i.e. they can't be recognized by a regular expression. Furthermore, strings can also nest (e.g. `const foo = " ${"${"${2}"}"}";`), which complicates things further. In other words, simple heuristics would not be enough to filter them out without failing on some code. (They would need to be filtered out because they could contain code; otherwise, correctness could not be guaranteed.)
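For concreteness, both constructs are valid Dart:

```dart
// Block comments nest: this entire line is one comment.
/* outer /* inner */ still part of the outer comment */

// Interpolated strings nest too, so neither construct can be
// recognized by a regular expression alone:
const foo = " ${"${"${2}"}"}";
```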
@natebosch have you considered using the lexer that comes with `_fe_analyzer_shared`? IIRC it handles the nestedness of block comments for you. You would need to properly maintain a stack for non-raw string literals, but then I think it would be possible to use a regular expression to extract the patterns that you need (i.e. directives, declarations and then annotations) from within the spans that are not comments and not interpolated strings. (_But https://pub.dev/packages/_fe_analyzer_shared/versions seems to indicate that that wouldn't solve your problem?_)
Resolution would have to be approximated, but I think that wouldn't be a big problem, because I think it's unlikely that people would use the same identifiers for annotations as the test package.
> I get this, but also from a pragmatic perspective the analyzer is an extremely big hammer to use here, and it also has by far the largest churn of any of our dependencies, which causes additional maintenance burden
I think it would be nice if the underlying problem could be addressed (but I don't know how). I've had similar issues with `vector_math`, `meta` and `collection` (unfortunately, I was forced to fork `vector_math`, and I can't depend on `meta` & `collection` anymore because it's too risky from a dependency-management perspective for me). This might just be a band-aid.
> have you considered using the lexer that comes with `_fe_analyzer_shared`?
We could do that but it has many more breaking versions than analyzer, so it would be worse from a maintenance perspective as a dependency.
Not to mention that it was named with a leading underscore because we consider it to be private to the SDK and don't want non-SDK packages to depend on it directly. (Note the README.)
> Furthermore, strings can also nest (e.g. `const foo = " ${"${"${2}"}"}";`)
We don't support strings like this for test metadata, so we could bail with an error as soon as we see the `$` in a non-raw string.
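A sketch of what that bail-out could look like, assuming annotation arguments only ever need simple single-line strings (`skipSimpleString` is a hypothetical helper, not existing test code):

```dart
/// Skips a single- or double-quoted, non-raw string literal starting at
/// [start]; returns the index just past the closing quote. Throws on
/// interpolation, which test metadata doesn't need to support.
int skipSimpleString(String source, int start) {
  final quote = source[start];
  var i = start + 1;
  while (i < source.length) {
    final c = source[i];
    if (c == '\\') {
      i += 2; // Escape sequence: skip the escaped character.
    } else if (c == r'$') {
      throw FormatException('String interpolation is not supported here.');
    } else if (c == quote) {
      return i + 1;
    } else {
      i++;
    }
  }
  throw FormatException('Unterminated string literal.');
}
```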
> It is non-trivial to correctly implement the second requirement (from scratch) because block comments can nest and therefore are not regular
I don't think we'd parse with regex in any case, but if we do hit difficulty handling this in the parser I would be comfortable with adding a restriction. I would be amazed if there were any `_test.dart` files that have nested block comments before the first import.
> Resolution would have to be approximated
It's already approximated. We don't perform any resolution when parsing metadata today.
The issue description says:
> Read annotations of specific types from the library or the first directive.
I understood this to mean that you would also need to extract annotations from declarations. If that is not the case, then I agree that nested strings would not be an issue. The grammar does not allow any interpolated strings in directive-related production rules, and declarations are only expected after any directives.
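For example (the paths here are illustrative):

```dart
import 'src/util.dart'; // OK: a URI must be a plain string literal.

// A compile-time error: interpolation is not allowed in a URI.
// import 'src/util_$platform.dart';
```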
And I also agree that nested block comments would probably not be an issue, but I think Konstantin's concerns are very justified. He seems to be concerned about the correctness of such an ad-hoc parser, and I would be concerned about that as well (and about any unexpected behavior it would entail).
I agree with all the other concerns around the `_fe_analyzer_shared`-based solution.
I think there is a simple solution that would satisfy all the constraints presented so far. Erik maintains an ANTLR-based grammar specification for Dart (spec, wiki). There is a Dart runtime for ANTLR here. Here is the command to build it after you've installed antlr. Here, here and here show how to invoke the generated lexer and parser. (I don't know if the tool is able to generate a type-safe AST in Dart).
If you consider this solution, I think Erik should be consulted about whether he intends to support such a use case. The goals of the ANTLR-based spec seem to differ quite a bit from the goals of a production parser.
Recently I came across `vendor`. It seems like this package (maintained by Google, not a third party) would be a perfect fit for solving this issue by occasionally vendoring `analyzer`.
Packages published under `google.dev` are not officially supported Google projects. It's just a place for googlers to put their open source things with minimal headache and red tape.

> Open source packages published by Google developers. These are not officially supported Google products.
But either way, our issues around `analyzer` don't rise to the level of needing to vendor it. It's just a maintenance burden, but we don't actually need to be on an older version.
> It's just a place for googlers to put their open source things with minimal headache and red tape.
Got it, I didn't know that. Thank you.
> > Furthermore, strings can also nest (e.g. `const foo = " ${"${"${2}"}"}";`)
>
> We don't support strings like this for test metadata, so we could bail with an error as soon as we see the `$` in a non-raw string.
An edge case where this could cause a problem is if some non-test-related annotation uses nested strings. I don't think I've ever seen that, and I think it would be OK to restrict it as well, but it's not as clear-cut as I had thought.
We currently rely on `parseString` for two purposes:

- Checking the language version of the file.
- Reading annotations of specific types from the library or the first directive.

For the latter use case we rely on tracking the source span for specific String arguments in the annotations.
Both of these are much simpler than parsing and resolving the entirety of the Dart language.
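For reference, the `parseString`-based reading looks roughly like this (a simplified sketch — the real code also tracks source spans and checks the language version):

```dart
import 'package:analyzer/dart/analysis/utilities.dart';

void main() {
  final result = parseString(content: '''
@TestOn('vm')
@Timeout(Duration(seconds: 60))
library;

import 'package:test/test.dart';
''');
  // Metadata attaches to the library directive (or first directive).
  for (final directive in result.unit.directives) {
    for (final annotation in directive.metadata) {
      print(annotation.name.name); // TestOn, Timeout
    }
  }
}
```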
The dependency on `analyzer` from `test_core` causes pain for the ecosystem, and for `flutter_test`. I wonder: how much effort would it be to write and maintain a small parser for just this metadata, and how might that compare to the current effort rolling versions of `analyzer` (typically a version bump and publish for this package)?