adams85 / acornima

Acornima is a standard-compliant JavaScript parser for .NET. It is a fork of Esprima.NET combined with the .NET port of the acornjs parser.
BSD 3-Clause "New" or "Revised" License
9 stars 3 forks source link

Convenience API for dealing with variable scopes #13

Open adams85 opened 1 month ago

adams85 commented 1 month ago

This PR is kind of an experiment to find out if exposing the variable scopes tracked by the parser for checking variable redeclarations makes sense in the first place, and if so, how it can be technically achieved and what the public API of such a feature should look like.

My goal with this is to provide a starting point to help us understand the problem better, explore our options so we can discuss them and shape the implementation and API accordingly.

The main requirement regarding the implementation is that it must not degrade the performance of the primary consumer (Jint execution engine) significantly. This applies to both execution time and memory allocation + the memory footprint of the resulting AST. Big changes to the parser's logic are not allowed either, as it should stay a close translation of acornjs.

Beyond that, the main goal, of course, is to come up with the most convenient API possible for the consumers. However, I don't intend to implement and maintain an "all batteries included" solution but a general one that provides access to the scope information (and maybe some convenience features in the Extras package) for consumers who want to deal with variable scopes. That is, a limited feature set is preferred, which leaves the possibility open for consumers to work with variable scopes in a way that is optimized to their specific use cases (I mean primarily the freedom to choose the data structure to store the scope tree.)

All ideas and feedback are welcome!

adams85 commented 1 month ago

As a next step, I plan to do some dogfooding with my ES module bundler. That, however, uses the core package only and will implement a custom scope tree data structure. In other words, this won't do any dogfooding to ScopeInfo in the Extras package.

So it would be great if you can provide feedback on how satisfied you are with the API of ScopeInfo, what additional features you need or would include, etc.

Divulcan commented 1 month ago

So it would be great if you can provide feedback on how satisfied you are with the API of ScopeInfo, what additional features you need or would include, etc.

The main limitation is probably not being able to link any given node to a scope, however, with the current API I can make extension methods that cover everything that I need. Example of what the end-goal is: chrome_Zln5hJtBdW

Basically, handling cases where nodes are part of an outer scope and not the closest one (test expressions in control flow ops, EHs, etc).

If you think this might be a good fit for the extras package, I can make a proposal with better defined concepts once I have a clear vision of the extension features I plan on implementing.

adams85 commented 1 month ago

The main limitation is probably not being able to link any given node to a scope, however, with the current API I can make extension methods that cover everything that I need.

If you need to link any given node to a scope, you can do parserOptions.RecordScopeInfoInUserData().RecordParentNodeInUserData(), which will store the parent nodes in Node.UserData (or ScopeInfo.UserData when Node.UserData is used for storing a scope info object). Then, you can walk up the node tree until you find a node whose UserData is a ScopeInfo.

I know it's not the most streamlined dev. experience, however it seems we can't really do any better if we want to stick to the performance-oriented design guidelines outlined in the PR description.

If you think this might be a good fit for the extras package, I can make a proposal with better defined concepts once I have a clear vision of the extension features I plan on implementing.

Sure, now that the basics are more or less figured out, we can move on to polishing DX. My only request is to avoid adding features too specific to your use case. I'd like to keep this lib general, that is, only include features that most of the use cases are expected to benefit from.

adams85 commented 2 weeks ago

Public API's taking shape nicely, plus I did some tweaks and could even improve the max. recursion depth. Seems like we're also ok performance-wise:

Acornima.Benchmark.FileParsingBenchmark

Diff Method FileName Mean Error Allocated
Old AcornimaParse bundle 328.320 ms 5.1226 ms 75.52 MB
New 327.107 ms (0%) 4.6519 ms 75.52 MB (0%)
Old AcornimaParse jquery-1.9.1 8.097 ms 0.0784 ms 3.19 MB
New 7.999 ms (-1%) 0.0719 ms 3.19 MB (0%)
Old AcornimaParse yui-3.12.0 6.323 ms 0.0123 ms 2.55 MB
New 6.267 ms (-1%) 0.0251 ms 2.55 MB (0%)
adams85 commented 2 weeks ago

Uh-oh... we have a problem.

Let's consider we have a simple function declaration with default parameters:

function f(a = y) { var y = 2 }

For this script, acorn reports the following scope info:

Program (f)
 └─FunctionDeclaration (a, y)
    ├─Identifier
    ├─AssignmentPattern
    │  ├─Identifier
    │  └─Identifier
    └─BlockStatement
       └─VariableDeclaration
          └─VariableDeclarator
             ├─Identifier
             └─Literal

The problem here is that variable y should not be visible from the parameter list of the function. If you call function f, you get "ReferenceError: y is not defined".

This happens because acorn doesn't create separate scopes for the function parameter list and the function body but merges the declarations in the two into one scope. This makes the feature in its current form pretty useless for consumers (e.g. for my ES module bundler). It would only be useful if it produced this:

Program (f)
 └─FunctionDeclaration (a)
    ├─Identifier
    ├─AssignmentPattern
    │  ├─Identifier
    │  └─Identifier
    └─BlockStatement (y) - or (a, y) would probably be acceptable here too
       └─VariableDeclaration
          └─VariableDeclarator
             ├─Identifier
             └─Literal

I'll take a look to find out whether we could tweak the parser logic to output the correct scope info without major changes and a significant performance penalty. But there is a chance that this whole effort goes down the drain in the end...

adams85 commented 1 week ago

Good news, looks like the PR is saved as I could figure out an unobtrusive way to tackle the issues described in my previous comment.

OnNode still reports a single scope for function and catch clause nodes, but the parser now stores two numbers (VarParamCount, LexicalParamCount) into Scope. These numbers tell how many variables are parameters in the corresponding spans actually.

Thanks to this extra info, it became possible to build a scope tree with separate scopes for function/catch clause parameter lists and function/catch clause bodies. For function f(a = y) { var y = 2 }, the following tree is produced now:

Program* [f]
 └─FunctionDeclaration [a, f]
    ├─Identifier
    ├─AssignmentPattern
    │  ├─Identifier
    │  └─Identifier
    └─BlockStatement* [y]
       └─VariableDeclaration
          └─VariableDeclarator
             ├─Identifier
             └─Literal

I also made some further improvements:

The name of a function declaration or function expressions is available in the function's scope. E.g. an expression like (function f(g = f) { return g })() returns the function itself. So I altered the ScopeInfo tree builder logic to include the function name in the function's scope:

Program* []
 └─ExpressionStatement
    └─CallExpression
       └─FunctionExpression [f, g]
          ├─Identifier
          ├─AssignmentPattern
          │  ├─Identifier
          │  └─Identifier
          └─BlockStatement* []
             └─ReturnStatement
                └─Identifier

Similarly, the name of a class declaration or named class expressions is available in the class's scope. (Which means that my statement "Classes don't create variable scopes as you can't directly declare variables in the body of a class, just properties and methods." wasn't entirely true. The class name is a special variable in the class's scope.) E.g. an expression like (new class C extends function() { this.x = C.X } { static X = 1 }).x returns 1.

Program* []
 └─ExpressionStatement
    └─MemberExpression
       ├─NewExpression
       │  └─ClassExpression [C]
       │     ├─Identifier
       │     ├─FunctionExpression []
       │     │  └─BlockStatement* []
       │     │     └─ExpressionStatement
       │     │        └─AssignmentExpression
       │     │           ├─MemberExpression
       │     │           │  ├─ThisExpression
       │     │           │  └─Identifier
       │     │           └─MemberExpression
       │     │              ├─Identifier
       │     │              └─Identifier
       │     └─ClassBody
       │        └─PropertyDefinition
       │           ├─Identifier
       │           └─Literal
       └─Identifier

This was problematic though because the parser didn't create scopes for classes to do variable redeclaration detection as they're not needed for that purpose. So, I decided to bite the bullet and changed the parser to create scopes for classes too. (Hopefully won't affect perf. much but I'll need to do some benchmarks w.r.t. this.)

After making these changes, I also updated my bundler project to use the scope info from the parser to build its scope tree. I could make it work, what's more, this simplified the related logic pretty much. 🎉

So the feature seems to turn out useful in the end. So I plan to include the core functionality (the changes to the parser and OnNode) in the master branch, then rebase this PR so it will be about the extras part (ScopeInfo) later on.

adams85 commented 1 week ago

The core functionality (exposing the raw scope info via ParserOptions.OnNode) has been merged into master and shipped with v1.1.0.

So from now on, this PR is about the convenience features living in the Extras package (RecordScopeInfoInUserData, ScopeInfo, etc.), which should provide some handy APIs for consumers to deal with variable scopes. Stuff like what was proposed here.

The starting point is already there, including the logic that does the heavy-lifting. I'm aware of one thing that's still missing from it: in non-strict mode function declarations are also hoisted. Probably I'll implement this (will be tricky though), but apart from this, I don't have plans with this PR for now. I'd leave it up to the community to shape the implementation to their needs. So, @Divulcan, if you're still interested, please go for it.