adams85 / acornima

Acornima is a standard-compliant JavaScript parser for .NET. It is a fork of Esprima.NET combined with the .NET port of the acornjs parser.
BSD 3-Clause "New" or "Revised" License

Is it possible to resolve members while visiting nodes? #12

Open Divulcan opened 1 month ago

Divulcan commented 1 month ago

Hello,

I was playing around with the library (it's great, thank you for sharing), and this caught my attention:

As the parser tracks variable scopes to detect variable redeclarations, it will be possible to expose this information to the consumer.

Given this, I am curious whether it would be reasonable to expose scopes on more nodes. I'm thinking of cases where multiple members can have the same identifier name and resolving the "target" needs to take into account the current scope.

Example:

var x = 5;
if (x == 2) {
    console.log('Hello world!')
} else {
    var x = 7;
    console.log(x)
}

Another example:

function foo() {
  console.log('Hello World 1');
}
function foo() {
  console.log('Hello World 2');
}
foo();

Another (slightly more cursed) example:

var x = "a";  
switch(x) {
  case "a":
  case 2:
    var x = 1;  
  case 3:
    console.log(x); 
    break;
}
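For anyone reading along, the three snippets above can be condensed into a small runnable sketch. The wrapper function names are made up for illustration, and the log calls are replaced by return values so the results can be checked. The point it tries to show is that `var` and function declarations are hoisted to the enclosing function (or global) scope, so each example involves a single binding that is merely declared more than once.

```javascript
// Each wrapper mirrors one of the examples above (wrapper names are hypothetical).
function ifElseExample() {
  var x = 5;
  if (x == 2) {
    return 'Hello world!';
  } else {
    var x = 7; // same binding as the outer `x`: `var` ignores block boundaries
  }
  return x;
}

function duplicateFunctionExample() {
  function foo() { return 'Hello World 1'; }
  function foo() { return 'Hello World 2'; } // replaces the first declaration
  return foo();
}

function switchExample(input) {
  var x = input;
  switch (x) {
    case 'a':
    case 2:
      var x = 1; // same binding again; only the assignment is conditional
    case 3:
      return x;
  }
  return undefined;
}

console.log(ifElseExample());            // 7
console.log(duplicateFunctionExample()); // Hello World 2
console.log(switchExample('a'));         // 1
console.log(switchExample(3));           // 3
```

So resolving `x` or `foo` by name alone finds one binding but several declaration sites, which is where scope information on the nodes would help.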

SWC uses a numeric ctxt field to denote the scope a node belongs to. However, as far as I know, neither esprima nor acorn exposes this information, so: is adding some sort of scope indicator something that is being considered or planned?

Thank you!

adams85 commented 1 month ago

Hi @Divulcan,

I'm glad that you find the lib useful!

Given this, I am curious whether it would be reasonable to expose scopes on more nodes.

Yep, as stated in the readme, I have plans to expose the variable scopes tracked by the parser.

For one, I want to do this to support my ES module bundler, as it now needs to build the variable scopes after parsing, in another pass, even though the parser has in theory already collected all the necessary information once, during parsing. It would be great to eliminate this redundancy if possible.

A similar situation exists in Jint: being a JS runtime, it needs variable scope information as well. I suspect that the scope info collected by the parser could be reused there too.

But TBH, I have no clue yet via what API and in what form this should be done. I need to do further research to figure out if it can actually be done and if so, how.

If you have something more concrete in mind, or if you could share more details on your use case, that would probably help a lot with figuring this out.

SWC uses a numeric ctxt field to denote the scope a node belongs to

This is a nice tip. I don't know SWC, but it will definitely be useful to take a look at it for ideas.

Divulcan commented 1 month ago

For one, I want to do this to support my ES module bundler, as it now needs to build the variable scopes after parsing, in another pass, even though the parser has in theory already collected all the necessary information once, during parsing. It would be great to eliminate this redundancy if possible.

I see. I agree with eliminating the redundancy, as it can become expensive once the parsed apps are complex.

But TBH, I have no clue yet via what API and in what form this should be done. I need to do further research to figure out if it can actually be done and if so, how.

I have a few ideas, but they are not fully polished. Let me know your initial thoughts.

API Proposal

public interface INode
{
    // .. remains unchanged
    ScopeInfo? Scope { get; }
}

Alternatively, if we don't want to modify the INode structure, we could consider adding a "resolver" service. However, it would need to be able to find the scope for a given SourceLocation. Without digging in too much, I believe this would be more complex to design and implement.
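To make the "resolver" idea a bit more concrete, here is a rough JavaScript sketch of looking up the innermost scope for a position. Everything in it is an assumption for illustration: the scope records, the `findScopeAt` name, and the use of plain character offsets instead of a line/column SourceLocation. It is not how Acornima or SWC do it.

```javascript
// Hypothetical scope records: each covers a half-open [start, end) character range.
const scopes = [
  { id: 0, type: 'Global',   start: 0,  end: 100, parent: null },
  { id: 1, type: 'Function', start: 10, end: 60,  parent: 0 },
  { id: 2, type: 'Block',    start: 20, end: 40,  parent: 1 },
  { id: 3, type: 'Block',    start: 70, end: 90,  parent: 0 },
];

// Returns the innermost scope whose range contains `offset`, or null if none does.
function findScopeAt(scopeList, offset) {
  let best = null;
  for (const scope of scopeList) {
    const contains = scope.start <= offset && offset < scope.end;
    if (!contains) continue;
    // Scope ranges nest properly, so the smallest containing range is the innermost one.
    if (best === null || scope.end - scope.start < best.end - best.start) {
      best = scope;
    }
  }
  return best;
}

console.log(findScopeAt(scopes, 25).id); // 2
console.log(findScopeAt(scopes, 50).id); // 1
console.log(findScopeAt(scopes, 95).id); // 0
```

The linear scan keeps the sketch short. A real resolver would more likely descend a scope tree or binary-search sorted ranges, and that extra bookkeeping is part of why this route looks more complex than a Scope property on the node.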

public class ScopeInfo
{
    public ScopeType Type { get; set; }
    public List<Declaration> Declarations { get; set; } // Variable/class/function/import declarations defined in the scope
    public List<INode> References { get; set; } // Nodes referenced in the scope. Useful for tracking members that are declared in a parent scope
    public ScopeInfo? ParentScope { get; set; } // null for the root scope
    public List<ScopeInfo> ChildScopes { get; set; }
    public object? UserData { get; set; } // Some sort of labeling/marking object, might be useful in some cases.
}

I added ChildScopes as it can be useful when you need to traverse the scopes from top to bottom, but it also introduces additional complexity in keeping the AST and the scopes in sync when either changes. I can't yet tell how much complexity or which challenges this might introduce in the future.

public enum ScopeType
{
    Global,    // The scope covering the entire JS file
    Function,  // Scope created by functions, including both function declarations and function expressions
    Block,     // Scope created by control flow operations, such as `if`, `for`, or `while` blocks (try/catch too, maybe?)
    Module,    // Scope specific to modules
}

With that information, I believe we could expose a good amount of useful methods.

public class ScopeInfo
{
    // Additional members proposed for the ScopeInfo class above; bodies are intentionally omitted.
    public INode? FindReference(string identifier) {}
    public bool RewriteReference(string identifier, INode node) {}

    public Declaration? FindDeclaration(string identifier) {}
    public bool RewriteDeclaration(string identifier, Declaration declaration) {}

    public bool IsInScope(INode node) {}

    // The scope's references/declarations can change when the AST is modified, so we should be able to correct the scope information.
    public void AdjustScope() {}
    // Adjust the scope to include the references/declarations introduced by the injected statement.
    public void AdjustScope(Statement injectedStatement) {}
}
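As a rough illustration of how a lookup like FindDeclaration could behave, here is a minimal JavaScript sketch. The `Scope` class, its fields, and the `declare` helper are hypothetical stand-ins for the proposed ScopeInfo, not Acornima's API. It assumes a lookup starts in the current scope and walks the parent chain, so the nearest declaration wins.

```javascript
// Hypothetical stand-in for the proposed ScopeInfo; not an existing API.
class Scope {
  constructor(type, parent = null) {
    this.type = type;
    this.parent = parent;
    this.children = [];
    this.declarations = new Map(); // name -> declaration record
    if (parent) parent.children.push(this);
  }

  // Registers a declaration in this scope and returns its record.
  declare(name, kind) {
    const declaration = { name, kind, scope: this };
    this.declarations.set(name, declaration);
    return declaration;
  }

  // Walks from this scope towards the root; the nearest declaration wins.
  findDeclaration(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      const found = scope.declarations.get(name);
      if (found) return found;
    }
    return null;
  }
}

const globalScope = new Scope('Global');
const fnScope = new Scope('Function', globalScope);
const blockScope = new Scope('Block', fnScope);

globalScope.declare('x', 'var');
globalScope.declare('y', 'var');
fnScope.declare('x', 'let'); // shadows the global `x`

console.log(blockScope.findDeclaration('x').scope.type); // Function
console.log(blockScope.findDeclaration('y').scope.type); // Global
console.log(blockScope.findDeclaration('z'));            // null
```

Keeping both `parent` and `children` links is what makes top-down traversal possible, and it is also the part that has to be kept consistent whenever the AST is rewritten.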

This is a nice tip. I don't know SWC, but it will definitely be useful to take a look at it for ideas.

It's super solid, although it can be somewhat overwhelming at first.

Babel has support for scopes as well. I believe one of the main challenges they had was keeping track of things after the AST is modified. I'm unsure if that's still the case, but it was an issue back in 2017ish.

Divulcan commented 1 month ago

I forgot one detail: we might get multiple results for FindDeclaration. The scope and the variable are the same, but the variable is declared twice, and depending on how we resolve the declaration, we can get different results.

var x = userInput();  
switch(x) {
  case "a":
  case 2:
    var x = 1;  
  case 3:
    console.log(x); 
    break;
}

In this case, x has two declarations in the same scope, and I believe it's impossible to pick the correct declaration without analyzing the CFG of the script. While CFG analysis is interesting, I think we could return a list of declarations instead of assuming it's going to be unique every time.
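A minimal JavaScript sketch of the "return a list" idea, with made-up names (`DeclarationTable`, `findDeclarations`) and line numbers taken from the switch snippet above: the table keeps every declaration site per name and leaves it to the caller, or a later CFG pass, to decide which one applies.

```javascript
// Hypothetical per-scope declaration table that keeps every declaration site for a name.
class DeclarationTable {
  constructor() {
    this.byName = new Map(); // name -> array of { kind, line }
  }

  add(name, kind, line) {
    if (!this.byName.has(name)) this.byName.set(name, []);
    this.byName.get(name).push({ kind, line });
  }

  // Returns all declaration sites in insertion order; an empty array if there are none.
  findDeclarations(name) {
    return this.byName.get(name) ?? [];
  }
}

const table = new DeclarationTable();
table.add('x', 'var', 1); // var x = userInput();
table.add('x', 'var', 5); // var x = 1;  (inside the switch)

console.log(table.findDeclarations('x').map(d => d.line).join(', ')); // 1, 5
console.log(table.findDeclarations('y').length);                      // 0
```

Returning an empty array for unknown names keeps callers free of null checks, at the cost of not distinguishing "undeclared" from "declared elsewhere".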

adams85 commented 1 month ago

Thanks for your thoughts!

I also did some experimentation on the topic in the meantime and came up with some ideas: https://github.com/adams85/acornima/pull/13

In the PR description I also wrote a few words about the design guidelines and requirements. This is (meant to be) a highly optimized parser, so there are some additional aspects to consider.

If you feel like it, check the PR out and share your opinion about it. I'd like to know if it would be ok for your use cases and if not, what should be changed.

Divulcan commented 1 month ago

Hey @adams85

I'm still playing with the PR. I was able to prototype a basic tracker with almost no modifications to your base.

I suspect there might be some inaccuracies when applying it to complex applications, though I haven’t fully verified this yet.

I will try to find some time this weekend to share a few ideas that align with the requirements you mentioned in #13.

Slightly unrelated to that PR: if I manage to get some of these analyzers stable and you are okay with it, we could add them to the extras packages. Here are some of the concepts I’m considering:

With these analyzers, representations beyond the AST, such as CFG/SSA, would be "easy" for any consumer to implement.