LanguageDev / Yoakke

A collection of libraries for implementing compilers in .NET.
https://languagedev.github.io/Yoakke/
Apache License 2.0
141 stars 8 forks source link

Separating visitation logic from the performed action on the node #76

Open LPeter1997 opened 3 years ago

LPeter1997 commented 3 years ago

Is your feature request related to a problem? Please describe. I'm trying to formulate in my head a feature that would make it implicit to visit sub-nodes in the visitor. For example, let's say I'm writing a symbolic resolver, I'd need 2 phases:

Right now, I could write something like so:

[SyntaxTreeVisitor(typeof(Ast), MethodName = "Phase1")]
[SyntaxTreeVisitor(typeof(Ast), MethodName = "Phase2")]
public partial class SymbolResolution
{
    public SymbolTable Resolve(Ast program)
    {
        this.Phase1(program);
        this.Phase2(program);
        return /* ... */;
    }

    // Function definitions are order-independent, define them
    protected void Phase1(FunctionDef def) 
    {
        // TODO: Define function symbol

        // We need to manually visit every subnode of FunctionDef here
        foreach (var param in def.Parameters) this.Phase1(param);
        this.Phase1(def.ReturnType);
        this.Phase1(def.Body);
    }

    // Variable definitions are order dependent
    protected void Phase2(VariableDef def)
    {
        // TODO: Define variable symbol

        // We need to manually visit every subnode of VariableDef here
        this.Phase2(def.Type);
        this.Phase2(def.Value);
    }

    // Identifiers need a symbol in phase 2
    protected void Phase2(Identifier ident)
    {
        // TODO: Resolve a symbol for the identifier
        // Identifier has no subnodes fortunately here
    }
}

The thing is, we duplicated the default visitation behavior 2 times (once in Phase1(FunctionDef def) and once in Phase2(VariableDef def))! And this could be worse for more complex trees. Currently, we could solve this by having a separate visitor base-class and just call base.PhaseX(y) at the appropriate places. I have some concerns with this (with various degree of importance):

Describe the solution you'd like I'd somehow want to annotate, that I need the default behavior injected, even when I don't have a base class.

Describe alternatives you've considered I have thought a few alternatives.

  1. Generate a base implicitly, if there isn't one. This only partially solves the problem. It will make it possible to fall back to the default behavior, but the call would still be explicit.

  2. Generate a [MethodName]_Fallback for all methods with the default behavior. This slightly improves on the previous alternative by eliminating the need of base-classes at all. The explicitness still remains.

  3. Tell in [SyntaxTreeVisitor(...)], that the default behavior should be called. We could even make multiple options, like VisitChildren.Before, VisitChildren.After and VisitChildren.Explicit that would annotate when to visit children (before the custom visitation, after it, or the user will call it explicitly). These could be overriden case-by-case, just like the ReturnType property. This would eliminate all explicitness. Note, that implementing this might be hard, error-prone, or straight up impossible. (see next section)

  4. Have an option to generate method calls to simple leaf functions, that are called inside the default-generated visitor methods. Ordering could be maintained with the above idea. An example usage would be:

    [SyntaxTreeVisitor(typeof(Ast), MethodName = "Phase1Visit", LeafMethodName = "Phase1")]
    [SyntaxTreeVisitor(typeof(Ast), MethodName = "Phase2Visit", LeafMethodName = "Phase2")]
    public partial class SymbolResolution
    {
    protected void Phase1(FunctionDef def) { /* TODO: Define the symbol */ }
    protected void Phase2(VariableDef def) { /* TODO: Define the symbol */ }
    protected void Phase2(Identifier ident) { /* TODO: Resolve the symbol */ }
    
    // The generated Phase1Visit and Phase2Visit methods call these Phase1 and Phase2 methods
    }

    For custom positioning of the leaf-call, we could still override the visitor methods themselves. For the trivial beginning and end cases, the idea from alternative 3 could be used.

Problem with the third alternative While the third alternative - at least to me - looks like the neatest and most comfortable option, I am not yet sure, if it's implementable. We would need to essentially inject an extra call in the user-defined functions, but Source Generators can't modify existing code. I have not yet found any sensible way to do this yet.

A short summary I feel like what I'm trying to do is trying to factor out the structural walk from the action we want to perform on the node. There might be a better way to do this, I might have just not figured it out yet. Many times I need to walk the tree, but I don't want to write the tree-walking part, as that seems like another responsibility, that does not belong to the thing I'm doing.

skneko commented 3 years ago

I am not entirely convinced by this, and this is because implicitly traversing the children only works well in some situations, but not all of them. This means the user needs to be capable to choose when it wants this behavior with enough expressivity to be able to express any visitor without more limitations than it would have while using plain C# (ideally).

To be more specific, two important and common situations make it undesirable for the user to desire the visitor to implicitly visit the children of the visited nodes, or make the implementation impossible if we consider the desired expressivity: 1) The visitor methods do not return void. ln this case, the user will want to write an expression that gathers the visit values (why specificy a return value otherwise?), and this necessarily implies making the calls inside the function controlled by the user, as part of the behavior. Example:

public partial class PrettyPrinterVisitor {
  //...
  public string Visit(Expression.Binary expr) => $"(binary: {Visit(expr.Left)} {expr.Op} {Visit(expr.Right)})";
}

Here, traversing implicitly after (or before) this function is not adequate: the user already indicates what children to visit and how, in a natural way that allows for efficient combination of the results.

2) I feel like in many passes the user will be searching for some subtrees in particular. If we're in a function node, maybe we only need the name and then we want to bail. In how many cases does the user wish to always visit all the children of every node visited?

Furthermore, we risk introducing silent performance degradations, as if the user naturally and intuitively specificies they desire for the children to be visited (as is natural in a visitor and expected in a recursive function) some children (or all of them) would be visited twice.

So, one thing clear, we need the user to specific explicitly they want us to visit the children of a node for them, like this:

    // Function definitions are order-independent, define them
    protected void Phase1(FunctionDef def) 
    {
        // TODO: Define function symbol

        // We need to manually visit every subnode of FunctionDef here
        VisitChildren();
    }

This need already solves the order and other details, so no need for VisitChildren.After or anything similar. I feel like this is overengineered. In my opinion, the only alternatives that don't harm expressivity and are intuitive are the following ones:

  1. Base class: intuitive, but as you said, it introduces a super-class. I don't think it is that horrible, personally. It could be stated in the docs that you can optionally use an empty abstract base class to preserve the default functions alongside your custom ones, with the option to call the default from your one, like this:

    protected void Phase1(FunctionDef def) 
    {
        // TODO: Define function symbol
    
        base.Visit(def); // Very intuitive: delegate behavior to `base` = follow the default behavior = visit the tree for me
    }

    This has the advantage of using the same implementation rules (implement the ones not written already), as the abstract base class is empty.

Question: what if the visit methods do not return void? What does base.Visit(...) return then?

  1. "Fallback" method: leaving aside the fact I don't really like the "Fallback" name (maybe DefaultBehavior is better?), this introduces the problem that you will be always generating a copy of all method even if it is not necessary, unless you add an annotation (GenerateDefaultBehaviorMethodsAttribute?) to control this, introducing even more complexity. But you would avoid inheritance, if that is important.

In summary, I like the base class alternative as it adapts elegantly into the streamlined generation rules we already have: the methods not written are provided. A mechanism of the language itself, which is already implemented in CSC for us, allows the user to delegate control to the default behavior taking care of all the details, such as order and expressivity. However, this proposed solution is only well defined for the case where the return type is void.