Modifying the AST with visitors

lucaswerkmeister commented 10 years ago

In #2, we decided to make the AST immutable. However, we severely limit the module’s usefulness if we don’t provide some way to obtain a modified copy of an AST.

At the moment, AST nodes have a copy method, whose parameters are the same as those of the class itself and default to these values, allowing you to change only a single parameter in the copy without having to specify the others. However, this only allows you to change a single node; if you want to modify a node buried deeply in the AST, you have to populate the copy through copies of all its parent nodes. Writing this by hand isn’t feasible.

I currently see four approaches to mitigate this through the Visitor pattern.

The basic idea

The Visitor methods return a modified copy of the AST node. The question is now how they get (modified) copies of the child nodes.

Approach 1: A `ModifyingVisitor` utility class

This requires no modification in the node classes. The ModifyingVisitor declares visitX methods that return X. Instead of calling x.visit(this), you always call visitX(x). The visitX methods for abstract Xs must replicate the subtyping behavior: switch (x) case (is X1) { visitX1(x) } ...

Approach 2: A separate visiting mechanism

In addition to the Visitor class, there would be an independent ModifyingVisitor class. In addition to Node’s visit and visitChildren methods, there would also be modify and modifyChildren methods, each taking a ModifyingVisitor and declaring the node type as return type. modify calls the ModifyingVisitor’s appropriate modifyX method and returns the result of that (a modified copy of itself). modifyChildren calls copy with modified children.

shared Body modify(ModifyingVisitor visitor)
        => visitor.modifyBody(this);
shared Body modifyChildren(ModifyingVisitor visitor)
        => copy { statements = statements.collect((Statement s) => s.modify(visitor)); }

Approach 3: Like 2, but unify the two visiting mechanisms

shared Body visit(Visitor|ModifyingVisitor visitor) {
    switch (visitor)
    case (is Visitor) {
        visitor.visitBody(this);
        return this;
    }
    case (is ModifyingVisitor) {
        return visitor.visitBody(this);
    }
}

Not sure what that’s good for.

Approach 4: Like 3, but with typing wizardry

abstract class Visitor<Return>()
        given Return of Anything|Nothing {
    shared formal Body|Return visitBody(Body that);
}
class Body() {
    shared Body|Return visit<Return>(Visitor<Return> visitor)
        => visitor.visitBody(this);
}

If Return is Anything, the return type of all involved methods is Anything – they are void methods. If Return is Nothing, the return type of all involved methods is the correct node type, and the AST is modified.

Note that these ideas evolved over some time, and some are definitely more evolved than others. My favorite is definitely Approach 2. I don’t really see Approach 4 happening – the Return is a neat idea, but as void methods can’t override Anything methods, it would be annoying to implement (even non-modifying visitors would need to return null;). Approach 3 is really just a variant of Approach 2 – I’m throwing it out there, but I’m not saying it has any advantages (except that the return this might come in handy sometimes for chained invocations: visit(firstVisitor).visit(secondVisitor).visit(thirdVisitor)). And Approach 1 is the oldest, simplest, and stupidest one, and I’m only including it for completeness.

What do you think?

lucaswerkmeister commented 10 years ago

I’m actually warming to Approach 3, because chaining of visitors could really be useful if you do something complicated with the AST (and apparently, it can avoid memory leaks). On the other hand, the type check on each visit call costs something, and I could also enable chaining in Approach 2:

shared Body visit(Visitor visitor) {
    visitor.visitBody(this);
    return this;
}
shared Body modify(ModifyingVisitor visitor)
    => visitor.visitBody(this);

And chaining visitors as

myAst.visit(analyzer).modify(desugarer).visit(transformer);

might actually be more readable because it’s immediately clear which visitors modify the AST and which don’t.

luolong commented 10 years ago

Have you heard of the Lenses approach for updating immutable structures?

There's some discussion on various approaches to updating deeply nested immutable structures in this Stackoverflow answer: http://stackoverflow.com/questions/5767129/lenses-fclabels-data-accessor-which-library-for-structure-access-and-mutatio

luolong commented 10 years ago

Other than that ... after reading the above messages for about fifth time I managed to parse your proposal and I really like your last idea.

lucaswerkmeister commented 10 years ago

@luolong – I had only heard mentions of lenses in some ceylon-spec discussions, and more or less dismissed them as “expert speak” that I didn’t have a chance to understand anyways, which was obviously a mistake. Thank you especially for the first link – I found it a lot easier to understand than the other ones.

Lenser[Person].contact.email.user.set(person, "john")

seems much more readable and lightweight than creating an entire visitor.

I’m not sure what to do with this in ceylon.ast though. I think that something similar can be achieved with the ModifyingVisitor, and more importantly, that the ModifyingVisitor also gives you more control over which element of a collection you’re “viewing” (if you only want to modify one Statement of a Block).

Is there any existing implementation of lenses for Ceylon? I’m not even sure what the syntax would look like right now.

after reading the above messages for about fifth time

oh… was there anything in particular that made it so difficult to understand? Should I maybe have omitted some approaches?

luolong commented 10 years ago

oh… was there anything in particular that made it so difficult to understand? Should I maybe have omitted some approaches?

No. The first three times I was just way too distracted to really comprehend what I was reading... Fourth time I was only reading what I wanted to read. It was the fifth time around that I really put my mind into it.

luolong commented 10 years ago

Is there any existing implementation of lenses for Ceylon? I’m not even sure what the syntax would look like right now.

No, not that I know of.

lucaswerkmeister commented 10 years ago

If we go with the “modifying visitor” pattern, a question that still remains is the naming of the class and methods. Suggestions:

ModifyingVisitor, visitX
ModifyingVisitor, modifyX
Modifier, modifyX (no way, easily confused with VoidModifier, FunctionModifier etc.)
Mutator, mutateX (?)
…?

(Aside: I googled “modifying visitor” to see if some other good names would pop up, and this issue was the second result. The hell? IMO the approach doesn’t really sound unique at all.)

lucaswerkmeister commented 10 years ago

I prefer ModifyingVisitor.modifyX to ModifyingVisitor.visitX for the following reasons:

The difference between a ModifyingVisitor and a normal Visitor is that one modifies the AST while the other just visits it. The method names should reflect this.
It makes the code of all ModifyingVisitors more readable because you can immediately tell the difference between modifyX and visitX just by the name. (Weak argument: you could also tell the difference by the return type and the presence or absence of return statements.)
If we make Visitor and ModifyingVisitor interfaces instead of classes, this allows users to satisfy Visitor&ModifyingVisitor without naming conflicts. (I’m not sure how strong this argument is. I feel like it’s a strong one, but I can’t come up with any use for a Visitor&ModifyingVisitor right now.)

But I’m still not completely sure about the name “ModifyingVisitor”.

lucaswerkmeister commented 10 years ago

In #18 I noticed one thing that I really dislike about the name ModifyingVisitor: it makes the naming of all actual ModifyingVisitor classes or objects awkward. Are they called DoThisThingModifyingVisitor or DoThisThingVisitor? The first sounds too long, the second hides the fact that it’s a ModifyingVisitor, not a mere Visitor.

A one-word name like Mutator or Modifier would be much better in that regard.

lucaswerkmeister commented 10 years ago

Another suggestion: Editor. It’s a single word that sounds a lot better than Mutator.

ceylon / ceylon.ast