ceylon / ceylon-compiler

DEPRECATED
GNU General Public License v2.0

AST transformers #1865

Open gavinking opened 10 years ago

gavinking commented 10 years ago

Now that ceylon.ast is ready, I propose that we define a macro system of AST transformers for Ceylon. This would be the ability to write a Ceylon module that is called by the compiler at compile time, with an instance of the ceylon.ast AST, and has the ability to manipulate the AST. This is, it seems to me, the most natural and straightforward way to provide ceylon/ceylon-spec#906.

lucaswerkmeister commented 10 years ago

One important question to discuss is at which stage of compilation the macro execution happens – before or after typechecking?

In the current state of ceylon.ast, it would make the most sense to execute macros immediately after parsing, before typechecking, because the ceylon.ast ⇔ RedHat AST conversion discards everything except syntactical information, i.e., the typechecking results (model, unit, errors, etc.) would all be discarded. It’s possible to record them in ceylon.ast (ceylon/ceylon.ast#17); it’s just not done at the moment.

Also, @gavinking:

> the ability to manipulate the AST.

ceylon.ast nodes are immutable; the macro would return a modified copy of the AST.
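To illustrate the "return a modified copy" point, here is a minimal sketch in Java. `Block`, `withAppended`, and `BlockTransformer` are hypothetical names made up for this example and are not the ceylon.ast API; statements are kept as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable node type; NOT the real ceylon.ast API.
final class Block {
    final List<String> statements; // statements kept as plain strings for brevity

    Block(List<String> statements) {
        this.statements = Collections.unmodifiableList(new ArrayList<>(statements));
    }

    // "Editing" builds a new Block; the receiver is left untouched.
    Block withAppended(String statement) {
        List<String> copy = new ArrayList<>(statements);
        copy.add(statement);
        return new Block(copy);
    }
}

// Hypothetical transformer contract: an AST goes in, a (possibly new) AST comes out.
interface BlockTransformer {
    Block transform(Block input);
}

class ImmutableCopyDemo {
    public static void main(String[] args) {
        Block original = new Block(Collections.singletonList("print(1);"));
        BlockTransformer addLogging = b -> b.withAppended("log();");
        Block result = addLogging.transform(original);
        System.out.println(original.statements); // [print(1);]
        System.out.println(result.statements);   // [print(1);, log();]
    }
}
```

The transformer never mutates its input, so the compiler could keep the original tree around for comparison after the transformer has run.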

Also, why is this a ceylon-compiler issue? I don’t see why we couldn’t do this for the JS backend as well.

gavinking commented 10 years ago

@FroMage wants me to not call these things "macros".

So what should we call them?

lucaswerkmeister commented 10 years ago

On ceylon-users @renatoathaydes talked about “Groovy ASTTransformations”, and that seems like a pretty fitting term to me – it’s exactly what it does: transform an AST.

(Potential problem: How many people are familiar with the term “AST”?)

renatoathaydes commented 10 years ago

Here's a link to the Groovy doc about this: http://groovy.codehaus.org/Compile-time+Metaprogramming+-+AST+Transformations

Note that Groovy ASTTransformations can be invoked in any compilation phase:

See http://groovy.codehaus.org/Compiler+Phase+Guide

gavinking commented 10 years ago

So, thinking this through a bit, my big concern about this feature is that in full generality it could really let you subvert the whole nature of Ceylon. For example, potentially you could:

You might think "well, sensible people wouldn't do that", but the truth is that sensible people will find a reason why they need to do all kinds of nasty things in the face of other design constraints, so I think we need to make sure that we place appropriate constraints on what an AST transformer can do.

In particular, I think it should not be able to take a fragment of code with typing errors and make that code well-typed. That is, the code must be syntactically legal and well-typed before the AST transformer even gets to have a go at it.

Indeed, I would probably even prevent the AST transformer from changing the type of any existing node! It could add nodes, and request the compiler to assign types to them, but it could not affect types that have already been assigned. A nice thing about this restriction is that it would mean that AST transformers have minimal impact on performance of the compiler, and would be more likely to compose without nasty collisions.
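One way such a restriction might be enforced is sketched below in Java, under loud assumptions: every node carries a stable identifier and an optional assigned type, and the compiler compares the tree before and after the transformer runs. `TypedNode` and `TypePreservationCheck` are hypothetical names, not part of the typechecker or ceylon.ast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical node summary: a stable id plus the type assigned by the typechecker.
final class TypedNode {
    final String id;
    final String type; // null means "not typed yet"

    TypedNode(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

final class TypePreservationCheck {
    // Reports every pre-existing node whose assigned type differs after the transformation.
    static List<String> violations(List<TypedNode> before, List<TypedNode> after) {
        Map<String, String> assigned = new HashMap<>();
        for (TypedNode n : before) {
            assigned.put(n.id, n.type);
        }
        List<String> errors = new ArrayList<>();
        for (TypedNode n : after) {
            if (!assigned.containsKey(n.id)) {
                continue; // new node: the compiler would be asked to type it later
            }
            String old = assigned.get(n.id);
            if (!Objects.equals(old, n.type)) {
                errors.add(n.id + ": " + old + " -> " + n.type);
            }
        }
        return errors;
    }
}

class TypePreservationDemo {
    public static void main(String[] args) {
        List<TypedNode> before = new ArrayList<>();
        before.add(new TypedNode("x", "Integer"));
        before.add(new TypedNode("y", "String"));

        List<TypedNode> after = new ArrayList<>();
        after.add(new TypedNode("x", "Integer")); // unchanged: fine
        after.add(new TypedNode("y", "Float"));   // type changed: rejected
        after.add(new TypedNode("z", null));      // added node: fine

        System.out.println(TypePreservationCheck.violations(before, after)); // [y: String -> Float]
    }
}
```

This sketch only checks types; whether removing or moving nodes should also be rejected is a separate design question.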

I think that even in the face of this restriction, we could provide the features we've identified as "legitimate" uses for this functionality:

gavinking commented 10 years ago

@renatoathaydes I definitely don't want to provide anything as powerful as what Groovy provides. I want this to be a very modest, limited facility which doesn't allow for people to go crazy and start changing the whole nature of the language and breaking the restrictions that are engineered into the language. In full generality, AST transformers have all kinds of problems: they don't compose well, they allow the introduction of nasty implicit behavior, they can impact the performance of the compiler.

lucaswerkmeister commented 10 years ago

Okay, with that in mind, my idea of what happens is:

As the very last phase of typechecking, right before the backend starts, the AST is converted to ceylon.ast. This conversion attaches

The AST transformer runs.

Then, a special subclass of RedHatTransformer is used to convert backwards (I need to make those methods default). For every node, it checks:

This ensures that no nodes were moved or removed. If that check is successful, the RedHat AST node is obtained:

Then, the typechecker runs again, checking only those nodes that have no model.

Note that a change will propagate upwards: If a statement is added to a function, then the body will be new, causing the function to be new, causing the compilation unit to be new. The typechecker will then revisit the compilation unit, function, and body (but using information on the old statements where it exists).
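The upward propagation can be sketched in Java as follows. This assumes a hypothetical immutable `AstNode` with a `hasModel` flag standing in for "typechecker information is attached"; neither the class nor the flag is the real ceylon.ast or RedHat AST representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable AST node; hasModel stands in for attached typechecker results.
final class AstNode {
    final String name;
    final List<AstNode> children;
    final boolean hasModel;

    AstNode(String name, List<AstNode> children, boolean hasModel) {
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
        this.hasModel = hasModel;
    }

    // Changing the children yields a NEW node, which therefore has no model yet.
    AstNode withChildren(AstNode... newChildren) {
        return new AstNode(name, Arrays.asList(newChildren), false);
    }
}

final class Recheck {
    // Collects the nodes the typechecker has to revisit, in pre-order.
    static List<String> nodesToTypecheck(AstNode root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(AstNode node, List<String> out) {
        if (node.hasModel) {
            return; // an untouched subtree keeps its model, so it is skipped entirely
        }
        out.add(node.name);
        for (AstNode child : node.children) {
            collect(child, out);
        }
    }
}

class PropagationDemo {
    public static void main(String[] args) {
        AstNode s1 = new AstNode("statement1", Collections.<AstNode>emptyList(), true);
        AstNode body = new AstNode("body", Collections.singletonList(s1), true);
        AstNode function = new AstNode("function", Collections.singletonList(body), true);
        AstNode unit = new AstNode("compilationUnit", Collections.singletonList(function), true);

        // The transformer adds one statement; every ancestor has to be rebuilt.
        AstNode s2 = new AstNode("statement2", Collections.<AstNode>emptyList(), false);
        AstNode newUnit = unit.withChildren(function.withChildren(body.withChildren(s1, s2)));

        System.out.println(Recheck.nodesToTypecheck(newUnit));
        // [compilationUnit, function, body, statement2]
    }
}
```

The old statement is reused by reference, so its model survives and only the rebuilt spine plus the new statement are revisited.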

For a few node types we should probably relax those restrictions a bit – for example, allow positional arguments ⇒ named arguments?

lucaswerkmeister commented 10 years ago

Unrelated: I just noticed that within ceylon.ast, Transformer is a transformation to anything – for example, CeylonExpressionVisitor, used in string, is a Transformer<String> – while a transformation to another AST is called Editor. Should we call it “AST editors” instead?

gavinking commented 10 years ago

> Should we call it “AST editors” instead?

No. :-)

luolong commented 10 years ago

One use case for AST transformations that I would find useful is in ceylon.test:

testSuite void allTests() {
    YodaTest();
    DarthVaderTest();
    starOfDeathTestSuite();
}

which would be transformed into the equivalent of the following test code:

testSuite({`class YodaTest`,
           `class DarthVaderTest`,
           `function starOfDeathTestSuite`})
void starwarsTestSuite() {}

akberc commented 10 years ago

Your concern is valid: it would provide an open door for subverting the language. However, how about subsetting?

I was imagining that one would be able to subset the language by validating the AST and throwing errors if the subsetting was not appropriate for the target.
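A rough sketch of such a subset validator, written in Java with made-up types: `Kind`, `SubsetNode`, and `SubsetValidator` are hypothetical and only illustrate the idea of walking the tree and reporting constructs outside an allowed set. Errors are collected in a list here rather than thrown, so that all of them can be reported at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical node kinds; a real AST would have many more.
enum Kind { CLASS_DEFINITION, EXTENDS, NAMED_ARGUMENTS, FOR_LOOP, ASSIGNMENT }

// Hypothetical node: a kind, a source line, and child nodes.
final class SubsetNode {
    final Kind kind;
    final int line;
    final List<SubsetNode> children;

    SubsetNode(Kind kind, int line, SubsetNode... children) {
        this.kind = kind;
        this.line = line;
        this.children = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(children)));
    }
}

final class SubsetValidator {
    private final Set<Kind> allowed;

    SubsetValidator(Set<Kind> allowed) {
        this.allowed = allowed;
    }

    // Walks the whole tree and reports every node whose kind is outside the subset.
    List<String> validate(SubsetNode root) {
        List<String> errors = new ArrayList<>();
        visit(root, errors);
        return errors;
    }

    private void visit(SubsetNode node, List<String> errors) {
        if (!allowed.contains(node.kind)) {
            errors.add("line " + node.line + ": " + node.kind + " is not allowed in this subset");
        }
        for (SubsetNode child : node.children) {
            visit(child, errors);
        }
    }
}

class SubsetDemo {
    public static void main(String[] args) {
        // A templating-style subset: declarations, inheritance, and named arguments only.
        SubsetValidator validator = new SubsetValidator(
                EnumSet.of(Kind.CLASS_DEFINITION, Kind.EXTENDS, Kind.NAMED_ARGUMENTS));

        SubsetNode tree = new SubsetNode(Kind.CLASS_DEFINITION, 1,
                new SubsetNode(Kind.EXTENDS, 1),
                new SubsetNode(Kind.FOR_LOOP, 3,
                        new SubsetNode(Kind.ASSIGNMENT, 4)));

        for (String error : validator.validate(tree)) {
            System.out.println(error);
        }
        // line 3: FOR_LOOP is not allowed in this subset
        // line 4: ASSIGNMENT is not allowed in this subset
    }
}
```

A validator like this only reads the tree, so it would sit comfortably within the restriction that transformers must not change already-typed code.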

Example: templating. Ceylon inheritance syntax is very intuitive and typesafe, and could be used for a CSS metamodel that compiles to CSS (like LESS), or even HTML. The hurdle is that advanced features of the language, other than inheritance and named arguments (and a few others like case/conditions/enumerations), have to be filtered out.

Perhaps another use case would be scripting and a REPL, without compilation. A simplified subset of Ceylon would use variables and flow control, and no inheritance, in order to execute the AST nodes using pre-defined execution for commands (an interpreter?).