antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Support tree rewriting #1732

Closed ghost closed 7 years ago

ghost commented 7 years ago

While the focus of Antlr4 is unfortunately no longer on compiler/interpreter/transpiler writers, I use it for a transpiler anyway. Antlr4 is so much nicer grammar-wise than Antlr3.

However, the lack of tree rewriting/mutation is driving me crazy. This is for special cases, of course, but when you do need it (say, for normalization), you end up doing ugly duplication on a side-AST.

All I really need is mutators on the generated vector/string fields (this is the C++ backend). I can't even call setText now.

Anyway. Frustrated.

Any plans for tree mutation?

mike-lischke commented 7 years ago

Have a look at #1647. About setText(): do you mean the one for tokens? If so, cast your token to a CommonToken, which supports it.

ghost commented 7 years ago

@mike-lischke I mean the generated context objects I get in visitor callbacks. RuleContext derivatives. They have getText, not setText.

@mike-lischke As for that issue link, it seems the idea is to have a separate tree writer visitor? That is really excessive in my case. I just want to change a few things during visitation. Fast 'n' easy.

ghost commented 7 years ago

@mike-lischke Anyway, nice to see that work has begun. Do you plan to support this in your cpp target anytime soon?

mike-lischke commented 7 years ago

This is basically in the design phase currently. Implementation details are discussed and such. As soon as the dust has settled and the Java impl is done all runtime maintainers will go and port it to their runtimes.

ghost commented 7 years ago

I see. Thanks.

ghost commented 7 years ago

@mike-lischke I've been using the cpp target a lot, by the way. Great quality.

mike-lischke commented 7 years ago

Glad to hear :-)

millergarym commented 7 years ago

@pureconfig are you building an AST, or as per antlr4 doing everything in the listeners or visitors?

Like you, I really like the tree stuff. So much so that I effectively added tree grammars back to antlr4. I say "effectively" because it is in "user" space, with no hacks to antlr4. Basically, the listener builds an AST using a custom tree data structure that supports the token stream interface. The tree is then used as the token source for a different parser (effectively a tree walker). A major difference between this tree structure and antlr's is that it is immutable and, importantly, the nodes and the tree topology are separate. This allows the nodes to live in multiple trees at the same time, making tree rewriting trivial, much easier than tree rewriting in antlr3.

At the moment I'm prototyping, so I'm not really concerned with performance. There are a number of reasons this might not suit your purposes.

You can find an early version of this at the URL below. https://github.com/wxio/antlr4-go-examples/blob/master/eval/ExprWalker.g4 It should be pretty trivial to port to C++.

Let me know what you think.
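To make the "nodes separate from topology" idea concrete, here is a minimal C++ sketch. It is not taken from the linked Go code; `Node`, `Tree`, `makeNode`, and `withSwappedChildren` are hypothetical names invented for illustration, and the topology is simply assumed to be a map from a node to its ordered children.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable node: payload only, no parent or child pointers.
struct Node {
    const std::string text;
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr makeNode(std::string text) {
    return std::make_shared<Node>(Node{std::move(text)});
}

// Hypothetical topology: the edges live here, keyed by node identity.
// Copying a Tree copies the edges only; the nodes themselves are shared.
struct Tree {
    NodePtr root;
    std::map<NodePtr, std::vector<NodePtr>> children;

    // "Rewrite" by returning a new tree in which two children of `parent`
    // are swapped. The original tree is left untouched.
    Tree withSwappedChildren(const NodePtr& parent, std::size_t i, std::size_t j) const {
        Tree copy = *this;
        auto& kids = copy.children.at(parent);  // throws if parent has no children
        std::swap(kids.at(i), kids.at(j));
        return copy;
    }

    // Depth-first rendering of the subtree rooted at `n`.
    std::string flatten(const NodePtr& n) const {
        assert(n && "flatten requires a non-null node");
        std::string out = n->text;
        auto it = children.find(n);
        if (it != children.end()) {
            for (const auto& child : it->second) out += " " + flatten(child);
        }
        return out;
    }
};
```

With a `+` node whose children are `1` and `2`, `flatten` on the original tree gives `+ 1 2`, while the tree returned by `withSwappedChildren(plus, 0, 1)` gives `+ 2 1`. Both trees point at the same three node objects, which is what makes this kind of rewrite cheap.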

ghost commented 7 years ago

@millergarym My problem is that the input language is simple enough for me to use the generated Antlr context objects as the AST. But I need some simple transformations, basically a reordering of a child node in one case, and text replacement in another node.

Because I can't mutate it, I end up with my listener producing an AST (which mirrors the Antlr generated concrete syntax tree - the context objects), which I then have to visit again. Very frustrating.
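The side-AST workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `CstNode` is a hypothetical stand-in for a read-only parse tree (it is not the ANTLR runtime API), and `AstNode`, `mirror`, `moveChild`, and `render` are invented names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a read-only parse-tree node (NOT the ANTLR API).
struct CstNode {
    std::string text;
    std::vector<CstNode> children;
};

// Mutable mirror of the parse tree: the "side-AST".
struct AstNode {
    std::string text;
    std::vector<std::unique_ptr<AstNode>> children;
};

// Copy the read-only tree into a mutable one, node for node.
std::unique_ptr<AstNode> mirror(const CstNode& cst) {
    auto ast = std::make_unique<AstNode>();
    ast->text = cst.text;
    for (const auto& child : cst.children) ast->children.push_back(mirror(child));
    return ast;
}

// Move the child at index `from` to index `to`, shifting the others.
void moveChild(AstNode& parent, std::size_t from, std::size_t to) {
    assert(from < parent.children.size() && to < parent.children.size());
    auto first = parent.children.begin();
    if (from < to) {
        std::rotate(first + from, first + from + 1, first + to + 1);
    } else {
        std::rotate(first + to, first + from, first + from + 1);
    }
}

// Join the leaf texts of a subtree with single spaces.
std::string render(const AstNode& node) {
    if (node.children.empty()) return node.text;
    std::string out;
    for (const auto& child : node.children) {
        if (!out.empty()) out += ' ';
        out += render(*child);
    }
    return out;
}
```

Both transformations mentioned above (reordering a child, replacing a node's text) are then ordinary mutations on the mirror, at the cost of the extra copy and a second traversal.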

millergarym commented 7 years ago

@pureconfig ahh, the third dot point

ghost commented 7 years ago

@millergarym Exactly :)

maybeec commented 7 years ago

@millergarym I was searching for tree walker parsers in antlr4. Thanks! Can you point me to the implementation? I was searching a little bit in your antlr4 fork, but could not directly find it while following the commits.

I would like to give it a try.

millergarym commented 7 years ago

@maybeec take a look at the link in my earlier comment. There is no AST construction or tree walker in the antlr code; it is done in the client code. In this case, Go.

As I mentioned, the walker is a parser, with the source tokens coming from a depth-first traversal of a tree (the AST). The AST is built by a listener attached to a prior parser. The AST is a custom Go tree, not one provided by antlr.
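As a rough C++ illustration of "the walker is a parser over a depth-first token stream": the sketch below serializes a tree into a flat token list and then parses that list with a tiny hand-written walker. The `DOWN`/`UP` markers follow the convention ANTLR 3 used for tree node streams; whether the linked Go code uses the same markers is an assumption. `AstNode`, `TreeToken`, `flatten`, `toString`, and `eval` are hypothetical names, and the walker only understands binary `+` and `*`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct AstNode {
    std::string text;
    std::vector<AstNode> children;
};

// Token kinds in the flattened stream: real nodes plus structural markers.
enum class Kind { Node, Down, Up };

struct TreeToken {
    Kind kind;
    std::string text;
};

// Depth-first serialization: a node, then DOWN, its children, and UP.
void flatten(const AstNode& node, std::vector<TreeToken>& out) {
    out.push_back({Kind::Node, node.text});
    if (node.children.empty()) return;
    out.push_back({Kind::Down, "DOWN"});
    for (const auto& child : node.children) flatten(child, out);
    out.push_back({Kind::Up, "UP"});
}

std::string toString(const std::vector<TreeToken>& tokens) {
    std::string out;
    for (const auto& tok : tokens) {
        if (!out.empty()) out += ' ';
        out += tok.text;
    }
    return out;
}

// Tiny "tree parser" for: expr : INT | OP DOWN expr expr UP ;
int eval(const std::vector<TreeToken>& toks, std::size_t& pos) {
    assert(pos < toks.size() && toks[pos].kind == Kind::Node);
    const std::string head = toks[pos++].text;
    // A node not followed by DOWN is a leaf, assumed here to be an integer.
    if (pos >= toks.size() || toks[pos].kind != Kind::Down) return std::stoi(head);
    ++pos;  // consume DOWN
    const int lhs = eval(toks, pos);
    const int rhs = eval(toks, pos);
    assert(pos < toks.size() && toks[pos].kind == Kind::Up);
    ++pos;  // consume UP
    return head == "+" ? lhs + rhs : lhs * rhs;
}
```

For the tree `(+ 1 (* 2 3))`, `flatten` yields `+ DOWN 1 * DOWN 2 3 UP UP` and `eval` returns 7. In the approach described above, a generated parser would take the place of the hand-written `eval`.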

What is your target language?

maybeec commented 7 years ago

@millergarym ok thanks. Got it. I will have a closer look at it.

My target language is Java, and so is the source language. I am looking for the best way to do source code transformation with antlr4. There are tools like TXL or Stratego out there, but they do not provide up-to-date Java grammars for real-world scenarios. I am happy that I can get the latest grammars for free with antlr4. So I want to see what my opportunities are for transforming code using antlr4. Listeners are a little bit unwieldy. I will give the tree walker concept a try.

parrt commented 7 years ago

Hi guys. I'm going to label this as a dup of #1647.