antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.79k stars 3.25k forks source link

Allow converting a ParseTree to an AST #2428

Open NathanJAdams opened 5 years ago

NathanJAdams commented 5 years ago

What would be awesome and IMHO an obvious next step with ANTLR would be to give people the ability to convert from a generated ParseTree to an AST.

There are a number of ways this could be done eg:

  1. ANTLR could have a generic Node, Leaf, Branch classes the ParseTree information is mapped onto.
  2. ANTLR could create a "best guess" at what classes are needed.
  3. The user could create a bunch of minimal data classes and the converter maps the ParseTree information into the user classes.
  4. The user could inject options to control how the conversion process happens, for example, they could set flags to keep/remove information, set operators on how to combine different ParseTrees into a single AST node.
mike-lischke commented 5 years ago

This makes no sense, really. A parse tree is a superset of an AST. You have all the info, which is contained in an AST already in your parse tree. Only the access to the token info is a tad bit more complex (you have to get it from the context's start/stop nodes).

NathanJAdams commented 5 years ago

Try writing a compiler with just a parse tree and it'll make more sense

KvanTTT commented 5 years ago

You can manually convert the parse tree to AST with Visitor. BTW, Roslyn Compiler Platform uses full fidelity parse trees: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview#syntax-trees.

NathanJAdams commented 5 years ago

Roslyn isn't a compiler though, it's a set of tools for IDEs and code analysis. A parse tree is the right structure for that kind of job, and an AST is the right structure for a compiler.

Sure of course it's possible to do it manually, but if it's a common use case it would make sense to abstract it out and allow the user to avoid reinventing the wheel.

sharwell commented 5 years ago

Roslyn isn't a compiler though

It definitely is.

mykolav commented 4 years ago

@mike-lischke

This makes no sense, really.

Actually, this feature-request to simplify AST construction makes a ton of sense. Antlr v3 used to have built-in mechanisms supporting AST construction. Take a look at the Tree construction page from the docs of Antlr v3 for additional details.
Antlr v4 dropped the AST construction support mechanisms as, according to Terence Parr, supporting building compilers is explicitly a non-goal for Antlr v4

Because most ANTLR users don't build compilers, I decided to focus on the other applications for ANTLR v4: parsing and extracting information and then translations. For compilers, we need to convert everything into operations and operands--that means ASTs are easier.

Clearly, the above implies ASTs absolutely make sense for compiler construction and similar tasks, Antlr v4 just doesn't aim at supporting such tasks.

A parse tree is a superset of an AST.

Sorry, but this is plain wrong.
Sure, there can be individual cases where a parse tree is a superset of the corresponding AST. But in general, an AST at most forms an intersecting set with the corresponding ParseTree.
Eli Bendersky laid out the differences between ASTs and Parse trees in a very nice and concise way in his blogpost Abstract vs. Concrete Syntax Trees .

You have all the info, which is contained in an AST already in your parse tree.

Just like the source code the parse tree was built from contains all the same info and maybe more.
It seems a parse tree doesn't help to meaningfully reduce effort to build AST compared to just rolling out a hand-made recursive descent parser — I'll try to give an example illustrating this statement in another comment.
Basically, Antlr v4 is likely not the right tool for jobs that rely on having AST. This is fair enough, given ease of AST construction was never a goal of v4. It just seems the documentation could state this more prominently.

mykolav commented 4 years ago

@KvanTTT

TLDR: A Roslyn syntax tree is an AST it is not a parse tree.

You can manually convert the parse tree to AST with Visitor.

You sure can. Here is an arbitrary example of such a conversion: AstBuilder.java
(Please notice, I'm not affiliated with the project in any way) Given AstBuilder is about 2000 lines of code. One has to wonder if using Antlr is worth it in this case.
Is a hand-rolled recursive descent parser going to take that much more code and effort to implement?
Keep in mind that in case of a hand-made parser:

In other words, without supporting direct AST generation, it isn't obvious what benefits Antlr brings to a task of parsing a programming language.

BTW, Roslyn Compiler Platform uses full fidelity parse trees: https://github.com/dotnet/roslyn/wiki/Roslyn-Overview#syntax-trees.

Sorry, but calling a Roslyn syntax tree a parse tree is factually wrong.
At least, given the commonly accepted definition of a parse tree. Please take a look at Eli Bendersky's post Abstract vs. Concrete Syntax Trees for a clear and concise explanation of the AST vs ParseTree differences.

You're correct that Roslyn produces full fidelity trees. But they are full fidelity in the sense that its possible to do a round-trip from a Roslyn syntax tree to the original source code exactly as the code was spelled by the programmer.
Please take a look at the two screenshots below. One is a Roslyn syntax tree and the other is Antlr's ParseTree for the same C# code.
I believe, from the screenshots its obvious the Roslyn syntax tree is an AST. While Antlr's tree is a ParseTree. I guess, we can say a ParseTree is a full fidelity tree with respect to the grammar — i.e., a ParseTree contains a node for each non-terminal of the grammar.

class Program
{
    public static int Main(string[] args)
    {
        return 0;
    }
}

RoslynSyntaxTree

Antlr4ParseTree

sharwell commented 4 years ago

The real answer to this is https://github.com/antlr/antlr4/issues/2428#issuecomment-454751620. ANTLR 3 tried to make ASTs using the grammar, but they just ended up as partial-fidelity parse trees that took longer to create than full-fidelity ones. The whole feature was dropped from ANTLR 4 which made it simpler, faster, and more powerful. The tools already exist if a different particular representation is desired in an application.

mykolav commented 4 years ago

If anyone lands here in search of a way of converting an Antlr's parse tree into an AST,
I believe this StackOverflow answer does a stellar job explaining it. And here's a real-world example, AstBuilder.java

KvanTTT commented 4 years ago

ANTLR tree depends on grammar. Moreover, it's possible to skip a step of parse tree building and generate AST during the parsing process. In this case listeners with disabled option buildParseTrees = false should be used.

lwhite1 commented 3 years ago

First, thank you everyone for your work in developing and maintaining this fantastic library. I can completely understand the desire to remove features that weren't being widely used and require a great deal of work to maintain.

However, the suggestion in this thread that converting a parse tree to an AST tree is easy to do by hand is less easy to understand. If someone had the time and skill, this would be an awesome feature. Consider:

Parse trees are really easy to build by hand and are so regular that tools like ANTLR can automate the process for us. That’s the good news. The bad news is that parse trees are extremely inconvenient to walk and transform. Parse trees are full of noise because of all the interior rule nodes. They are also very sensitive to changes in the grammar.

Parr, Terence. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers) (pp. 91-92). Pragmatic Bookshelf. Kindle Edition.

When I read in that book about the AST support in ANTLR 3, I was really impressed by the convenience it promised, and especially the potential for reducing sensitivity to grammar changes as the language is developed.

So, thank you again for your work, and thank you for leaving the issue open. I hope someone can get to it someday. Cheers.

alessiostalla commented 2 years ago

I agree that ANTLR should focus on one thing, parsing. Let's leave the AST to other projects, that may experiment with different approaches.

Shameless plug, you may be interested in the "*LaSu" methodology that we use, and the open-source support libraries that we've written for it, that provide the building blocks for an AST and some glue code to ease the mapping from an ANTLR parse tree: https://github.com/Strumenta/StarLasu

BubblyJuly commented 2 years ago

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

Korporal commented 1 year ago

This is an interesting post about this

https://tomassetti.me/migrating-from-antlr2-to-antlr4/

BubblyJuly commented 1 year ago

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

NilsBaumgartner1994 commented 1 year ago

Would like to just print out the modified tree so i can get the source code. I think generating an AST or having a minimal example on how to "print" the resulting code would be nice.

BubblyJuly commented 1 year ago

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

NilsBaumgartner1994 commented 1 year ago

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

Translated:

"This is a holiday auto-reply email from QQ mailbox.

Hello, I am on vacation recently and cannot reply to your email personally. I will reply to you as soon as possible after the vacation."

*Incomming auto-reply mail again ..."

neon12345 commented 10 months ago

I also need antlr to create an AST rather than a parse tree, so I have created two repositories working towards this. With some additional information, it should be possible to compress and modify the tree into an AST automatically. As stated, it is more efficient to create an AST instead of a parse tree if needed, but there are use cases where performance is not as important and a smaller AST is more manageable (e.g. AST to AST translation). The project is in an early stage: part1 part2

BubblyJuly commented 10 months ago

这是来自QQ邮箱的假期自动回复邮件。   您好,我最近正在休假中,无法亲自回复您的邮件。我将在假期结束后,尽快给您回复。

neon12345 commented 6 months ago

I made some progress in creating the tool to transform the antlr parse tree into an AST. As a result, I can now create a self-contained JavaScript version of the AST and also query/transform/print/visualize it with CSS selectors from JavaScript.

Demo