antlr / antlr5

BSD 3-Clause "New" or "Revised" License
50 stars 4 forks source link

Possible improvements for tool builders #29

Open neon12345 opened 7 months ago

neon12345 commented 7 months ago

I am the author of a tool for converting the antlr parse tree into a denser AST, which can output a standalone JavaScript version to query/transform/print/visualize using CSS selectors. (The end goal is to use machine learning with rule induction for AST transformation/printing.) Demo

Since there is the possibility of changing things with a new version, it would be nice to have a unique antlr state for as many positions in the grammar as possible. This does not currently apply to recursive rules.

ftomassetti commented 7 months ago

Hi @neon12345 I trying the demo but I get an error. Can you help me figuring out how to use the tool? I also created several libraries to generate an AST on top of ANTLR's parse tree. One we created for TypeScript is called Tylasu https://github.com/Strumenta/tylasu . Maybe there are some overlaps with you work.

I think getting suggestions for ANTLR 5 is very useful, especially when coming from tool builders like you.

Regarding your question, I am not sure I get what you mean with "unique antlr state for as many positions in the grammar as possible". Could you give me an example?

neon12345 commented 7 months ago

@ftomassetti

Your ISP needs to support IPv6. Other than that, when you press the Run button, the source code on the top left is sent to the server and the returned JavaScript AST is transformed with the script at the bottom into the top right result editor. There is also some help accessible with the bottom tab.

neon12345 commented 7 months ago

@ftomassetti

For recursive rules, the parentState is set for the context. What we need from the context and currently have to patch in, is the state before the precpred.

        _localctx = _tracker.createInstance<PatternContext>(parentContext, parentState);
        pushNewRecursionContext(_localctx, startState, RulePattern);
        setState(1913);

        if (!(precpred(_ctx, 2))) throw FailedPredicateException(this, "precpred(_ctx, 2)");
        setState(1914);
        match(Swift5Parser::AS);
ftomassetti commented 6 months ago

@neon12345 this is how I see the demo: https://www.loom.com/share/1523249bd9f2458785db29ef2c4ea421 I guess my ISP should support IPv6

neon12345 commented 6 months ago

@ftomassetti Can you test IPv6 compatibility with for example https://ipv6-test.com or https://test-ipv6.com/ to be sure? Also try ping antlr.syncsocial.de

kaby76 commented 6 months ago

For recursive rules, the parentState is set for the context. What we need from the context and currently have to patch in, is the state before the precpred.

Can't this be compute using "parentState" and the ATN containing that NFA state number? Given a parse tree, get "parentState" and "see if it is one of your recursive productions". Then, examine the ATN: follow the edges in the ATN back until you find a transition that involves "_p". Presumably, this would be a PrecedencePredicateTransition edge.

neon12345 commented 6 months ago

Can't this be compute using "parentState" and the ATN containing that NFA state number?

The way how we work with antlr is a visitor that gives the information about the position in the grammar with the antlr states. From a parsing perspective, it is best to get the information from the current context without many calculations. Thus from our perspective, it would be more valuable to give another state to the context. But I don't know other use cases and why the state had to be stored this way.

ftomassetti commented 6 months ago

@ftomassetti Can you test IPv6 compatibility with for example https://ipv6-test.com or https://test-ipv6.com/ to be sure? Also try ping antlr.syncsocial.de

Oh... surprisingly ipv6 does not work for me, so I am afraid I cannot watch the demo

kaby76 commented 6 months ago

it would be more valuable to give another state to the context. But I don't know other use cases and why the state had to be stored this way.

Considering the result of a parse is a parse tree, it seems fine to "hang" the "calling state of the ATN to another ATN" in the parent parse tree node. (It really should have been an edge ID because, possibly, one could have multiple edges with the same non-terminal symbol from a state.)

To avoid computing the precedence predicate state in a parse tree traversal, you could pre-compute prior to parse the precedence predicate state from the ATNs, and place them in a O(1) map.

So, for the java grammar, state "1372" would be the "parentState" for an ExpressionContext, but the computed precedence predicate state be "1370".

graphviz (12)

neon12345 commented 6 months ago

@kaby76 Thanks for the suggestions, I don't have a problem getting the values I need and it is fine for me to keep things like they are. I just wonder if the stored information is used as intended or could be improved, considering the state is also not consistently stored in the parse tree.