antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.3k stars 3.3k forks source link

Go target - Interfaces instead of ptr structs - Listeners and Visitors #1843

Open millergarym opened 7 years ago

millergarym commented 7 years ago

Hey Antlr/Golang people, (@cyberfox, @pboyer, @parrt, @willfaught, @sharwell)

I'm posting this to propose and discussion a breaking change to Listener of the Go target. It seems opportune, based on @cyberfox's work I have a PR on a visitor implementation. Based on this I would like to propose changes to the Listener interfaces, using interfaces instead of pointer to structures.

eg

func (s *MyListener) EnterStart(ctx parser.IStartContext) {}

instead of

func (s *MyListener) EnterStart(ctx *parser.StartContext) {}

As noted on a number of occasions, the Antlr Go code is not idiomatic Go. The major manifestations of this is large verses small interfaces, and trying to fit Java/C# "OO" into Go. To explain the "OO" comment, Go isn't objected-oriented in the same sense as Java; no implementation inheritance, only interface sub-typing, interface sub-typing is 'structural' vs Java's 'nominal'. Using interfaces and specifically small interfaces are used instead of "OO".

A specific place the current Go runtime is a problem is in re-using Listeners and Visitors. Imagine you have two grammars (A and B) that are very similar. B is a copy of A with one added and one modified rule. Currently it is not possible to implement a BListener that delegates to AListener (without making new A-based structures for every call). Using interfaces the BListener just needs to do a type assertion (cast) in order to delegate to AListener methods.

One of the issues swapping to interface is that the current generated parser does not create interfaces for Alt contexts, only for rule contexts. Implementing Alt context interfaces required changes to StructDecl.java & ListenerFile.java.

The changes proposed are in https://github.com/wxio/antlr4/tree/visit-runtimeimport-tokenstream https://github.com/antlr/antlr4/pull/1841

An example using these can be seen at https://github.com/wxio/antlr4-go-examples/tree/interface-discussion/scratch note: the base listener and base visitor are now empty, only contains an example implementations in comments.

Currently the A - B use-case from above only works for named Alt, but it can be made to work for un-alt-named rules.

Cheers Gary

cyberfox commented 7 years ago

I have some concerns about the interface approach, mainly around the inability to implement only a subset of the visitor endpoints, for interpreter development.

I also think that the visitors returning interface{} is a very valuable aspect, and should be retained, as it simplifies the common case for expression interpretation.

The methods I'd added for ShouldVisitNextChild and AggregateResult were to provide feature parity with the Java code, as I can see those being useful in larger scale parsers. (Actually, I may use AggregateResult myself when handling comma-separated parameters to a method in the language I'm working on.) The type switch was a vastly better way of doing it than my cascading tests, though, thank you for that.

Isn't there a concern with the bParser.(A) cast approach that the aParser will call down into a child visitor method that bParser modified, but because Go doesn't support that, it'll call the A version instead of the B version? That was the problem that I attempted to work around with the SetSuper method (which is an awful name, I concede :) ).

millergarym commented 7 years ago

@cyberfox

I have some concerns about the interface approach, mainly around the inability to implement only a subset of the visitor endpoints, for interpreter development.

You absolutely only have to implement a subset. The generated parser code does a type check before calling the listener or visitor. In the case of the visitor, VisitChildren is called if there is not implementation. the var _ TListener = &MyListener{} is not mandatory and is only used if you want the compiler to check that an interface is implemented. With small interfaces you could have a whole lot of var _ XContextVisttor = &MyListener{} for type checking.

I also think that the visitors returning interface{} is a very valuable aspect, and should be retained, as it simplifies the common case for expression interpretation. ... ShouldVisitNextChild and AggregateResult ...

The two methods removed are DefaultResult and AggregateResult. ShouldVisitNextChild is still there.

Can you give an example of of how default and aggregate work. I couldn't see how that could be put together to do anything. It is far easier to carry state around in the visitor struct. The delegator (vs "super") has the advantage of being able to swap the visitor during the iterator. This can be seen in the A-B example. If you can provide and example using the returned interface{}, I can see if it is awkward with my code.

Isn't there a concern with the bParser.(A) cast approach that the aParser will call down into a child visitor method that bParser modified, but because Go doesn't support that, it'll call the A version instead of the B version?

This is why my VisitChildren has an extra argument delegator.

That was the problem that I attempted to work around with the SetSuper method (which is an awful name, I concede :)

Yes, in Java land your "super" is actually the "child" class (the this implicit receiver), not the super. Actually implicitReceiver would be a much better name than super. Also without setsuper your code wouldn't do what was expected.

Have a look at https://github.com/wxio/antlr4/blob/visit-runtimeimport-tokenstream/doc/go-target.md the example should make things clear.

millergarym commented 7 years ago

@cyberfox

Moving here from https://github.com/antlr/antlr4/pull/1841

What should the Go interfaces look like?

Goals:

The following discussion only relates to Visitors, but also applies to Listeners.

example grammar

start : a[false];

a[bool recursive] returns[Atom result]
  : b[recursive] { $result = ... }
  | ...
;

b[bool recursive] returns[Atom result] 
  : .. a[false] ...
  | c[true] c[false]*
;

c[bool First]
  : ...
;

Proposal 1 - Extra args

type ParseTree interface {
    SyntaxTree

    Accept(visitor ParseTreeVisitor, currentState interface{}) (result interface{})
    GetText() string

    ToStringTree([]string, Recognizer) string
}

type ParseTreeVisitor interface {
    VisitNext(next Tree, currentState interface{}) bool
    VisitTerminal(node TerminalNode, currentState interface{})  (result interface{})
    VisitErrorNode(node ErrorNode, currentState interface{})  (result interface{})
    VisitChildren(node RuleNode, delegate ParseTreeVisitor, currentState interface{})  (result interface{})
}

Proposal 2 - State in all tree nodes

type Tree interface {
    GetParent() Tree
    SetParent(Tree)
    GetPayload() interface{}
    GetChild(i int) Tree
    GetChildCount() int
    GetChildren() []Tree

       SetState(interface{})
       GetState()interface{}
}

type ParseTree interface {
    SyntaxTree

    Accept(visitor ParseTreeVisitor) (result interface{})
    GetText() string

    ToStringTree([]string, Recognizer) string
}

type ParseTreeVisitor interface {
    VisitNext(next Tree) bool
    VisitTerminal(node TerminalNode)  (result interface{})
    VisitErrorNode(node ErrorNode)  (result interface{})
    VisitChildren(node RuleNode, delegate ParseTreeVisitor)  (result interface{})
}

Proposal 3 - functional / var-args

Something more from functional programming? But what? Maybe a hint can come from ... https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis https://commandcenter.blogspot.com.au/2014/01/self-referential-functions-and-design.html

Proposal 4 - Tree and Node

type Tree interface { .. }
type Node interface { 
       SetState(interface{})
       GetState()interface{}
}
type TreeNode interface {
   Tree
   Node
}
type ParseTree interface {
    SyntaxTree
        Node

    Accept(visitor ParseTreeVisitor) interface{}
    GetText() string

    ToStringTree([]string, Recognizer) string
}
type ParseTreeVisitor interface {
        Node

    VisitNext(next TreeNode) bool
    VisitTerminal(node TerminalNode) interface{}
    VisitErrorNode(node ErrorNode) interface{}
    VisitChildren(node RuleNode, delegate ParseTreeVisitor) interface{}
}

From PR https://github.com/antlr/antlr4/pull/1807

// A complete Visitor for a parse tree produced by <file.parserName>.
BaseVisitorFile(file, header, namedActions) ::= <<
<fileHeader(file.grammarFileName, file.ANTLRVersion)>

<if(file.genPackage)>
package <file.genPackage> // <file.grammarName>
<else>
package parser // <file.grammarName>
<endif>

import "github.com/antlr/antlr4/runtime/Go/antlr"

type Base<file.grammarName>Visitor struct {
    *antlr.BaseParseTreeVisitor
    super <file.grammarName>Visitor
}

func (v *Base<file.grammarName>Visitor) SetSuper(newSuper <file.grammarName>Visitor) {
    v.super = newSuper
}

func (v *Base<file.grammarName>Visitor) DefaultResult() interface{} {
    return nil
}

func (v *Base<file.grammarName>Visitor) ShouldVisitNextChild(node antlr.RuleNode, resultSoFar interface{}) bool {
    return true
}

func (v *Base<file.grammarName>Visitor) AggregateResult(resultSoFar, childResult interface{}) interface{} {
    return childResult
}

func (v *Base<file.grammarName>Visitor) VisitChildren(node antlr.RuleNode) interface{} {
type Tree interface {
    GetParent() Tree
    SetParent(Tree)
    GetPayload() interface{}
    GetChild(i int) Tree
    GetChildCount() int
    GetChildren() []Tree
}

type SyntaxTree interface {
    Tree

    GetSourceInterval() *Interval
}

type ParseTree interface {
    SyntaxTree

    Accept(Visitor ParseTreeVisitor) interface{}
    GetText() string

    ToStringTree([]string, Recognizer) string
}

type RuleNode interface {
    ParseTree

    GetRuleContext() RuleContext
    GetBaseRuleContext() *BaseRuleContext
}

type TerminalNode interface {
    ParseTree

    GetSymbol() Token
}

type ErrorNode interface {
    TerminalNode

    errorNode()
}

type ParseTreeVisitor interface {
    Visit(tree ParseTree) interface{}
    VisitChildren(node RuleNode) interface{}
    VisitTerminal(node TerminalNode) interface{}
    VisitErrorNode(node ErrorNode) interface{}
}

type ParseTreeListener interface {
    VisitTerminal(node TerminalNode)
    VisitErrorNode(node ErrorNode)
    EnterEveryRule(ctx ParserRuleContext)
    ExitEveryRule(ctx ParserRuleContext)
}
willfaught commented 7 years ago

One thing to keep in mind is that some users have experienced bad performance. Since this would use an interface where before it was just a pointer, it'd be great if we could verify with some benchmarks that there's no performance regression.

cyberfox commented 7 years ago

I've posted up the Expressions repo which gives an example of the use of partial implementation and return values, as well as a language-independent grammar file, and a sample (trivial) input.

I extracted it from working code, albeit with a few small tweaks. The original's evaluator has a resolver for variable lookup as well, but I didn't find it necessary for the example. I can add it back in as an example of (immutable during evaluation, in my case) data associated with the ExprVisitor if it's helpful.

(It's has the parser/lexer/visitor checked in, and is still using the awful type checking cascading ifs, but that's because I haven't applied the type switch to my repo yet.)

millergarym commented 7 years ago

@willfaught me experiences optimising the Go code, specifically the antlr target indicates that interfaces really aren't a problem. If anything they are a solution, as interfaces are polymorphic, but structs aren't.

In the case of the Antlr target, the 12x improvement with the lexer and parser basically falls into two categories. 6x was a Java static being coded as a Go field instead of a package level var, there by undoing all atn caching. 2x was unnecessary memory allocation in the hashing code.

millergarym commented 7 years ago

@cyberfox thanks for the example.

I need to take back what I said about calling Accept inside a visitor. I know GOF names the method Accept, but we we call it Visit? It make more sense to me (and is closer to the Java code). By the way the Smalltalk design patterns companion names it acceptVisit. I can see how mixing Visit and VisitChildren within the same visit method might even be useful for context sensitive visitations.

I would still like to be able to handle composing visitors. My A-B example is a simple case of composition. This is the one case the delegate implementation can handle that your "super" one can't.

I'll try find some time on the weekend to make some changes.

millergarym commented 7 years ago

@cyberfox take a look at https://github.com/millergarym/expressions (and related https://github.com/wxio/antlr4-go/ and https://github.com/wxio/antlr4)

Cheers Gary

davesisson commented 7 years ago

What are delegate and args for? Can we get away without them (by stuffing delegate into BaseParseTreeVisitor and dropping args)? That'd make the code look a lot more like the Java version.

On Mon, May 1, 2017 at 1:31 AM Gary Miller notifications@github.com wrote:

@cyberfox https://github.com/cyberfox take a look at https://github.com/millergarym/expressions (and related https://github.com/wxio/antlr4-go/ and https://github.com/wxio/antlr4)

Cheers Gary

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/1843#issuecomment-298304878, or mute the thread https://github.com/notifications/unsubscribe-auth/ALUC_QyoV01SVoZzpYLZhnqrBR449iDXks5r1ZhbgaJpZM4NJzBt .

millergarym commented 7 years ago

There is simple iteration, and listener are generally fine for this. When it comes to complex iteration this is where visitors coming into play. The obvious reason for using a visitor is to control what gets visited. However there are inherently complex use-cases that visitors can help address with features like argument passing, return values and composing visitor. Visitors could even be generalised to support object algebras (see Extensibility for the Masses - aka Monads in OO).

Yes technically, the delegate can be dropped from the signature and placed in the the base, and the args can be replaced with a method. But this would significantly complicate complex visitations. It is far easier to maintain state in the call stack than external to it.

As for that Java version. It is quite feature poor, and imo not a particularly good exemplar of what a visitor api/implementation should look like. As a design option, there could be two visitor interfaces stateless/state-full and generate code to support both.

Currently moving some of my interpreters to use visitors, happy to share some grammars if you're interested.

davesisson commented 7 years ago

I've got a Java-based grammar that uses visitors that I'm intending to rapidly switch over to Go once the visitor pattern is finalized (Java runtime startup cost is the reason). It's currently using the recommended side store (basically a hash table for references) for storing context information about the tree (basically a symbol table). I do have a few warts where I'm manhandling the visit behavior -- mostly to deal with complicated subtree behavior (and I have no great solution for tail recursion like A+A+A+A+A+A... a thousand times).

Let's say it's needed for now (I'm not sure how yet). Why do we need to pass variable args in? Would a struct be sufficient enough? If it's just a struct I could argue putting it back into the visitor (along with the delegate) except for your stack argument -- I guess I need an example.

I'm not against the variable arguments I'm just trying to see if we can reduce the boilerplate. I've got dozens of rules that all require the same visit(Context ctx) in Java and that's already annoying to type so I'd like to avoid two more arguments if it can be done elsewhere. I'd prefer to not have two kinds of visitor patterns so if end up thinking the variable args approach is better I'd rather go with that.

On Mon, May 1, 2017 at 5:03 PM Gary Miller notifications@github.com wrote:

There is simple iteration, and listener are generally fine for this. When it comes to complex iteration this is where visitors coming into play. The obvious reason for using a visitor is to control what gets visited. However there are inherently complex use-cases that visitors can help address with features like argument passing, return values and composing visitor. Visitors could even be generalised to support object algebras (see Extensibility for the Masses https://www.cs.utexas.edu/%7Ewcook/Drafts/2012/ecoop2012.pdf - aka Monads in OO).

Yes technically, the delegate can be dropped from the signature and placed in the the base, and the args can be replaced with a method. But this would significantly complicate complex visitations. It is far easier to maintain state in the call stack than external to it.

As for that Java version. It is quite feature poor, and imo not a particularly good exemplar of what a visitor api/implementation should look like. As a design option, there could be two visitor interfaces stateless/state-full and generate code to support both.

Currently moving some of my interpreters to use visitors, happy to share some grammars if you're interested.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/1843#issuecomment-298464277, or mute the thread https://github.com/notifications/unsubscribe-auth/ALUC_RVUDNZcqVNaDw5aKAqX2vAWX3k_ks5r1nLOgaJpZM4NJzBt .

millergarym commented 7 years ago

Java and that's already annoying

I generate commented out signatures in the base visitor file and just copy paste.

Why do we need to pass variable args in? Would a struct be sufficient enough?

Can you give an example, I can't see it. The var args is so that manual Visit called doesn't need a nil.

storing context information about the tree (basically a symbol table)

A scoped symbol table requires a stack. This is trivial via args. Managing the stack external to the call stack is just a pain and unnecessary.

I have no great solution for tail recursion like A+A+A+A+A+A... a thousand times

I haven't thought it, but could simple continuation passing could be implemented in a VisitChildren or manually in a visitX method?

I've been playing with the interfaces and implementation. What do you think about the version below? I honestly think the delegate in every visit signature is preferable to putting it in the BaseParseTreeVisitor. It makes setup so much easier.

Note the current BaseParseTreeVisitor methods don't need a default receiver, therefore a nil one will do. No NewMyV() needed.

type ParseTree interface {
    SyntaxTree

    Visit(visitor ParseTreeVisitor, args ...interface{}) interface{}
    GetText() string

    ToStringTree([]string, Recognizer) string
}

type VisitNextCheck interface {
    VisitNext(next Tree, currentResult interface{}) bool
}
type VisitRestCheck interface {
    VisitRest(next RuleNode, currentResult interface{}) bool
}
type TerminalVisitor interface {
    VisitTerminal(node TerminalNode)
}
type ErrorNodeVisitor interface {
    VisitErrorNode(node ErrorNode)
}
type EnterEveryRuleVisitor interface {
    EnterEveryRule(ctx RuleNode)
}
type ExitEveryRuleVisitor interface {
    ExitEveryRule(ctx RuleNode)
}
type ParseTreeVisitor interface {
    VisitTerminal(node TerminalNode)
    VisitErrorNode(node ErrorNode)
    VisitChildren(node RuleNode, delegate ParseTreeVisitor, args ...interface{}) (result interface{})
}
type AggregateResultVisitor interface {
    AggregateResult(aggregate, nextResult interface{}) (result interface{})
}

// grammar Example;
// a : b ;
// b : 'c' # Y;

// in somefile.go
//go:generate java -jar path/antlr.jar -o parser -package parser -visitor Example.g4

// -- generate example_visitor.go
//
// type AContextVisitor interface {
//  VisitA(ctx IAContext, delegate antlr.ParseTreeVisitor, args ...interface{}) (result interface{})
// }
// type YContextVisitor interface {
//  VisitY(ctx IYContext, delegate antlr.ParseTreeVisitor, args ...interface{}) (result interface{})
// }

// -- implemented visitor
// type MyV struct {
//  *antlr.BaseParseTreeVisitor
//}
// func (v *MyV) VisitA(ctx parser.IAContext, delegate antlr.ParseTreeVisitor, args ...interface{}) (result interface{}) {
//  return
// }
// func (v *MyV) VisitY(ctx parser.IYContext, delegate antlr.ParseTreeVisitor, args ...interface{}) (result interface{}) {
//  return
// }

type BaseParseTreeVisitor struct{}

func (*BaseParseTreeVisitor) VisitTerminal(node TerminalNode) {}
func (*BaseParseTreeVisitor) VisitErrorNode(node ErrorNode)   {}

func (*BaseParseTreeVisitor) VisitChildren(node RuleNode, delegate ParseTreeVisitor, args ...interface{}) interface{} {
    next, isNextCk := delegate.(VisitNextCheck)
    rest, isRestCk := delegate.(VisitRestCheck)
    entryV, isEnterV := delegate.(EnterEveryRuleVisitor)
    exitV, isExitEV := delegate.(ExitEveryRuleVisitor)
    aggre, isAggre := delegate.(AggregateResultVisitor)
    var result interface{}
    for _, child := range node.GetChildren() {
        if isNextCk && !next.VisitNext(child, result) {
            continue
        }
        switch child := child.(type) {
        case TerminalNode:
            delegate.VisitTerminal(child)
        case ErrorNode:
            delegate.VisitErrorNode(child)
        case RuleNode:
            if isRestCk && !rest.VisitRest(child, result) {
                break
            }
            if isEnterV {
                entryV.EnterEveryRule(child)
            }
            r := child.Visit(delegate, args)
            if isExitEV {
                exitV.ExitEveryRule(child)
            }
            if isAggre {
                result = aggre.AggregateResult(result, r)
            } else {
                result = r
            }
        default:
            // can this happen??
        }
    }
    return result
}
davesisson commented 7 years ago

I think I'm okay with this particular implementation. The implemented visitor example feels a lot like the Java implementation of visitors so that's a plus.