antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.55k stars 3.22k forks source link

[Go target] Antlr Visitor does not work properly #4398

Closed kaby76 closed 2 weeks ago

kaby76 commented 9 months ago

The Visitor tree walker for the Go target does not seem to work properly, at least how it works in comparison to all the other targets. This is a continuation of https://github.com/antlr/antlr4/issues/4395. I don't know why that was closed, the initial comment completely erased, and https://github.com/antlr/antlr4/issues/4404 opened. In any case, https://github.com/antlr/antlr4/issues/4398 is a detailed analysis on the Visitor code for the Go target.

Starting with the original grammar, I added debugging output to all visitor code "derived" in the parent struct of "BaseExprVisitor" (aka "Visitor"), the "BaseExprVisitor" struct, and the structs and functions in the Antlr4 Go runtime. The "workaround" (func (v *Visitor) Visit(t antlr.ParseTree) any ), which is claimed to fix the problem of getting func (v *Visitor) VisitStart(ctx *parser.StartContext) any to be called, does not really fix the issues with the Visitor code. The problem goes far beyond that. Replacing the VisitStart() function with a VisitExpr() or VisitDecl() function shows that the "work-around" does not work because VisitExpr() or VisitDecl() are not called. The default implementation generated by the Antlr tool does not actually traverse the parse tree! You can test this by simply adding debugging output to the generated "Visit" functions for Visit, BaseExprVisitor, and BaseParseTreeVisitor and see for yourself.

For an example that does work, see https://github.com/kaby76/antlr-4395. But, this implementation is not what I expected.

I am not clear what the Visitor code is supposed to look like. There is no example in the documentation.

VisitChildren() does not have an implementation

For all targets except Go, the base class code has an implementation that loops through all children of a node and calls Visit() on the child. There is no function like this anywhere in the Antlr Go runtime, nor is it generated by the tool. The Antlr tool generates code that contains VisitChildren() function calls.

func (v *BaseExprVisitor) VisitProg(ctx *ProgContext) interface{} {
    return v.VisitChildren(ctx)
}

For all targets, the code does exactly what is expected.

For Go, the only function that defines VisitChildren() is:

func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }

I have to conclude that the Visitor code cannot work as is because there is no implementation for VisitChildren(). One must implement the function for Go.

Unclear how to define the derived Visitor struct--or interface

To use the code, all functions need to be defined for the composite visitor struct.

Is it this?

type Visitor struct {
    *parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.ExprVisitor
}

In any of these implementations, one needs to define every function associated with the visitor: Visit(), VisitChildren(), VisitProg(), VisitDecl(), VisitExpr(), VisitTerminal(), VisitErrorNode(). Otherwise, it either doesn't work or crashes.

In order to get the code to work, I essentially needed to reimplement the entire code from scratch.

If this is true, then this is a significant departure from how most people would understand how the code should work. Yes, Go does not support Inheritance. All the more, the documentation should provide an explicit, working example, complete with driver, grammar, visitor, go.mod, and makefile.

Personally, I don't use the Visitor code because semantic analysis is better served with the Listener code (for LR-attribute analysis), which people also claim to work.

jimidle commented 9 months ago

On vacation right now. I’ll look when I get back, but it works fine for me. Go may not always copy the other targets though

On Tue, Aug 29, 2023 at 03:15 Ken Domino @.***> wrote:

The Visitor tree walker for the Go target does not work properly. This is a continuation of #4395 https://github.com/antlr/antlr4/issues/4395.

  • VisitChildren() does not have an implementation. For all other targets, the "base class" code has an implementation that loops through all children of a node and calls Visit() on the child. There is no function like this anywhere in the Antlr Go runtime, nor is it generated by the tool. The Antlr tool generates code that contains VisitChildren() function calls.
  • The "base class" Visit() function is not called for a struct that contains the "base class" struct. The function with a "derived" receiver must be defined.
  • Unlike the othe targets, all rule Visit() functions need to be defined.

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/4398, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMDKAHU2X3RSTR46LFTXXTU6HANCNFSM6AAAAAA4B3MC34 . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mkohlhaas commented 9 months ago

Is it this?

type Visitor struct {
    *parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.BaseExprVisitor
}

Or this?

type Visitor struct {
    parser.ExprVisitor
}

The last one is not in accordance to the Java implementation. The first one crashes.

Personally, I don't use the Visitor code because semantic analysis is better served with the Listener code (for LR-attribute analysis), which people also claim to work.

Most projects use the listener interface. But the visitor gives you more control because you explicitly have to - and want to, depending on your use case of course - call the appropriate visit methods.

Say e.g. you have the following grammar:

prog: (declaration| expression)+
...
<more stuff>
...

Then you can easily first go through all your declarations and afterwards your expressions.

mkohlhaas commented 9 months ago
  • CSharp
  • Cpp
  • Dart
  • Java
  • JavaScript/TypeScript
  • Python3
  • Swift
  • PHP

I guess all the other languages have proper class/object inheritance and method overloading. Golang has only embedding and interfaces. Makes it much more difficult to write a port from Java.

kaby76 commented 8 months ago

https://groups.google.com/g/antlr-discussion/c/gEiAPs-XGps

I cannot see how this person's code targeted for Go can work whatsoever. I have not replied there because the Google Antlr Discussion Group is only for announcements (but we haven't seen Antlr 4.13.1 announced either).

cwarden commented 6 months ago

@thesues provides a working example, (updated to antlr 4.13 in https://github.com/cwarden/antlr-calc-golang-example/blob/master/visitor/main.go) that looks like this:

type Visitor struct {
    parser.BaseCalcVisitor
}

func (v *Visitor) visitRule(node antlr.RuleNode) interface{} {
    node.Accept(v)
    return nil
}

func (v *Visitor) VisitStart(ctx *parser.StartContext) interface{} {
    return v.visitRule(ctx.Expression())
}
...
func main() {
    is := antlr.NewInputStream(input)

    // Create the Lexer
    lexer := parser.NewCalcLexer(is)
    tokens := antlr.NewCommonTokenStream(lexer, antlr.TokenDefaultChannel)

    // Create the Parser
    p := parser.NewCalcParser(tokens)

    v := &Visitor{}
    v.Start_().Accept(v)
}
kaby76 commented 6 months ago

@cwarden That example illustrates my point. Every single production in the grammar requires an implementation of the visit function. If you do not implement the VisitStart() function, then the expression nodes are never visited. For all other targets, that is not the case because there is an implementation for VisitChildren(). There is none in the Go runtime. For a toy example like Calc.g4, implementing a handful of extra visitor functions in the chain of notes from the root is not hard. But, for a large grammar with hundreds of rules, this becomes an issue.

mburbidg commented 4 months ago

One other thing. Error handling is very awkward! The visitor methods should all return (interface{}, error). This code is very Java-centric, and doesn't appear to be written with a deep understanding of Go.

jimidle commented 4 months ago

It wasn’t originally my code, but I assure you it works as I have used it in anger as well as the walker. I also assure you I’m that I understand Go, though the original author was very new to it.

On Wed, Jan 24, 2024 at 11:13 Michael Burbidge @.***> wrote:

One other thing. Error handling is very awkward! The visitor methods should all return (interface{}, error). This code is very Java-centric, and doesn't appear to be written with a deep understanding of Go.

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/4398#issuecomment-1907279356, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMF3H3UNX2DTMK4RIL3YQB36ZAVCNFSM6AAAAAA4B3MC36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBXGI3TSMZVGY . You are receiving this because you commented.Message ID: @.***>

mkohlhaas commented 4 months ago

One other thing. Error handling is very awkward! The visitor methods should all return (interface{}, error). This code is very Java-centric, and doesn't appear to be written with a deep understanding of Go.

Please be constructive. A lot of code had to be written for this library. I appreciate the effort a lot!

mburbidg commented 4 months ago

I apologize, for my un-constructive comment.

I don't doubt that it works. And I appreciate the work that went into it. I think what is at issue is the WAY it works. Some very constructive suggestions have been made in this thread and some valid reasons as to why the way the code currently works is much less than ideal. Including the fact that the current implementation does not account for how error handling is done in Go. This applies to the listener as well as the visitor.

I would be willing to do some incremental work on this to improve it. It is unclear to me whether improvements would be considered or not. I don't want to do a lot of work, submit a PR and then have it rejected out of hand.

Is there a process for proposing a change, that is then discussed, and agreed to before work begins? I imagine that one of the issues is backwards compatibility. Is the runtime versioned, so that folks could adopt the changes as needed?

I would suggest making changes in stages. The first PR would be to change the return type of both the listener and visitor to include an error. The second would be to include default behavior to the visitor, similar to the default behavior present in some of the other implementations. Of course this cannot be done via inheritance, but there are other ways to do it in Go.

What do you think?

ycl2018 commented 3 weeks ago

@kaby76 hey! i fixed this problem in my repo , just replace the field of BaseParseTreeVisitor to interface antlr.ParseTreeVisitor in the generated file pie_base_visitor.go. I rename this file to pie_base_visitor_modify.go in my repo, and use my own BaseParseVisitor which implements antlr.ParseTreeVisitor!, it works well!

jimidle commented 3 weeks ago

You should not need to change the generated code. I have a dozen visitors. You need to embed the generated base visitor in your own struct and go from there. I still have not had time to make a demo of this.

Jim

On May 9, 2024, at 21:29, ycl2018 @.***> wrote:

hey! i fixed this problem in my repo https://github.com/ycl2018/pie-go , just replace the field of BaseParseTreeVisitor to interface antlr.ParseTreeVisitor in the generated file pie_base_visitor.go https://github.com/ycl2018/pie-go/blob/main/gen/pie_base_visitor_modify.go. I rename this file to pie_base_visitor_modify.go in my repo, and use my own BaseParseVisitor https://github.com/ycl2018/pie-go/blob/main/tree/basevisitor.go which implements antlr.ParseTreeVisitor!, it works well!

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/4398#issuecomment-2103792323, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMDANQG4CBLM5HEA2MTZBQ5JRAVCNFSM6AAAAAA4B3MC36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBTG44TEMZSGM. You are receiving this because you commented.

ycl2018 commented 3 weeks ago

@jimidle Yes, I shouldn't have modified the generated code. But I can't complete my Visitor through other means. I think this should be a problem with the implementation of antlr4 go runtime. The generated BaseVisitor has a BaseParseTreeVisitor field embedded in it which is a struct rather than interface{}(what i do in my repo is to fix this), so any implementation that 'inherits' BaseVisitor cannot really override its VisitChildren() or other methods. and BaseParseTreeVisitor does not implement methods other than the Visit method, so it cannot be used at all.

As you mentioned, in the Java version implementation, the BaseVisitor inherits AbstractParseTreeVisitor, which has a complete implementation of VisitChildren and so on other methods. And users can implement custom methods by overriding its methods.

You should not need to change the generated code. I have a dozen visitors. You need to embed the generated base visitor in your own struct and go from there. I still have not had time to make a demo of this. Jim On May 9, 2024, at 21:29, ycl2018 @.***> wrote: hey! i fixed this problem in my repo https://github.com/ycl2018/pie-go , just replace the field of BaseParseTreeVisitor to interface antlr.ParseTreeVisitor in the generated file pie_base_visitor.go https://github.com/ycl2018/pie-go/blob/main/gen/pie_base_visitor_modify.go. I rename this file to pie_base_visitor_modify.go in my repo, and use my own BaseParseVisitor https://github.com/ycl2018/pie-go/blob/main/tree/basevisitor.go which implements antlr.ParseTreeVisitor!, it works well! — Reply to this email directly, view it on GitHub <#4398 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMDANQG4CBLM5HEA2MTZBQ5JRAVCNFSM6AAAAAA4B3MC36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBTG44TEMZSGM. You are receiving this because you commented.

jimidle commented 3 weeks ago

Go is not Java. There is no need for an interface. All you do is this (Listener here but it is the same):


// Listener is a listener that collects information about the program and performs a kind of
// lint analysis.
type Listener struct {
    parsing.BaseMVBListener. // Go is NOT Java. There is no inheritence
    errCount int
        // Whatever else you want in your Listener
}

func (s *Listener) HasErrors() bool {
    return s.errCount > 0
}

func (t *Listener) EnterEveryRule(ctx antlr.ParserRuleContext) {
    // fmt.Println("Entering rule", ctx.GetRuleIndex(), "at line", ctx.GetStart().GetLine(), "col", ctx.GetStart().GetColumn())
}

func (t *Listener) ExitEveryRule(ctx antlr.ParserRuleContext) {
    //fmt.Println("Exiting rule", ctx.GetRuleIndex(), "at line", ctx.GetStart().GetLine(), "col", ctx.GetStart().GetColumn())
}

func (t *Listener) ExitTunit(ctx *parsing.TunitContext) {
  // Overrides generated base func
  }

lexer.SetInputStream(t.InStream)
tokstream.SetTokenSource(t.Lexer)
parser.SetTokenStream(t.TokStream)
tree := parser.TranslationUnit()
// check errors
antlr.ParseTreeWalkerDefault.Walk(&Listener{}, tree)
jimidle commented 3 weeks ago

The embedded * of the abstract visitor is wrong, (shoudl not be a pointer) but for reasons that I cannot now remember I decided not to fix that. I think too much of the runtime expected it to be a pointer. I probably have a note somewhere telling me to fix it at some point. That was code I inherited.

But all you need do is embedded in your various vistors and start accept() ing.

// Code generated from MVB.g4 by ANTLR 4.12.0. DO NOT EDIT.

package parsing // MVB
import "github.com/antlr4-go/antlr/v4"

type BaseMVBVisitor struct {
    *antlr.BaseParseTreeVisitor
}

func (v *BaseMVBVisitor) VisitTranslationUnit(ctx *TranslationUnitContext) interface{} {
    return v.VisitChildren(ctx)
}
...

type MyVisitor struct {
  BaseMVBVisitor
}

func (v * MyVisitor) VisitTranslationUnit(ctx * TranslationUnitContext) someresult {
  x = ctx.something.accept(v)
  // do something
  // return the reult you want
}

And so on.
mburbidg commented 2 weeks ago

I would like to give this one more shot. I am building a grammar and front-end for a very large language. I want to use the Visitor pattern, not the Listener because of the flexibility it provides in controlling the visit path and error handling. Many of my Visit methods look as follows:

func (v *Visitor) VisitLinearCatalogModifyingStatementAlt(ctx *gen.LinearCatalogModifyingStatementAltContext) interface{} {
    return ctx.LinearCatalogModifyingStatement().Accept(v)
}

These nodes have a single child and essentially call Accept on that single child.

To remove a lot of boilerplate code, I'd like to be able to provide a VisitChildren method that is used by default, if a more specific method is not provided. It would look something like this:

func (v *Visitor) VisitChilrdren(ctx *?) interface{} {
    return ctx.GetChild(0).Accept()
}

The Go runtime looks like it was designed to enable this, but as it is currently constructed it does not. You have to provide an implementation of every single Visit Method. There's no default traversal and you cannot provide a default traversal, without modifying generated code. I believe this is what @kaby76 was trying to convey as well as the fact that the other runtimes provide this capability.

As you know, Go does provide the ability to "override" functions on an embedded struct. I think the problem with the current implementation is where the VisitChildren function is declared.

A default implementation of this method is provided on the antlr.BaseParseTreeVisitor as follows.

func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }

I believe if you moved this default implementation to the generated base class (e.g. BaseMVBVisitor) then it could be overriden in my visitor (e.g. MyVisitor).

This would provide significant reduction of boilerplate code for many real grammars. As has been pointed out, for toy and sample grammars this is not an issue. Just provide a Visit function for every rule.

jimidle commented 2 weeks ago

There is no need to provide an implementation of every visit function. The default in the base visitor should accept automatically on. its children. I am afraid I don't have time to create a full example. I think you can override visitChildren, but IIRC that just calls accept on the children. I am not sure it would be that useful to try and do everything in that func.

Maybe you can provide your code to me (privately if you wish) and I can see what you are trying to achieve.

To the person asking for an error signature - while that is indeed a common pattern in Go, it is not how you walk parser trees. Create an error collector and make it avaiable in the struct that embeds BaseVisitor, you can also create nodes that mark errors and unresolved. Remember that this is a parse tree walker - generally, you would construct some kind of AST using it. Don't throw exceptions or return errors in the walker.

At some point I would like to generate a generic visitor and listener. I touched the listener and visitor the least in the huge changes made to make the runtime work. It probably is time to give the walkers some attention.


type JimVisitor struct {
        // Base generated by ANTLR
    parsing.BaseMVBVisitor
    // Add things like an error collector here
}

func (v *JimVisitor) VisitChildren(node antlr.RuleNode) interface{} {
       // This is what the default implementation does, I just added a print
    fmt.Println("Visiting children. Rule", node.GetRuleContext().GetRuleIndex())
    return v.BaseMVBVisitor.VisitChildren(node)
}

// Override any other visitors here - this example is from a working compiler

Here is how the very base of the visitor is defined (this is all code in the source of course):

type ParseTreeVisitor interface {
    Visit(tree ParseTree) interface{}
    VisitChildren(node RuleNode) interface{}
    VisitTerminal(node TerminalNode) interface{}
    VisitErrorNode(node ErrorNode) interface{}
}

type BaseParseTreeVisitor struct{}

var _ ParseTreeVisitor = &BaseParseTreeVisitor{}

func (v *BaseParseTreeVisitor) Visit(tree ParseTree) interface{}         { return tree.Accept(v) }
func (v *BaseParseTreeVisitor) VisitChildren(_ RuleNode) interface{}     { return nil }
func (v *BaseParseTreeVisitor) VisitTerminal(_ TerminalNode) interface{} { return nil }
func (v *BaseParseTreeVisitor) VisitErrorNode(_ ErrorNode) interface{}   { return nil }

/// the generated code then usesu:

type BaseMVBVisitor struct {
        // Note that I obviously missed changing this from a pointer - I will do so, but the functionality is not affected
    *antlr.BaseParseTreeVisitor
}

func (v *BaseMVBVisitor) VisitTranslationUnit(ctx *TranslationUnitContext) interface{} {
    return v.VisitChildren(ctx)
}

So I embed the generated visitor in my own structure as above, and supply implementation of just those things I need.

You can ovveride visitChildren. The Accept functions are auto generated in the parser code.

func (s *TranslationUnitContext) Accept(visitor antlr.ParseTreeVisitor) interface{} {
    switch t := visitor.(type) {
    case MVBVisitor:
        return t.VisitTranslationUnit(s)

    default:
        return t.VisitChildren(s)
    }
}

Now, do I think that the tree walking is complete and needs no improvement? No.

One last thing is because of that pointer (which is definitely a bug), the BaseParseTreeVisitor will be nil unless you initlialize it. Maybe that is what is throwing people off. I will fix that for sure.

For now, you could remove the * from generated BaseVisitor

jimidle commented 2 weeks ago

@parrt - let's close this - I have supplied a number of examples and at some point I will create a full working repo. I don't think I can add any more here. If we do a 4.14.x then I will try and create generic Listeners and Walkers, though that will likely be a breaking change, which is difficult to do with go semver expectations, but I suppose people could stick with 4.13.1

jimidle commented 2 weeks ago

One final comment though...

I have just implemented the visitor with Generics, fixed the interface definitions and fixed the generation of that stupid pointer. It will be another day or so before I push that to the development branch and the main tree. I guess I will give in and also get rid of the exp imports ;)

mburbidg commented 2 weeks ago

I'm not sure why this does not work for me. As far as I can see, I'm set up just as in your example. But for me, VisitChildren is never called. In my example, I would expect VisitChildren to get called when ctx.ProgramActivity().Accept(v) is called, since I've commented out the corresponding visit function.

Here's is my visitor:

package parser

import (
    "fmt"
    "github.com/antlr4-go/antlr/v4"
    "github.com/mburbidg/mygql/gql/parser/ast"
    "github.com/mburbidg/mygql/gql/parser/gen"
    "github.com/mburbidg/mygql/utils"
    "strconv"
    "strings"
)

type Visitor struct {
    gen.BaseGQLVisitor
}

func NewVisitor() gen.GQLVisitor {
    return &Visitor{}
}

func (v *Visitor) VisitChildren(node antlr.RuleNode) interface{} {
    fmt.Println("Visiting children. Rule", node.GetRuleContext().GetRuleIndex())
    return v.BaseGQLVisitor.VisitChildren(node)
}
...
// Root of the tree visit function
func (v *Visitor) VisitGqlProgram(ctx *gen.GqlProgramContext) interface{} {
    pgm := &ast.GQLProgram{}
    if ctx.ProgramActivity() != nil {
        activity := ctx.ProgramActivity().Accept(v)
        pgm.Activity = activity
    }
    if ctx.SessionCloseCommand() != nil {
        pgm.CloseOnCompletion = true
    }
    return pgm
}

// Commented out visit function.
//func (v *Visitor) VisitProgramActivity(ctx *gen.ProgramActivityContext) interface{} {
//  return ctx.TransactionActivity().Accept(v)
//}

Here's my code that invokes the parser and visits the resulting tree.

    src := `CREATE SCHEMA IF NOT EXISTS /mymovies`
    input := antlr.NewInputStream(src)
    lexer := gen.NewGQLLexer(input)
    stream := antlr.NewCommonTokenStream(lexer, 0)
    p := gen.NewGQLParser(stream)
    errListener := newErrorListener()
    p.AddErrorListener(errListener)
    p.BuildParseTrees = true
    tree := p.GqlProgram()
    assert.Equal(t, 0, len(errListener.errors))
    visitor := NewVisitor()
    tree.Accept(visitor)