antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.95k stars 3.26k forks source link

Antlr4 generates invalid go parser #4150

Open Rini-Tikki opened 1 year ago

Rini-Tikki commented 1 year ago

I am trying to generate lexer, parser, listener and all standard files in Golang for the following grammar: https://github.com/trinodb/trino/blob/master/core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4

Environment Go version: 1.19 Antlr runtime: I tried all the releases from 4.9 to 4.12, and generated parser is always in some error state like : Unresolved type 'ParserRuleContext' Duplicate method 'GetStart' Cannot use 'localctx' (type IFrameExtentContext) as the type ParserRuleContextType does not implement 'ParserRuleContext' as some methods are missing:SetStart(Token) Impossible type assertion: '*FrameExtentContext' does not implement 'IFrameExtentContext'

And we cannot modify the auto-generated files it seems to fix the issue.

Can someone take a look into this?

Many thanks in advance

kaby76 commented 1 year ago

For the CSharp target, this does not work because of symbol conflict: https://github.com/trinodb/trino/blob/75bc783419482f30c103cf74510498ff480e82fc/core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4#L522

All "base" names in the grammar must be renamed to something else.

The grammar contains the lexer symbol "EMPTY", which is a well-known symbol conflict for CSharp.

But, yes, for the Go target, the generated code is wrong. ParserRuleContext is not qualified via package.

type IPredicateContext interface {
    antlr.ParserRuleContext

    // GetParser returns the parser.
    GetParser() antlr.Parser

    // GetValue returns the value attribute.
    GetValue() ParserRuleContext

    // SetValue sets the value attribute.
    SetValue(ParserRuleContext)

    // IsPredicateContext differentiates from other interfaces.
    IsPredicateContext()
}

This is indeed wrong code. Uses should be antlr.ParserRuleContext. There are other occurrences as well in the generated code.

Rini-Tikki commented 1 year ago

@kaby76 Thank you for looking into it. Is there any fix for this?

kaby76 commented 1 year ago

I'll work on a PR fix later today.

kaby76 commented 1 year ago

Actually, this requires modifications that I'm not the best to do. The grammar has several problems:

Rini-Tikki commented 1 year ago

The workaround works for now. Thanks!

jimidle commented 1 year ago

I am happy to look at this guys, however it might take me a day or so to get some time on it. As I have never come across this error, it is almost certainly the name of the labels and not that there code generation is wrong, or I would have seen that by now. But let me take a look...

jimidle commented 1 year ago

OK, so the problem isn't that start is a keyword in Go, but that using start as a label, causes the generation of a method for the rule that is using start called GetStart. However the RuleContext defined for that rule embeds ParserRuleContext, which also has a GetStart() method, but which causes a covariant definition, which is not allowed in Go and hence the compile error.

In fact, it would be better not to use getters in Go, but that damage is now done - I have to assume that someone somewhere has used the existing GetStart() method, so I cannot make that change (I inherited this of course). I can add start (and anything else that would cause such a clash to the list of keywords that need a translation and that would indeed convert it to start_, which solves the issue, though somewhat unsatisfactorily. I will try and make time to do this as I also want to make an addition in that exact area to be able to access rule names., which is not currently possible in the generated Go code. I could do it generically, but then existing code using something like GetS(), when the label is s would have to change to use Get_S()

In general, I avoid using real words for labels for these kinds of reasons. Using labels such as s and e is idiomatic for Go, and I tend to do that. I hope that helps.

Rini-Tikki commented 4 months ago

@jimidle it is still not working for the mentioned grammar.