antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Incorrect code generation when importing lexer grammar with custom channels into combined grammar #965

Open WalkerCodeRanger opened 9 years ago

WalkerCodeRanger commented 9 years ago

Using ANTLR 4.5.1, I have a grammar that imports several lexer grammars that declare and use custom channels. When I generate the parser and lexer for it with the CSharp target, the resulting lexer fails to compile, with a number of errors related to the channels. For lines such as:

    case 14: _channel = DocComments; break;

Each such line produces two errors: _channel is not declared, and DocComments is not declared. DocComments is my custom channel. Shouldn't the generated lexer declare a constant for each channel? Isn't that the point of the channels {Comments, DocComments} declaration?
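As a rough illustration of the constants the question expects, the following Java sketch maps the names in a channels block to channel numbers. It assumes ANTLR 4's numbering (0 is the default channel, 1 is HIDDEN, and user-defined channels are numbered from 2 in declaration order). The ChannelConstants class and its declare method are hypothetical and not part of ANTLR, and real generated code is formatted differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ANTLR tool or runtime: it models how
// names from a `channels { ... }` block map to channel numbers.
final class ChannelConstants {
    // Assumption: channel 0 is the default channel and 1 is HIDDEN,
    // so user-defined channels are numbered from 2 in declaration order.
    static final int FIRST_USER_CHANNEL = 2;

    // Returns one constant declaration per channel name, i.e. the kind of
    // declaration an action such as `_channel = DocComments;` needs to compile.
    static List<String> declare(List<String> channelNames) {
        List<String> decls = new ArrayList<>();
        for (int i = 0; i < channelNames.size(); i++) {
            decls.add("public static final int " + channelNames.get(i)
                    + " = " + (FIRST_USER_CHANNEL + i) + ";");
        }
        return decls;
    }

    public static void main(String[] args) {
        for (String decl : declare(List.of("Comments", "DocComments"))) {
            System.out.println(decl);
        }
    }
}
```

Running main prints one declaration per channel (Comments = 2, DocComments = 3). Declarations of this kind are what the generated lexer for the combined grammar appears to be missing.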

Note that ANTLR generates lots of warnings of the form:

warning(155): Adamant.g4:17:70: rule PP_Define contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output

I gather from #497 that this warning only applies to the lexer interpreter and should not affect the generated lexer. However, the warning itself makes no sense here, since the grammar does define the channel.

WalkerCodeRanger commented 9 years ago

It looks like this is caused by using channels in a combined grammar. If that is not supported, ANTLR should report an error for it.

parrt commented 9 years ago

OK, thanks for tracking this down.

WalkerCodeRanger commented 9 years ago

I have updated the title to clarify what the actual issue is. It occurs when a lexer grammar that uses custom channels is imported into a combined grammar: the generated lexer for the combined grammar then contains the compile errors described above. I believe this happens with both the CSharp and Java targets. I know channels are not supported in combined grammars, so I assume this should be reported as an error instead of silently generating incorrect code.

To reproduce, use:

Chan.g4

    lexer grammar Chan;
    channels { CHAN }
    WS : ' ' -> channel(CHAN);
    Char : .;

Parse.g4

    grammar Parse;
    import Chan;
    file : Char* EOF;

Then generate the parser and lexer from Parse.g4.
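The thread asks for an error instead of silently generated broken code. A check of that kind could take roughly the following shape. This is a hypothetical Java sketch, not ANTLR's implementation: GrammarInfo, ChannelImportCheck and the message text are invented for illustration. It walks a combined grammar and every grammar it imports, and reports each custom channel it finds so that code generation can stop early.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation pass; not ANTLR's real API or error message.
final class ChannelImportCheck {
    // Returns one error per custom channel that would end up in a combined
    // grammar, whether declared directly or in an imported lexer grammar.
    static List<String> check(GrammarInfo root) {
        List<String> errors = new ArrayList<>();
        if (!root.combined()) {
            return errors; // channels are allowed in standalone lexer grammars
        }
        collect(root, root, errors);
        return errors;
    }

    // Recursively visits the grammar and its imports, recording each channel.
    private static void collect(GrammarInfo root, GrammarInfo g, List<String> errors) {
        for (String ch : g.channels()) {
            errors.add(root.name() + ": custom channel " + ch + " (declared in "
                    + g.name() + ") is not supported in a combined grammar");
        }
        for (GrammarInfo imported : g.imports()) {
            collect(root, imported, errors);
        }
    }

    public static void main(String[] args) {
        GrammarInfo chan = new GrammarInfo("Chan", false, List.of("CHAN"), List.of());
        GrammarInfo parse = new GrammarInfo("Parse", true, List.of(), List.of(chan));
        check(parse).forEach(System.out::println);
    }
}

// Hypothetical minimal model of a grammar, not ANTLR's Grammar class.
record GrammarInfo(String name, boolean combined, List<String> channels,
                   List<GrammarInfo> imports) {}
```

For the Chan/Parse example, main prints a single error naming CHAN, while checking Chan on its own produces no errors, since channels are supported in a standalone lexer grammar.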