antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Lexer semantic pred ignores fail option #320

Open cowang opened 11 years ago

cowang commented 11 years ago

In the following grammar, the exact same .java lexer is generated, whether or not the fail option is present on the semantic predicate. It makes logical sense that no message is generated when the lexer rule fails; however, we should warn the programmer that they are trying to accomplish something that is impossible when they include the fail option.

grammar CharExperiment_02;
stat : 'start' CharacterLiteral 'end' EOF;

// Lexer

CharacterLiteral
    :   '\'' SingleCharacter '\''
    |   '\'' {false}? // <fail='unclosed character literal'>
    ;

fragment SingleCharacter : ~['\\\r\n] ;

WS   : [ \r\t\n]+ -> skip ;

sharwell commented 11 years ago

Unhandled element options are typically not reported to allow for other targets to provide custom features through them. We'll look into adding support for targets to specify which specific options are supported by the particular target, at which point we could report a warning for the lack of support in this case.

cowang commented 11 years ago

I'm not sure I understand. When I try to use a banana='err msg' option, I get an error message for "unsupported option banana". I would like for there to be an "unsupported option fail for lexer semantic predicate" message.

It looks to me like this failed-predicate-error-msg error would be an error in any target. Given your statement that "ANTLR 4 lexers do not (and cannot) support customized error messages for failed predicates, because they function as DFA state machines instead of top-down parsers", no target would be able to provide lexer error messages through the fail predicate. And yet, someone reading code that contained the fail option for a semantic predicate would think that the code accomplished something because it was accepted by antlr4.

(My last paragraph sounds a little testy on a rereading, Sam. You know, "Ah ha, gotcha using your own words". I don't mean it that way, I just wasn't sure from your comment that I had clearly stated what I wanted and why. I certainly respect your right to prioritize and sequence the work to be done.)

sharwell commented 11 years ago

Sometimes the implementation doesn't do exactly what we were thinking. I had a pretty good idea that this particular syntax would produce an error if you used something like <banana='foo'>. However, the feature in general is similar to the way syntax like the following works:

myRule
@ruleVersion{0}
  : ...
  ;

When you use the sharwell/optimized branch of ANTLR 4 (or the C# target which is based on that branch), this produces a special annotation in the generated code. In other cases, it's simply ignored. We need a solid way to have target-specific validation of options and named actions where a warning (an error could unnecessarily restrict a grammar to only working with one target) is produced in cases where either of these is used with a target that does not support it.
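
To make the idea of target-specific validation a little more concrete, here is a minimal sketch in Java. Every name in it (`ElementOptionValidator`, `Context`, `check`, the target table) is hypothetical and invented for illustration; none of it is part of the ANTLR tool or runtime API. It only shows the shape of the check: each target declares which element options it supports in which context, and anything else yields a warning rather than an error.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch only: these types do not exist in ANTLR.
final class ElementOptionValidator {
    // Where an element option appears in the grammar.
    enum Context { PARSER_PREDICATE, LEXER_PREDICATE }

    // target name -> context -> option names the target declares as supported
    private final Map<String, Map<Context, Set<String>>> supported;

    ElementOptionValidator(Map<String, Map<Context, Set<String>>> supported) {
        this.supported = supported;
    }

    // Returns a warning message if the target does not declare support for
    // the option in this context, or an empty Optional if it does.
    Optional<String> check(String target, Context context, String option) {
        Set<String> ok = supported
                .getOrDefault(target, Map.of())
                .getOrDefault(context, Set.of());
        if (ok.contains(option)) {
            return Optional.empty();
        }
        // A warning, not an error, so the grammar still works with other targets.
        return Optional.of(String.format(
                "warning: option '%s' is not supported for %s by target %s",
                option, context, target));
    }
}

final class ElementOptionValidatorDemo {
    // Assumed support table: this imaginary "Java" target honors 'fail'
    // on parser predicates only.
    static ElementOptionValidator sample() {
        return new ElementOptionValidator(Map.of(
                "Java", Map.of(
                        ElementOptionValidator.Context.PARSER_PREDICATE,
                        Set.<String>of("fail"))));
    }

    public static void main(String[] args) {
        ElementOptionValidator v = sample();
        // 'fail' on a lexer predicate is not declared, so a warning comes back.
        v.check("Java", ElementOptionValidator.Context.LEXER_PREDICATE, "fail")
                .ifPresent(System.out::println);
    }
}
```

With a table like this, the `fail` option on a lexer predicate would be reported for a target that cannot honor it, while a target that does declare the option stays silent.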

cowang commented 11 years ago

Ah, yes, I see. A general solution is needed, and my special case will be handled once that general solution is in place. Solving my special case now would only add clutter (and perhaps dead code).