antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Fixed position parsing - Lexing by character position in line. #883

Open ryaneberly opened 9 years ago

ryaneberly commented 9 years ago

It would be useful to be able to express character position constraints in a pure ANTLR grammar (without predicate actions).

Example: I have lexer rules such as:

PS_BEGIN: {getCharPositionInLine()==23}? [bB];

Thanks for considering it. I understand this may be a specialized case that should stay implemented as is. I just thought I'd suggest it.

sharwell commented 9 years ago

As a side note, you should move that predicate to the other side of the first character:

PS_BEGIN: [bB] {getCharPositionInLine()==24}?;

If any lexer rule in the entire lexer grammar has a predicate on the left edge, it will prevent the start state of the DFA from being cached and force the lexer through a much slower code path for every token. In real-world grammars, this can easily be the difference between the lexer using 95% of the CPU time for the entire application, and the lexer using <1% of the CPU time.
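The constant changes from 23 to 24 because the predicate now runs after [bB] has been consumed, so the reported column is one higher. The following is a minimal plain-Java sketch of that off-by-one, not code from the ANTLR runtime: the class and method names are hypothetical, the `column` variable merely stands in for getCharPositionInLine(), and it assumes a zero-based column and a single line with no newlines.

```java
public class ColumnPredicateSketch {
    /** Leading predicate: the column is checked before [bB] is consumed. */
    static boolean predicateFirst(String line, int start) {
        int column = start;                      // stand-in for getCharPositionInLine()
        if (column != 23) {
            return false;                        // {getCharPositionInLine()==23}?
        }
        return start < line.length() && isB(line.charAt(start)); // [bB]
    }

    /** Trailing predicate: [bB] is consumed first, so the column is one higher. */
    static boolean predicateLast(String line, int start) {
        if (start >= line.length() || !isB(line.charAt(start))) {
            return false;                        // [bB]
        }
        int column = start + 1;                  // position after consuming one character
        return column == 24;                     // {getCharPositionInLine()==24}?
    }

    static boolean isB(char c) {
        return c == 'b' || c == 'B';
    }

    /** Builds a hypothetical fixed-width record with 'B' at zero-based column 23. */
    static String sampleLine() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 23; i++) {
            sb.append(' ');
        }
        return sb.append("B rest of record").toString();
    }

    public static void main(String[] args) {
        String line = sampleLine();
        // Both forms should accept exactly the same start positions.
        for (int i = 0; i < line.length(); i++) {
            if (predicateFirst(line, i) != predicateLast(line, i)) {
                throw new AssertionError("forms disagree at column " + i);
            }
        }
        System.out.println("accepted at column 23: " + predicateLast(line, 23));
    }
}
```

Both methods accept the same inputs. What differs in a real lexer is only where the predicate sits relative to the first character, which is what affects caching of the DFA start state as described above.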

ryaneberly commented 9 years ago

Wow. Thanks for the tip

KvanTTT commented 9 years ago

I guess this issue can be closed?

ryaneberly commented 9 years ago

@KvanTTT Not sure. If the consensus is "no, this will not be a feature," then yes, it can be closed.

dhowe commented 3 years ago

Is there a reason I can't use this predicate (getCharPositionInLine) in a grammar that outputs to JavaScript? Or do I need to use {this.column==21}? instead?