Closed ccleve closed 2 years ago
"Lexical state" does not fit in PEG parsing due to the reasons below:
@ccleve, if you don't agree with the reasons above, could you share a more concrete example with me?
Here's a contrived example. There's a query language for a full-text search engine. It can do queries like this:
category="Books" and text contains "harry and potter" and color="Black"
In this case, we want to recognize "and" as a keyword, except when it appears inside a quoted string. The normal way to handle this is to recognize the quoted string as a whole. But for a search engine, you actually have to tokenize what's in the quotes. So, outside the quotes, "and" should return an AND token, and inside the quotes it should return a WORD token.
In the past, when I used JFlex, I flipped into a QUOTED_STRING_STATE when I hit the first quote. Inside that state only WORD tokens are recognized. I flipped back to the default state on the second quote.
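To make the state flip concrete, here is a minimal hand-rolled sketch in Python. It is not JFlex and not generated code; the names `State`, `WORD_RE` and `tokenize`, and the token kinds `QUOTE` and `EQUALS`, are assumptions made up for illustration. Only "and" is treated as a keyword, and only in the default state.

```python
import re
from enum import Enum, auto


class State(Enum):
    DEFAULT = auto()  # outside quotes: keywords and '=' are recognized
    QUOTED = auto()   # inside quotes: everything non-blank is a WORD


# One word pattern per state (hypothetical): '=' ends a word only outside quotes.
WORD_RE = {
    State.DEFAULT: re.compile(r'[^\s"=]+'),
    State.QUOTED: re.compile(r'[^\s"]+'),
}


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, flipping state on each quote."""
    state = State.DEFAULT
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch == '"':
            # The first quote enters QUOTED, the second returns to DEFAULT.
            state = State.QUOTED if state is State.DEFAULT else State.DEFAULT
            tokens.append(("QUOTE", ch))
            pos += 1
        elif ch == "=" and state is State.DEFAULT:
            tokens.append(("EQUALS", ch))
            pos += 1
        else:
            m = WORD_RE[state].match(text, pos)
            lexeme = m.group()
            is_keyword = state is State.DEFAULT and lexeme.lower() == "and"
            tokens.append(("AND" if is_keyword else "WORD", lexeme))
            pos = m.end()
    return tokens
```

Run on the query above, the "and" between "harry" and "potter" comes back as `("WORD", "and")`, while the two outside the quotes come back as `("AND", "and")`.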
Another example: it's helpful to use the same lexer for both documents and queries. In document mode, we recognize words. In query mode, we recognize words, keywords, equals and parentheses. It's really helpful to be able to flip a switch and just not recognize some things.
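A rough Python sketch of that switch, assuming a hypothetical `lex(text, query_mode)` function and made-up token names (`EQUALS`, `LPAREN`, `RPAREN`); it is only meant to show one lexer serving both modes, not how any real search engine does it.

```python
import re

# Hypothetical token tables, used only in query mode.
KEYWORDS = {"and"}
PUNCTUATION = {"=": "EQUALS", "(": "LPAREN", ")": "RPAREN"}

# A lexeme is either a run of word characters or one punctuation character.
LEXEME_RE = re.compile(r"\w+|[=()]")


def lex(text, query_mode):
    """Tokenize text; keywords and punctuation are recognized only in query mode."""
    tokens = []
    for m in LEXEME_RE.finditer(text):
        lexeme = m.group()
        if lexeme in PUNCTUATION:
            if query_mode:
                tokens.append((PUNCTUATION[lexeme], lexeme))
            # In document mode punctuation is simply not recognized.
        elif query_mode and lexeme.lower() in KEYWORDS:
            tokens.append((lexeme.upper(), lexeme))
        else:
            tokens.append(("WORD", lexeme))
    return tokens
```

With `query_mode=False`, the input `cats and (dogs)` yields three WORD tokens; with `query_mode=True`, "and" becomes an AND token and the parentheses are reported.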
I understand what you want to do. As I said above, a PEG parser is not a lexer that scans text in order. To do this with PEG, which is processed out of order, you should write the grammar as below:
...
string <- ["] word_list ["] / ["] ["]
word_list <- WORD space+ word_list / WORD
WORD <- ( '\\' ( space / . ) / !space !["] . )+
...
AND <- "and"
space <- blank / end_of_line
blank <- [ \t\v\f]
end_of_line <- '\r\n' / '\n' / '\r'
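To see why "and" inside the quotes comes out as a WORD with these rules, here is a small hand-written Python sketch that mirrors `string`, `word_list`, `WORD` and `space`. It is not what a PEG parser generator would emit; the function names and the `(position, value)` return convention are assumptions for illustration. The `AND` rule is never tried inside `string`, so no lexical state is needed.

```python
SPACE_CHARS = " \t\v\f\r\n"  # every character that can start `space`


def parse_word(s, i):
    """WORD <- ( '\\' ( space / . ) / !space !["] . )+"""
    start = i
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            # Escaped character; '\r\n' counts as a single `space`.
            i += 3 if s.startswith("\r\n", i + 1) else 2
        elif s[i] not in SPACE_CHARS and s[i] != '"':
            i += 1
        else:
            break
    return (i, s[start:i]) if i > start else None


def parse_word_list(s, i):
    """word_list <- WORD space+ word_list / WORD"""
    first = parse_word(s, i)
    if first is None:
        return None
    i, word = first
    j = i
    while j < len(s) and s[j] in SPACE_CHARS:
        j += 1
    if j > i:
        rest = parse_word_list(s, j)
        if rest is not None:
            return rest[0], [word] + rest[1]
    return i, [word]  # second alternative: a single WORD


def parse_string(s, i):
    """string <- ["] word_list ["] / ["] ["]"""
    if i < len(s) and s[i] == '"':
        inner = parse_word_list(s, i + 1)
        if inner is not None:
            j, words = inner
            if j < len(s) and s[j] == '"':
                return j + 1, words
        if s.startswith('"', i + 1):  # fall back to the empty string
            return i + 2, []
    return None
```

For example, `parse_string('"harry and potter"', 0)` returns the three words `harry`, `and`, `potter`, all matched by `WORD`.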
@ccleve, I'll close this issue since your request is technically incompatible with PEG parsers.
JFlex has nice support for controlling lexical state. I assume that Flex does as well. In JFlex you call yybegin(int state) to start a new state, and then any rules that are wrapped by the state will get invoked:
%%