antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Set exclusion/intersection when defining fragments within lexer rules #2246

Open jonopare opened 6 years ago

jonopare commented 6 years ago

I am trying to reuse fragments to build up a lexer rule that will include a range of characters, but exclude some subsets from that range. Essentially, I would like to define the fragment as: A & ~(B | C) where A, B, and C are themselves fragments.

Besides being unable to use & for set intersection, I also get the error: rule reference 'B' is not currently supported in a set.

I can rearrange the logic so that it's defined as ~(~A | B | C) but that won't get me past the last error; to do that, I need to inline each of the sets.
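The rearrangement above is De Morgan's law applied to character sets. As a minimal sketch (not ANTLR syntax), the following Python checks that both forms select the same characters and collapses the result into ranges that could then be inlined by hand into a lexer set. The concrete contents of A, B, and C and the ASCII-only universe are assumptions made for illustration; the issue does not say what the fragments contain.

```python
# Hypothetical stand-ins for the fragments A, B, and C from the post.
UNIVERSE = set(range(0x80))  # assumption: restrict the demo to ASCII

A = set(range(ord("a"), ord("z") + 1))  # assumed: a-z
B = set(map(ord, "aeiou"))              # assumed: vowels
C = set(map(ord, "xyz"))                # assumed: x, y, z


def complement(s):
    """Set negation relative to the assumed universe."""
    return UNIVERSE - s


# A & ~(B | C): the form the post would like to write.
intersection_form = A & complement(B | C)

# ~(~A | B | C): the double-negation form from the post.
double_negation_form = complement(complement(A) | B | C)


def to_ranges(code_points):
    """Collapse code points into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    for cp in sorted(code_points):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current range
        else:
            ranges.append((cp, cp))           # start a new range
    return ranges


result_ranges = [(chr(lo), chr(hi)) for lo, hi in to_ranges(intersection_form)]
print(intersection_form == double_negation_form)
print(result_ranges)
```

With these assumed sets, the surviving characters fall into a handful of contiguous ranges, which is the inlined form the post describes having to write out manually.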

Feature request:

1) It would reduce double negation if there were a set intersection operator.
2) It would reduce repetition if the fragments could be reused and didn't have to be inlined.

I asked this question on Stack Overflow, where there is a slightly more extensive example in case the one above is hard to follow given its lack of actual ANTLR syntax: https://stackoverflow.com/questions/49187813/exclude-chars-from-range-in-antlr-lexer

And it's (sort of) related to this question too https://stackoverflow.com/questions/16790861/rule-reference-is-not-currently-supported-in-a-set-in-antlr4-grammar

ericvergnaud commented 6 years ago

Hi, the place for discussions is the Google discussion group.
