Skip action for chars in token

antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

http://antlr.org

BSD 3-Clause "New" or "Revised" License

Skip action for chars in token #1170

Closed KvanTTT closed 3 years ago

KvanTTT commented 8 years ago

Consider the following token:

STRING: '\'' (~'\'' | '\'\'')* '\'';

In a visitor or listener I only need the STRING value without the quotes. Currently I have to trim them manually with the substring(1, text.length - 2) method.

So I suggest the following syntax for ignoring quotes with fragment rules:

STRING: QUOTE (~'\'' | '\'\'')* QUOTE;
fragment QUOTE: '\'' -> skip;

Without fragment rules the following syntax can be used:

STRING: <skip>'\'' (~'\'' | '\'\'')* <skip>'\'';
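For comparison, here is a minimal sketch of the manual trimming that the proposal would replace. It is written in Python against the plain token text rather than any ANTLR runtime API, and the helper name strip_quotes is hypothetical, not something from ANTLR.

```python
def strip_quotes(token_text: str) -> str:
    """Drop the opening and closing single quote from a STRING token's text.

    Hypothetical helper: it only removes the delimiters and leaves
    any doubled quotes inside the string untouched.
    """
    if len(token_text) < 2 or token_text[0] != "'" or token_text[-1] != "'":
        raise ValueError(f"not a quoted STRING token: {token_text!r}")
    return token_text[1:-1]
```

A visitor or listener would call this on the token's text wherever it needs the string value.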

SimonStPeter commented 8 years ago

You're saving yourself one statement in the source language, which is very little, and you still haven't removed the escaped quote. With your suggestion, 'that''s interesting' automatically becomes that''s interesting, but you still have to deal with the '' inside it, which is more work than a single substring() call. The same probably applies to any other special sequences, such as Unicode literals or \n and \t literals, if you support them. Is it worth it?
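To make the remaining work concrete, here is a rough Python sketch that strips the delimiters and then collapses each doubled quote. The function name string_value is hypothetical, and the code assumes the token text has already matched the STRING rule above, so every quote inside the delimiters appears as a pair.

```python
def string_value(token_text: str) -> str:
    """Return the logical value of a STRING token that uses '' as an escape.

    Assumes token_text matched the STRING rule, so it starts and ends
    with a single quote and all inner quotes are doubled.
    """
    inner = token_text[1:-1]          # remove the enclosing quotes
    return inner.replace("''", "'")   # collapse each doubled quote to one


print(string_value("'that''s interesting'"))  # that's interesting
```

Skipping the outer quotes in the lexer would remove only the first of these two steps.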

kkbkris commented 8 years ago

Dear sender,

Please could you explain this further? The '' doubled quote gets interpreted as an escape, as you mentioned, so the output would be that's interesting, would it not? In that case it would be worth it, as it would be a separate call.

Kind regards and Yours sincerely, Kristian Robert David Stacey Student of Masters of Engineering Undergraduate of Computer science and electronics at the university of bristol

SimonStPeter commented 8 years ago

@kkbkris, all I'm saying is that stripping the outermost quotes will not, in general, give you a clean representation of the string, because the string may contain other crud such as escaped quotes, escaped special characters, or Unicode escapes. Assuming we escape with a backslash, 'this isn\'t a nice\nstringwith \0x0066 char' would, under KvanTTT's suggestion, end up internally in the parse as this isn\'t a nice\nstringwith \0x0066 char, which isn't directly usable without considerable further scrubbing. So is auto-stripping of the enclosing quotes actually of value? That's all. HTH
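As an illustration of that further scrubbing, here is a hedged Python sketch of a backslash-escape decoder. The set of escapes it accepts (\n, \t, \', \\ and \uXXXX) is an assumption made for the example, not something the thread specifies, and the name decode_escapes is hypothetical.

```python
import re

# Assumed escape set for this sketch; a real grammar defines its own.
_SIMPLE = {"n": "\n", "t": "\t", "'": "'", "\\": "\\"}
# A backslash followed by either uXXXX (four hex digits) or any one character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)", re.DOTALL)


def decode_escapes(inner: str) -> str:
    """Decode backslash escapes in string text whose quotes are already removed."""
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] == "u" and len(body) == 5:
            return chr(int(body[1:], 16))   # \uXXXX -> the code point
        if body in _SIMPLE:
            return _SIMPLE[body]
        raise ValueError(f"unsupported escape: \\{body}")
    # A lone backslash at the very end has nothing after it and is left as is.
    return _ESCAPE.sub(replace, inner)
```

Even this small decoder is noticeably more code than the quote stripping itself, which is the point being made above.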

mkw commented 6 years ago

It's rather a special case, but this feature would make parsing keywords in formats like JSON a bit easier. Right now, when we have “the lexer pass all keywords to the parser as keyword token types, and then we create a parser id rule that matches ID and any of the keywords” (from The Definitive ANTLR 4 Reference in the “Context-Sensitive Lexical Problems” chapter), and we want to strip the quotes from the keywords, we have to do the stripping on every single lexer rule, which gets a bit verbose. Making a special "skipped" fragment would accomplish this safely because we know that there are no escaped quotes inside the keywords.

ericvergnaud commented 6 years ago

Hi, please move this to the Google discussion group. Thanks.

seetamraju commented 4 years ago

Hi, with so many FOSS libraries out there that make it easy to convert JSON into "HashMaps" (or the equivalent), and that do a great job of providing good-enough error messages, I'm very curious to understand the use case that is forcing you to rely on a custom parser for JSON. For example, are your JSON files hundreds of megabytes in size or larger?

KvanTTT commented 3 years ago

I now think there is no real need for this issue.