antlr / antlr4test-maven-plugin

A maven plugin used to test the grammars-v4 repo grammars
BSD 3-Clause "New" or "Revised" License
17 stars 16 forks source link

New case insensitive options #1

Closed KvanTTT closed 7 years ago

KvanTTT commented 7 years ago

I want to use not only lower case. but UPPER CASE for case insensitive lexers (such as SQL languages). So, I suggest to use the following options for caseInsensitive parameter:

bramp commented 7 years ago

Sorry I know this is an old/closed bug, but I had a question about this feature. Basically why? If a grammar is meant to be case-insensitive, shouldn't it be designed like that? Having this parameter, which either uppers or lower cases all the input during tests, seems like it would hide issues.

I've stumbled across this, because while testing out the MySqlLexer, it would refuse to accept the test input. I eventually discovered this was because the tests had all be run with the input converted to uppercase. IMHO the correct fix in that situation would be to change the MySqlLexer, not to change my input to be uppercase.

So is there a use case for this I'm missing? Thanks

KvanTTT commented 7 years ago

Case insensitive words improve readability of grammar and have a positive influence on generated parser performance. Meanwhile, in the parser, you still can rely on symbols case. See @sharwell description: https://gist.github.com/sharwell/9424666#gistcomment-1186748.

For Go runtime you feel free to use my AntlrCaseInsensitiveInputStream.go.

bramp commented 7 years ago

Ah so simpler parsers and performance seem like good goals. Thanks for sharing your CaseInsensitiveInputStream, I wrote something similar yesterday that wraps any CharStream. CaseChangingStream.

However, requiring the case change makes the parser harder to use. It is not obvious to the user of the parser that they must do this. Documentation could be improved to fix this, but I would like to suggest an alternative. A option be placed in the g4 file, that changes the language target to either require the generated Lexer to require a CaseInsensitiveInputStream, or preferably just take a normal Input, and wrap it in a CaseChangingStream for example. Thoughts? (I'm happy to take this over to https://github.com/antlr/antlr4/ if better discussed there)

KvanTTT commented 7 years ago

Such issue already exists: https://github.com/antlr/antlr4/issues/1002

bramp commented 7 years ago

Gah ok. So looks like you implemented what I was suggesting, but @parrt shot it down due to lack of time :( That's disappointing.

KvanTTT commented 7 years ago

Maybe it's good :smile: My old implementation is not ideal. Now I think the best implementation should include a grammar and mode options. Case insensitive stream is used if respective option activated.