Open andreasabel opened 3 years ago
Case-insensitive keywords are really a useful feature! It will be better to add a --case-insensitive option to bnfc besides adding a pragma to a token.
In fact, I am facing the same problem with the Haskell backend (--text-token). I tried to manually modify the generated treeFind function in Lex.x:
    treeFind N = tv s
    treeFind (B a t left right)
      | Data.Text.toUpper s <  Data.Text.toUpper a = treeFind left
      | Data.Text.toUpper s >  Data.Text.toUpper a = treeFind right
      | Data.Text.toUpper s == Data.Text.toUpper a = t
It seems to work. However, it would be better to make it work with all backends via a simple pragma. I am really looking forward to seeing the feature!
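One caveat with folding case only at lookup time: if the generated tree was ordered by case-sensitive comparison and the keywords mix cases, the folded comparison may descend into the wrong branch. A minimal sketch of an alternative is to fold the keys when the table is built as well. Note that this swaps the generated search tree for Data.Map from the containers package, and that Tok, mkKeywordTable and resolve are hypothetical names for illustration, not BNFC's generated API.

```haskell
import Data.Char (toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical token type standing in for the generated lexer's tokens.
data Tok = TKeyword Int | TIdent String
  deriving (Eq, Show)

-- Fold case once, so that table keys and lookups share one ordering.
normalize :: String -> String
normalize = map toLower

-- Build the reserved-word table keyed by the folded spelling.
mkKeywordTable :: [(String, Int)] -> Map.Map String Tok
mkKeywordTable kws =
  Map.fromList [ (normalize k, TKeyword n) | (k, n) <- kws ]

-- Resolve a lexeme: a keyword if its folded form is in the table,
-- otherwise an identifier that keeps the spelling actually used.
resolve :: Map.Map String Tok -> String -> Tok
resolve table s = Map.findWithDefault (TIdent s) (normalize s) table
```

With a table built from SELECT and from, resolve maps "select", "Select" and "FROM" to keyword tokens, while a non-keyword such as "Users" stays an identifier with its original case.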
> It will be better to add a --case-insensitive option to bnfc besides adding a pragma to a token.
I think case-insensitive keywords are a property of the language defined by the grammar rather than of how this grammar should be processed. So I favor a pragma in the grammar file over a command line option to bnfc. Options should configure the backends but not change the semantics of the grammar.
A shorter pragma would be
case-insensitive keywords;
Would there be any use case for separating whether keywords are case-insensitive from whether token types are case-insensitive? For instance, strings are tokens and usually they should record the case actually used. More generally, tokens are defined by regular expressions, which (compared with other languages/tools) are usually case-sensitive if you specify a literal character/string or an explicit range like "[a-z]" or "[A-Z]".
(When a regular expression needs case-insensitivity for more than just an individual character "[Aa]", a lot of the predefined character classes signifying e.g. "alphabetic", "alphanumeric", "unicode alphabetic" include both cases and there's usually an option to make a string literal in a regex be case-insensitive – well, usually the whole regex, but we can imagine tagging individual literal sequences with BNFC's encoding since it's structured rather than being a string with various escapes for regex features.)
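As an illustration of the per-character "[Aa]" idea applied to a whole literal, here is a minimal sketch that expands a keyword into generic regex text. The function name caseInsensitivePattern is hypothetical, the output is plain character-class syntax rather than BNFC's own regex encoding, and the sketch assumes the keyword contains no regex metacharacters, since it does no escaping.

```haskell
import Data.Char (isAlpha, toLower, toUpper)

-- Expand a literal keyword into regex text that matches it in any case.
-- Letters with distinct upper/lower forms become a two-character class;
-- every other character is copied through unchanged (no escaping).
caseInsensitivePattern :: String -> String
caseInsensitivePattern = concatMap charClass
  where
    charClass c
      | isAlpha c && toUpper c /= toLower c = ['[', toUpper c, toLower c, ']']
      | otherwise = [c]
```

For the keyword select this yields [Ss][Ee][Ll][Ee][Cc][Tt]. Tagging a literal as case-insensitive in a structured regex representation would make this expansion unnecessary, which is the point made above.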
Would there ever be a case for marking individual keywords case-sensitive or not? E.g. X . Y ::= "ProperCase" String anycase "but THIS can be ANY case"; ?
I'm considering whipping up a workaround using some combination of define, internal and _ . to make the uppercase versions synonyms rewritten to be the lowercase versions or vice versa.
Is there a better way at this point? Any news or advice?
Here is a partial grammar of SQL that I'd like to add to the example suite: https://github.com/GrammaticalFramework/gf-contrib/blob/master/query-converter/MinSQL.bnf
SQL has case-insensitive keywords. This is a feature we could add to BNFC via some pragma, e.g.,
as special case of a general