BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/

Add SQL grammar as example #364

Open andreasabel opened 3 years ago

andreasabel commented 3 years ago

Here is a partial grammar of SQL that I'd like to add to the example suite: https://github.com/GrammaticalFramework/gf-contrib/blob/master/query-converter/MinSQL.bnf

SQL has case-insensitive keywords. This is a feature we could add to BNFC via some pragma, e.g.,

token case-insensitive keywords

as a special case of a general

token case-insensitive <name> <regex>

Commelina commented 3 years ago

Case-insensitive keywords are a really useful feature! It will be better to add a --case-insensitive option to bnfc besides adding a pragma to a token.

In fact, I am facing the same problem with the Haskell backend (--text-token). I tried to manually modify the generated treeFind function in Lex.x:

treeFind N = tv s
treeFind (B a t left right) | (Data.Text.toUpper s)  < (Data.Text.toUpper a) = treeFind left
                            | (Data.Text.toUpper s)  > (Data.Text.toUpper a) = treeFind right
                            | (Data.Text.toUpper s) == (Data.Text.toUpper a) = t

It seems to work. However, it would be better to make this work with all backends via a simple pragma. I am really looking forward to seeing the feature!
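For readers who want to try the idea outside the generated lexer, here is a minimal, self-contained sketch of the same technique: normalize the scanned word before comparing it with the keyword table. The names `Tok`, `keywords` and `classify` are hypothetical stand-ins and are not taken from BNFC's generated code. One caveat on the patch above: a search tree built with case-sensitive ordering is presumably only searched correctly by upper-cased comparisons if the stored keywords keep the same relative order after upper-casing (for example, when they are all written in one case).

```haskell
import Data.Char (toUpper)

-- Hypothetical token type, standing in for the Tok type of a generated lexer.
data Tok = TKeyword String | TIdent String
  deriving (Eq, Show)

-- Hypothetical keyword table, stored in canonical upper case.
keywords :: [String]
keywords = ["FROM", "SELECT", "WHERE"]

-- Classify a scanned word: keywords are matched on their upper-cased form,
-- while identifiers keep the spelling the user actually wrote.
classify :: String -> Tok
classify s
  | key `elem` keywords = TKeyword key
  | otherwise           = TIdent s
  where
    key = map toUpper s
```

With this sketch, `classify "SeLeCt"` yields the same keyword token as `classify "SELECT"`, while `classify "users"` stays an identifier with its original case.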

andreasabel commented 3 years ago

> It will be better to add a --case-insensitive option to bnfc besides adding a pragma to a token.

I think case-insensitive keywords are a property of the language defined by the grammar rather than of how the grammar should be processed. So I favor a pragma in the grammar file over a command-line option to bnfc. Options should configure the backends but not change the semantics of the grammar.

A shorter pragma would be

case-insensitive keywords;

ScottFreeCode commented 3 years ago

Would there be any use case for separating whether keywords are case-insensitive from whether token types are case-insensitive? For instance, strings are tokens, and usually they should record the case actually used. More generally, tokens are defined by regular expressions, which (as in other languages and tools) are usually case-sensitive when you specify a literal character/string or an explicit range like "[a-z]" or "[A-Z]".

(When a regular expression needs case-insensitivity for more than just an individual character "[Aa]", a lot of the predefined character classes signifying e.g. "alphabetic", "alphanumeric", "unicode alphabetic" include both cases and there's usually an option to make a string literal in a regex be case-insensitive – well, usually the whole regex, but we can imagine tagging individual literal sequences with BNFC's encoding since it's structured rather than being a string with various escapes for regex features.)

Would there ever be a case for marking individual keywords case-sensitive or not? E.g. X . Y ::= "ProperCase" String anycase "but THIS can be ANY case";?
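To make the per-keyword question concrete, here is a rough sketch of what such a flag could mean at matching time. Everything in it is hypothetical: `CaseMode`, `Keyword` and `matches` are illustrative names, and nothing like the `anycase` marker exists in BNFC today.

```haskell
import Data.Char (toLower)

-- Hypothetical per-keyword flag; not an existing BNFC feature.
data CaseMode = Exact | AnyCase
  deriving (Eq, Show)

-- A keyword literal together with how its case should be treated.
data Keyword = Keyword String CaseMode
  deriving (Eq, Show)

-- Does the scanned text match the keyword under the keyword's own case mode?
matches :: Keyword -> String -> Bool
matches (Keyword k Exact)   s = s == k
matches (Keyword k AnyCase) s = map toLower s == map toLower k
```

Under these assumptions, `Keyword "ProperCase" Exact` would reject `"propercase"`, whereas `Keyword "but THIS can be ANY case" AnyCase` would accept any capitalization of that phrase.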

ScottFreeCode commented 6 months ago

I'm considering whipping up a workaround using some combination of define, internal and _ . to make the uppercase versions synonyms rewritten to be the lowercase versions or vice versa.

Is there a better way at this point? Any news or advice?