dabeaz / sly

Sly Lex Yacc

ignore case sensitivity #62

Closed ker2x closed 2 years ago

ker2x commented 3 years ago

I'm trying to make my language case-insensitive.

I'm doing this:

class FortyLexer(Lexer):
    """Main Lexer Class for FortyFor"""
    Lexer.reflags = Lexer.regex_module.IGNORECASE

    tokens = {
        ID, NUMBER, PLUS, MINUS, TIMES, DIVIDE, ASSIGN, LPAR, RPAR,
        IF, ELSE
    }

    ignore = ' \t'
    ignore_comment = r'\!.*'
    ignore_newline = r'\n+'

    ID      = r'[a-zA-Z_][a-zA-Z0-9_]*'
    ID[r'if'] = IF
    ID[r'else'] = ELSE
...

"else" is recognized as an ELSE token, but "ELSE" is still recognized as an ID. Am I doing it wrong?

I also tried "Lexer.reflags = Lexer.regex_module.RegexFlag.IGNORECASE".

SunChuquin commented 3 years ago

You can use an inline (?i:...) group in the pattern.

e.g.: FUN = r'(?i:fun_)'
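
For reference, here is a minimal sketch of what the scoped inline flag does, using Python's `re` module directly rather than SLY. The pattern `FUN` is taken from the example above; `matches_fun` is a hypothetical helper added only for illustration.

```python
import re

# Scoped inline flag: only the text inside (?i:...) ignores case.
FUN = r'(?i:fun_)'


def matches_fun(text):
    """Return True if the whole text matches the FUN pattern."""
    return re.fullmatch(FUN, text) is not None


print(matches_fun('fun_'))   # True
print(matches_fun('FUN_'))   # True
print(matches_fun('Fun_x'))  # False: trailing 'x' is not part of the pattern
```

Scoped inline flags of this form need Python 3.6 or later.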

classabbyamp commented 3 years ago

import re
import sly

class MyLexer(sly.Lexer):
    reflags = re.IGNORECASE
    ...

this works just fine for me
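
One possible explanation for the differing reports, offered as an assumption rather than something confirmed in this thread: `reflags` changes how the regex patterns match, while a remap such as `ID['else'] = ELSE` may be an exact-string lookup on the matched text. The stdlib-only sketch below illustrates that distinction. The `remap` dict and `classify` function are hypothetical stand-ins, not SLY's internals.

```python
import re

# The regex itself is case-insensitive...
ident = re.compile(r'[a-z_][a-z0-9_]*', re.IGNORECASE)

# ...but this lookup table (a hypothetical stand-in for a keyword remap)
# compares strings exactly, so only lowercase spellings are found.
remap = {'if': 'IF', 'else': 'ELSE'}


def classify(text):
    """Return the token type for text, or None if it is not an identifier."""
    m = ident.fullmatch(text)
    if m is None:
        return None
    return remap.get(m.group(), 'ID')


print(classify('else'))  # ELSE
print(classify('ELSE'))  # ID: matched by the regex, missed by the exact lookup
```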

dabeaz commented 2 years ago

Case-insensitive keyword matching can be accomplished by writing a handler method:

_keywords = { 'if', 'else', 'while', ... }

@_(r'[a-zA-Z_][a-zA-Z0-9_]*')
def ID(self, t):
    if t.value.lower() in _keywords:
        # t.value is a string, so use the attribute, not a call
        t.type = t.value.upper()
    return t

I don't know if SLY will be modified to support this through the array short-cut syntax.
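
The same idea can be tried outside SLY. The following is a stdlib-only sketch, not SLY's API: `KEYWORDS`, `TOKEN_RE`, and `tokenize` are hypothetical names, and the tokenizer handles only identifiers separated by whitespace.

```python
import re

KEYWORDS = {'if', 'else', 'while'}  # illustrative subset, lowercase

# Optional leading whitespace, then one identifier.
TOKEN_RE = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)')


def tokenize(text):
    """Return (type, value) pairs; keywords are recognized in any case."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f'unexpected character at position {pos}')
        value = m.group(1)
        # Compare in lowercase, then use the uppercase spelling as the type.
        type_ = value.upper() if value.lower() in KEYWORDS else 'ID'
        tokens.append((type_, value))
        pos = m.end()
    return tokens


print(tokenize('if Else foo'))
# [('IF', 'if'), ('ELSE', 'Else'), ('ID', 'foo')]
```

The original spelling is kept in the token value, so error messages can still quote what the user typed.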

jschultz410 commented 1 year ago

@dabeaz I'm writing a SLY lexer and parser for a language that has a ton (248) of case-insensitive reserved keywords.

Do you recommend using explicit matching rules for each keyword, or is it better to have one broad regex and then refine the type like you did with the ID token in your previous post? One nice thing about your ID approach is that I don't have to worry about the ordering of keywords that are prefixes of one another (e.g., 'AS' and 'ASC').
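
To make the prefix concern concrete, here is a small stdlib-only sketch using Python's `re` alternation, which tries branches left to right. How SLY orders its own rules is not shown here; `first_match` and `classify_word` are hypothetical helpers.

```python
import re

KEYWORDS = {'as', 'asc'}  # illustrative subset, lowercase


def first_match(pattern, text):
    """Return the text matched at the start of the input."""
    return re.match(pattern, text).group()


def classify_word(text):
    """Match one broad identifier, then refine its type by lookup."""
    word = re.match(r'[A-Za-z_][A-Za-z0-9_]*', text).group()
    return word.upper() if word.lower() in KEYWORDS else 'ID'


# With explicit per-keyword patterns, the order of alternatives matters.
print(first_match(r'AS|ASC', 'ASC'))  # AS  (shorter keyword wins)
print(first_match(r'ASC|AS', 'ASC'))  # ASC

# With one broad pattern plus a lookup, ordering is not an issue.
print(classify_word('ASC'))  # ASC
print(classify_word('as'))   # AS
```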

I'm assuming that I should unpack _keywords into tokens too for the parser?
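
A sketch of deriving the token names from the keyword set, in plain Python. Whether SLY's `tokens` accepts plain strings built this way is an assumption not confirmed in this thread; `base_tokens` and `keyword_tokens` are hypothetical names.

```python
_keywords = {'if', 'else', 'while'}  # illustrative subset, lowercase

# Non-keyword token names used by the grammar (hypothetical).
base_tokens = {'ID', 'NUMBER'}

# One uppercase token name per keyword, matching the handler's t.type.
keyword_tokens = {kw.upper() for kw in _keywords}

tokens = base_tokens | keyword_tokens
print(sorted(tokens))
# ['ELSE', 'ID', 'IF', 'NUMBER', 'WHILE']
```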

PS - It also seems that setting the class variable reflags = re.IGNORECASE in a lexer class handles global case insensitivity?