dabeaz / sly

Sly Lex Yacc

Combining Lexer Match Actions and Token Remapping #69

Open zvr opened 3 years ago

zvr commented 3 years ago

From the examples, one can have actions when a lexical rule matches:

    @_(r'\d+')
    def NUMBER(self, t):
        t.value = int(t.value)   # Convert to a numeric value
        return t

One can also remap tokens:

    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    ID['if'] = IF
    ID['else'] = ELSE

These cannot be combined, since if you define a function to perform an action, the next remap attempt raises an error:

    TypeError: 'function' object does not support item assignment
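The error itself is ordinary Python behavior and can be reproduced without any lexer. Once the name ID is bound to a function, subscript assignment on it fails. The sketch below uses a stand-in function, not SLY code. Presumably the plain-string form works because SLY wraps the pattern in an object that accepts item assignment, but that is an assumption about its internals.

```python
def ID(t):
    # Stand-in for a match-action function (hypothetical, not SLY code).
    return t

try:
    # Same shape as the remap line, but ID is now a plain function.
    ID['if'] = 'IF'
except TypeError as exc:
    message = str(exc)

print(message)
```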

What is the recommended way to use both of these techniques in a lexical token?

I assume the function could examine the value of the match (say, the string in ID) with something like if t.value == 'if', but how would it return a different token?

dabeaz commented 3 years ago

The two techniques can't be combined. In fact, the whole token remapping feature was meant to replace the need for writing a function like this (which was commonplace):

    keywords = { 'if', 'else', 'while' }

    @_(r'[a-zA-Z_][a-zA-Z0-9_]*')
    def ID(self, t):
        if t.value in keywords:
            t.type = t.value.upper()
        return t

As shown in the function, the token type can be changed by assigning a different value to t.type.
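To make the mechanics concrete without depending on SLY, here is a minimal stand-alone sketch built only on the standard `re` module. It mimics the pattern from the ID function above: one action both does extra work (recording identifiers) and retypes keywords by assigning to the token's type. The names `Token`, `RULES`, `lex`, `seen_ids`, and the two action functions are hypothetical and are not part of SLY's API.

```python
import re
from dataclasses import dataclass

KEYWORDS = {"if", "else", "while"}  # reserved words, as in the snippet above
seen_ids = set()                    # illustrative extra state kept by the action


@dataclass
class Token:
    """Hypothetical stand-in for a lexer token (not SLY's token class)."""
    type: str
    value: object


def number_action(t):
    t.value = int(t.value)  # convert the matched text to a numeric value
    return t


def id_action(t):
    if t.value in KEYWORDS:
        t.type = t.value.upper()  # retype: 'if' becomes an IF token
    else:
        seen_ids.add(t.value)     # any other per-match work goes here
    return t


# (token type, pattern, action); earlier rules win when several match
RULES = [
    ("NUMBER", r"\d+", number_action),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*", id_action),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in RULES))
ACTIONS = {name: action for name, _, action in RULES}


def lex(text):
    """Yield tokens from text, running each rule's action on its match."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"illegal character {text[pos]!r} at index {pos}")
        yield ACTIONS[m.lastgroup](Token(m.lastgroup, m.group()))
        pos = m.end()


tokens = [(t.type, t.value) for t in lex("if x1 else 42")]
print(tokens)
```

Because the action receives the token and returns it, the type change and any other processing live in the same function, which is why a separate remapping table is unnecessary in this style.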

zvr commented 3 years ago

Great, thanks!