In a PLY lexer, I can implement unusual behavior such as case-insensitive keywords by defining a function with the same name I would otherwise give to the string variable holding that token's regexp.
For example:
    from ply import lex

    tokens = ("GREET", "FIGHT", "WORD")
    reserved = ("GREET", "FIGHT")

    # t_ignore is a string of characters to skip, not a regexp
    t_ignore = ' '

    def t_error(t):
        raise ValueError("oh noooo")

    def t_WORD(t):
        "[a-zA-Z]+"
        # Promote reserved words to their own token type, ignoring case
        upper = t.value.upper()
        if upper in reserved:
            t.value = upper
            t.type = upper
        return t

    lexer = lex.lex()
    lexer.input("grEEt samuel FIGHT tomato greet potato FIght pOEtRY")
    for token in lexer:
        print(token)

    #LexToken(GREET,'GREET',1,0)
    #LexToken(WORD,'samuel',1,6)
    #LexToken(FIGHT,'FIGHT',1,13)
    #LexToken(WORD,'tomato',1,19)
    #LexToken(GREET,'GREET',1,26)
    #LexToken(WORD,'potato',1,32)
    #LexToken(FIGHT,'FIGHT',1,39)
    #LexToken(WORD,'pOEtRY',1,45)
I can't find anything in rply's documentation that explains how to do the equivalent of defining t_WORD as a function in the above program. Nor can I find anything that indicates that it can't be done.
I ran into the same problem and settled on wrapping the lexer's output in a function that intercepts the token stream and modifies the tokens that need additional logic.
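Here is a minimal sketch of that kind of wrapper. It is not rply's own API: `promote_keywords` and `RESERVED` are names made up for illustration, and the `Token` namedtuple is a stand-in that only mimics the `name`, `value` and `source_pos` attributes I believe rply's tokens carry, so the example runs without rply installed.

```python
from collections import namedtuple

# Hypothetical stand-in for a lexer token; assumed to mirror the
# name/value/source_pos attributes of rply's Token.
Token = namedtuple("Token", ["name", "value", "source_pos"])

# Keywords to recognise regardless of case (same set as the PLY example).
RESERVED = {"GREET", "FIGHT"}


def promote_keywords(tokens, reserved=RESERVED, make_token=Token):
    """Yield tokens unchanged, except that WORD tokens whose upper-cased
    value is a reserved word are replaced by a token of that keyword type."""
    for tok in tokens:
        if tok.name == "WORD":
            upper = tok.value.upper()
            if upper in reserved:
                # Build a new token rather than mutating the original.
                yield make_token(upper, upper, tok.source_pos)
                continue
        yield tok


# Simulated lexer output; with a real lexer this would be its token stream.
stream = [
    Token("WORD", "grEEt", 0),
    Token("WORD", "samuel", 6),
    Token("WORD", "FIGHT", 13),
]
result = list(promote_keywords(stream))
for tok in result:
    print(tok.name, tok.value)
```

With rply, I would expect to pass the stream returned by the built lexer's `lex(...)` call through the wrapper and hand the resulting generator to the parser, supplying rply's own token class as `make_token`. Treat that wiring as an assumption to check against the rply version you are using.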