I wanted a Python 3 lexer that consumes byte strings, but this doesn't seem possible with LexerGenerator. For example, for this test program:
from rply import LexerGenerator
lg = LexerGenerator()
lg.add('NUMBER', br'\d+')
lg.add('ADD', br'\+')
lg.ignore(br'\s+')
lexer = lg.build()
for token in lexer.lex(b'1 + 1'):
    print(token)
you get:
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    for token in lexer.lex(b'1 + 1'):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 46, in next
    colno = self._update_pos(match)
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 27, in _update_pos
    self._lineno += self.s.count("\n", match.start, match.end)
TypeError: a bytes-like object is required, not 'str'
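Judging from the last frame of the traceback, the failure seems to reduce to calling count() on a bytes object with a str needle ("\n"). The sketch below reproduces that call without rply and shows one possible workaround on the caller's side: decode the input as latin-1 before lexing, then encode again if bytes are needed. The helper name count_newlines and the sample input are made up for illustration; this is not rply's code, and the workaround assumes the lexer rules are also written as str patterns.

```python
# Stand-alone reproduction of the failing call; rply itself is not imported.
source = b"1 + 1\n2"  # sample input, made up for this sketch


def count_newlines(s, start, end):
    # Hypothetical helper mirroring the call in the traceback:
    # a str needle ("\n") counted in whatever the lexer was given.
    return s.count("\n", start, end)


try:
    count_newlines(source, 0, len(source))
    raised = False
except TypeError:
    # bytes.count() rejects a str needle on Python 3.
    raised = True

# Possible workaround (an assumption, not an rply feature): latin-1 maps
# every byte value 0-255 to the code point with the same number, so the
# decode/encode round trip is lossless.
text = source.decode("latin-1")
newlines = count_newlines(text, 0, len(text))
round_trip = text.encode("latin-1")
```

One caveat with this approach: str patterns can match slightly differently from bytes patterns (for example, \s in a str pattern also matches U+00A0), so it only fits inputs where that difference does not matter.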
(I ended up writing my own lexer for unrelated reasons, so this is not a show-stopper for me, but I thought you might want to fix it.)