alex / rply

An attempt to port David Beazley's PLY to RPython, and give it a cooler API.
BSD 3-Clause "New" or "Revised" License

can't lex byte strings on Python 3 #55

Open · jwilk opened 8 years ago

jwilk commented 8 years ago

I wanted a Python 3 lexer that consumes byte strings, but this doesn't seem to be possible with rply's LexerGenerator. For example, for this test program:

from rply import LexerGenerator
lg = LexerGenerator()
lg.add('NUMBER', br'\d+')
lg.add('ADD', br'\+')
lg.ignore(br'\s+')
lexer = lg.build()
for token in lexer.lex(b'1 + 1'):
    print(token)

you get:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    for token in lexer.lex(b'1 + 1'):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 46, in next
    colno = self._update_pos(match)
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 27, in _update_pos
    self._lineno += self.s.count("\n", match.start, match.end)
TypeError: a bytes-like object is required, not 'str'

(I ended up writing my own lexer for unrelated reasons, so this is not a show-stopper for me, but I thought you might want to fix it.)
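
The last traceback frame points at a type mismatch: `self.s` is a bytes object, but the newline being counted is the str literal "\n". The sketch below reproduces that mismatch without rply and shows one possible type-aware way to count newlines. The helper name `count_newlines` is hypothetical and is not part of rply; this is an illustration of the idea under that assumption, not a patch against the library.

```python
def count_newlines(source, start, end):
    """Count newlines in source[start:end] for either str or bytes input.

    Hypothetical helper for illustration only; it is not rply's code.
    """
    # Choose a newline literal of the same type as the input. Calling
    # bytes.count() with a str argument is what raises TypeError on Python 3.
    if isinstance(source, (bytes, bytearray)):
        newline = b"\n"
    else:
        newline = "\n"
    return source.count(newline, start, end)


# The mismatch from the traceback, reproduced with the standard library only.
# The exact wording of the message differs between Python 3 versions.
try:
    b"1 + 1".count("\n", 0, 5)
except TypeError as exc:
    print("TypeError:", exc)

# The type-aware helper accepts both kinds of input.
print(count_newlines(b"1 +\n1\n", 0, 6))  # 2
print(count_newlines("1 +\n1\n", 0, 6))   # 2
```

Whether the rest of the lexer (for example the compiled patterns and the token values) would also handle bytes cleanly is not shown here and would need to be checked separately.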