unbounded recursion in LexerStream.next()

jwilk commented 8 years ago

This test program

from rply import LexerGenerator
lg = LexerGenerator()
lg.ignore(r'\s')
for token in lg.build().lex(' ' * 1000):
    pass

makes the Python interpreter sad:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    for token in lg.build().lex(' ' * 1000):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
...
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 38, in next
    match = rule.matches(self.s, self.idx)
  File "/usr/lib/python3/dist-packages/rply/lexergenerator.py", line 33, in matches
    return Match(*m.span(0)) if m is not None else None
RecursionError: maximum recursion depth exceeded

alex commented 8 years ago

I'm not sure I understand why this causes unbound recursion :-(

jwilk commented 8 years ago

The relevant code is:

def next(self):
    if self.idx >= len(self.s):
        raise StopIteration
    for rule in self.lexer.ignore_rules:
        match = rule.matches(self.s, self.idx)
        if match:
            self._update_pos(match)
            return self.next()
    ...

CPython doesn't do tail recursion elimination, so if there's N consecutive ignorable tokens in the input, N stack frames will be consumed. If N is big enough (1000 in my reproducer), you get a recursion error. To fix this, use a while loop instead of recursion:

def next(self):
    while True:
        if self.idx >= len(self.s):
            raise StopIteration
        for rule in self.lexer.ignore_rules:
            match = rule.matches(self.s, self.idx)
            if match:
                self._update_pos(match)
                break
        else:
            break
    ...

(untested, sorry!)

alex commented 8 years ago

Ugh, right, I forgot how I implemented ignores. Will try to fix this tonight.

On Wed, Feb 24, 2016 at 9:39 AM, Jakub Wilk notifications@github.com wrote:

The relevant code is:

def next(self): if self.idx >= len(self.s): raise StopIteration for rule in self.lexer.ignore_rules: match = rule.matches(self.s, self.idx) if match: self._update_pos(match) return self.next() ...

CPython doesn't do tail recursion elimination, so if there's N consecutive ignorable tokens in the input, N stack frames will be consumed. If N is big enough (1000 in my reproducer), you get a recursion error. To fix this, use a while loop instead of recursion:

def next(self): while True: if self.idx >= len(self.s): raise StopIteration for rule in self.lexer.ignore_rules: match = rule.matches(self.s, self.idx) if match: self._update_pos(match) break else: break ...

(untested, sorry!)

— Reply to this email directly or view it on GitHub https://github.com/alex/rply/issues/52#issuecomment-188282161.

"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: 125F 5C67 DFE9 4084

alex / rply

unbounded recursion in LexerStream.next() #52