erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

TokenMatcher exception #173

Closed: PixelRick closed this issue 2 years ago

PixelRick commented 3 years ago

TokenMatcher._uncached_match behaves differently from Literal._uncached_match when asked to match past the end of the input (overread). Literal's version returns None on overread because it uses startswith. TokenMatcher's version raises IndexError: list index out of range.
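
The difference comes down to ordinary Python semantics. A minimal sketch, using only built-in types and no parsimonious code (the variable names are illustrative):

```python
# Plain-Python illustration; nothing here is taken from parsimonious.
text = "token1"
tokens = ["token1"]

# str.startswith accepts a start offset and simply reports False
# when that offset is at or past the end of the string.
past_end_str = text.startswith("token2", len(text))

# Indexing a list at len(list) raises IndexError instead.
try:
    tokens[len(tokens)]
    past_end_list = "no error"
except IndexError as exc:
    past_end_list = type(exc).__name__
```

A matcher built on startswith therefore fails quietly past the end, while one built on direct indexing has to check the bound itself.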

To reproduce the exception, the example below uses a OneOrMore expression, which calls TokenMatcher._uncached_match with pos >= len(token_list):

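from parsimonious.grammar import TokenGrammar
from parsimonious.utils import Token

s = [Token('token1'), Token('token2')]
grammar = TokenGrammar("""
    foo = token1 "token2"+
    token1 = "token1"
    """)
grammar.parse(s)  # raises IndexError: list index out of range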

A simple fix is to check that pos < len(token_list) before indexing:

    def _uncached_match(self, token_list, pos, cache, error):
        if pos < len(token_list) and token_list[pos].type == self.literal:
            return Node(self, token_list, pos, pos + 1)
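
To see why the guard matters for a repetition such as "token2"+, here is a self-contained sketch. Token, Node, match_token and one_or_more are hypothetical stand-ins written for this illustration, not parsimonious's classes or its actual OneOrMore implementation; the sketch only assumes that a repetition keeps probing until the inner match returns None.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only, not parsimonious classes.
Token = namedtuple("Token", "type")
Node = namedtuple("Node", "start end")


def match_token(token_list, pos, literal):
    """Return a Node for one matching token, or None (including past the end)."""
    if pos < len(token_list) and token_list[pos].type == literal:
        return Node(pos, pos + 1)
    return None


def one_or_more(token_list, pos, literal):
    """Greedy repetition: probe until match_token returns None."""
    nodes = []
    while True:
        node = match_token(token_list, pos, literal)
        if node is None:
            break
        nodes.append(node)
        pos = node.end
    return nodes if nodes else None


tokens = [Token("token1"), Token("token2")]
first = match_token(tokens, 0, "token1")
# The final probe inside one_or_more happens at pos == len(tokens).
rest = one_or_more(tokens, first.end, "token2")
```

Without the pos < len(token_list) check, the last probe in the loop would index past the end of the list, which is the failure the reproduction above triggers.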

erikrose commented 3 years ago

That’s an excellent bug report! Thanks!

PixelRick commented 3 years ago

No big deal. Thanks for the lib; it is a real gem, compact and effective.