erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Binary grammars and string literal precedence fix #202

Closed: lucaswiman closed this pull request 2 years ago

lucaswiman commented 2 years ago

Changes

Fix precedence of modified string literals

Fixes #201, in which the precedence of modified string literals like r"..." was wrong, causing them to be parsed as a reference named r followed by a string literal. This is technically a breaking change, since grammars like the following are no longer valid:

foo = baz"bar"
baz = "baz"

However, the fix is straightforward (add a space between the reference and the literal, e.g. foo = baz "bar"), and this pattern was probably rare anyway.
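To make the precedence problem concrete, here is a minimal sketch of a PEG-style ordered choice between a reference and a string literal. The regexes and the atom helper are hypothetical and are not parsimonious's actual rule grammar. The sketch assumes a reference must not be immediately followed by a quote, which reproduces both the fix for r"..." and the rejection of baz"bar".

```python
import re

# Hypothetical token patterns for illustration; NOT parsimonious's rule grammar.
# A string literal may carry a prefix such as r, b, u or rb.
LITERAL = re.compile(r'''[rRbBuU]*("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')''')

# Old-style reference: any identifier, so r"..." starts with a reference "r".
OLD_REFERENCE = re.compile(r"[A-Za-z_]\w*")

# Fixed reference: an identifier NOT immediately followed by a quote.
# The \b prevents backtracking to a shorter identifier such as "ba" in baz"bar".
REFERENCE = re.compile(r"""[A-Za-z_]\w*\b(?!["'])""")


def atom(text, pos=0, reference=REFERENCE):
    """PEG ordered choice: try a reference first, then a string literal.

    Returns (kind, matched_text) for the first alternative that matches
    at pos, or None if neither does.
    """
    for kind, pattern in (("reference", reference), ("literal", LITERAL)):
        m = pattern.match(text, pos)
        if m:
            return kind, m.group(0)
    return None


print(atom(r'r"\d+"', reference=OLD_REFERENCE))  # reference "r" wins: the bug
print(atom(r'r"\d+"'))                           # the whole modified literal
print(atom('baz"bar"'))                          # no longer valid: None
print(atom('baz "bar"'))                         # reference "baz"; literal follows
```

In this sketch, simply trying the literal alternative first would also fix r"...", but it would keep accepting baz"bar"; the quote lookahead on references is what makes that grammar invalid.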

I'm guessing this was not caught earlier because modifiers were previously only useful for regex nodes like ~r"...", where the precedence was correct. However, modified string literals are required for the next feature.

Support for parsing binary files

This turned out to be much easier than I'd anticipated because of Erik’s clever use of ast.literal_eval to evaluate string literals.
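As a rough illustration of why bytes support came almost for free: the source text of a grammar literal, prefix included, is valid Python literal syntax, so ast.literal_eval returns a str or a bytes value depending on the prefix. The eval_literal helper below is hypothetical and only mirrors the idea; it is not parsimonious's code.

```python
import ast


def eval_literal(source):
    """Evaluate a quoted literal such as "abc", r"\\d+" or b"\\x89PNG".

    Hypothetical helper: the prefix (r, b, rb, ...) decides whether the
    result is str or bytes, exactly as in Python source code.
    """
    value = ast.literal_eval(source)
    if not isinstance(value, (str, bytes)):
        raise ValueError(f"not a string or bytes literal: {source!r}")
    return value


for src in ['"abc"', r'r"\d+"', r'b"\x89PNG"', r'rb"\x00"']:
    value = eval_literal(src)
    print(f"{src:12} -> {value!r} ({type(value).__name__})")
```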

Once you can define a bytes literal in a grammar, everything "just works": under the hood, parsimonious is just calling .startswith(...), .endswith(...) and re.match, all of which work fine as long as the argument types match (either all str or all bytes).
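The point that matching works unchanged on bytes can be checked with the standard library alone. The two helpers below are hypothetical stand-ins for a literal node and a regex node: the same code runs on str or on bytes, and mixing the two raises TypeError.

```python
import re


def match_literal(text, literal, pos=0):
    """Return the end offset if `literal` occurs at `pos` in `text`, else None."""
    return pos + len(literal) if text.startswith(literal, pos) else None


def match_regex(text, pattern, pos=0):
    """Return the end offset if compiled `pattern` matches at `pos`, else None."""
    m = pattern.match(text, pos)
    return m.end() if m else None


# The 8-byte PNG file signature as sample binary input.
png = b"\x89PNG\r\n\x1a\n"
print(match_literal(png, b"\x89PNG"))                # bytes literal
print(match_regex(png, re.compile(rb"\r\n.\n"), 4))  # bytes regex
print(match_literal("hello", "he"))                  # same code on str

# Mixing str and bytes fails in the standard library itself.
for call in (lambda: match_literal("hello", b"he"),
             lambda: match_regex("hello", re.compile(b"he"))):
    try:
        call()
    except TypeError as exc:
        print("TypeError:", exc)
```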

To make this feature easier to use, I added a validation check that all string literals in a grammar (and, by extension, regexes) are of the same type.
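A minimal sketch of such a check, assuming the literal values have already been evaluated (regex patterns also come from string literals, so checking literals covers them). check_literal_types is a hypothetical name, not parsimonious's API.

```python
def check_literal_types(values):
    """Return the shared type (str or bytes) of all literal values.

    Hypothetical validation: raises TypeError if str and bytes are mixed.
    Returns None for an empty collection.
    """
    kinds = {type(value) for value in values}
    if len(kinds) > 1:
        names = ", ".join(sorted(kind.__name__ for kind in kinds))
        raise TypeError(f"all string literals must be the same type, got: {names}")
    return kinds.pop() if kinds else None


print(check_literal_types(["abc", r"\d+"]).__name__)        # str
print(check_literal_types([b"\x89PNG", b"\x00"]).__name__)  # bytes
try:
    check_literal_types(["abc", b"\x00"])
except TypeError as exc:
    print(exc)
```

Failing when the grammar is built gives a clearer error than a TypeError raised from .startswith(...) or re.match partway through a parse.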

I added some documentation for the feature, but I'm happy to add to another section if you think it's warranted.

Testing

All of the changes are tested by unit tests asserting: