erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License
1.79k stars 126 forks source link

Simplify reference resolution. #205

Closed lucaswiman closed 2 years ago

lucaswiman commented 2 years ago

Followup to #141, simplifying lazy reference resolution. The key insight here is that either:

  1. Every lazy reference can be resolved to an Expression just from the original grammar definition, or
  2. There is a circular reference chain of lazy refs in the original definition.

What was happening prior to #141, is that lazy references were only being resolved one level deep, which (depending on ordering) stuck more lazy references in child nodes, which then need to be fixed again. So e.g. in Erik's test case reproducing the issue, it contains the lines:

            _ = meaninglessness*
            meaninglessness = whitespace

Since the are resolved in order, the first rule resolves to _ = <LazyReference to whitespace>*, then the second rule resolves correctly, but it's already too late.

The fix in #141 was to basically check the entire grammar every time a new lazy reference was resolved, which works, but is inefficient and can be simplified. This PR just follows each lazy reference down the chain of references whenever we need to resolve it.

Note

The goal in invalidating grammars with circular reference chains isn't to invalidate all circular references (though that's maybe a good idea if it's possible), just to those whose lazy references cannot be resolved. So e.g. the following grammar is circular, but its lazy references can all be resolved down to concrete expressions (Sequences whose members reference each other):

foo = bar ""
bar = "" foo

However, the lazy references can never be resolved in a grammar like the following:

foo = bar
bar = foo

There it's lazy refs all the way down.


cc @righthandabacus

righthandabacus commented 2 years ago

I love this. Much more logical and less hacky.