Store terminal failures as TerminalParseFailures

cjheath / treetop

A Ruby-based parsing DSL based on parsing expression grammars.

https://cjheath.github.io/treetop

MIT License

306 stars 22 forks source link

Store terminal failures as TerminalParseFailures #26

Closed sds closed 9 years ago

sds commented 9 years ago

When attempting to print out a failure message using failure_reason for my tiny personal project propose[1], I was encountering the following error:

NoMethodError: undefined method `unexpected' for [0, "[a-z]", false]:Array> with backtrace

This started happening after upgrading to treetop 1.6.0+.

It looks like the problem here is that terminal failures are stored in @terminal_failures as tuples, and when you call terminal_failures it does a check to see if the first item is a TerminalParseFailure, and converts the list if it isn't. However, if you call terminal_failures multiple times, it's possible other errors were added, and so you'll have a mixture of Array tuples and TerminalParseFailures.

The fix is to store them as TerminalParseFailures when the failure is first encountered in terminal_parse_failure. It's not overly expensive and allows us to avoid this unnecessary complexity.

[1] https://github.com/sds/propose

sds commented 9 years ago

Looks like @terminal_failures is already cleared on each run (https://github.com/cjheath/treetop/blob/e30bd4868021c289fae6f5a1dc1f5b2d197c72ab/lib/treetop/runtime/compiled_parser.rb#L74).

I'm honestly not sure what I'm doing to cause this behavior. FWIW, I think this change makes sense even if this weren't an issue, since it seems a bit odd to store parse errors this way, and this pull request actually reduces complexity/LOC IMHO.

If you're still opposed to merging, I'll try to find some time later next week to dig deeper.

cjheath commented 9 years ago

Not opposed, just want to understand so as to avoid fixing what's not broken. In the early days, TerminalParseFailures were being created for every failed terminal, now we only create them when we're at the right-most point we've reached in the input, and that saved a huge amount of time and memory. Now it's creating the Array tuple and converting it later, which probably isn't really a saving; except perhaps that Arrays are optimised in the Ruby runtime and might be quicker to create than the TPF. I don't know, as AFAIK there's been no comparative performance testing done in some years and a lot of things have changed in the Ruby runtimes.

I hope that explains the (historical) rationale anyhow. Whatever makes sense now will be acceptable, as long as we understand why the change is needed. So yes, if you could find time to investigate, I'd like to know.

sds commented 9 years ago

Sorry for the delay in following up on this issue.

I've dug around and it's clear that there are multiple invocations of terminal_failures happening within the code generated by Treetop during the course of the parse phase. I only call Treetop::Runtime::CompiledParser#failure_reason once (confirmed in the debugger) and yet the call to terminal_failures which converts the array of tuples is happening before that call (and thus the array ends up being a mixture of TerminalParseFailure instances and Array tuples, as described above).

If you look at Treetop::Compiler::Predicate or Treetop::Compiler::Repetition, you can clearly see terminal_failures.pop calls being generated in the code. These calls trigger the premature conversion of the array of tuples into an array of TerminalParseFailure instances, resulting in the problem described above.

Does that make sense? It seems like we can avoid the confusion by always storing an actual TerminalParseFailure instance instead of the current lazy conversion procedure, which this pull request addresses.

Interested in hearing your thoughts.

cjheath commented 9 years ago

Thanks for investigating. There is no need to call a method in order to pop terminal_parse_failures (which is needed in semantic predicates and in repetition node types). I changed the code generation to just pop @terminal_parse_failures directly to restore the original side-effect-free intention.

sds commented 9 years ago

Great, that solved the problem. Thanks!