Change in error handling between 0.4.18-153 and 0.4.18-154

nwhetsell commented 7 years ago

It looks like there was a change in error handling between 0.4.18-153 and 0.4.18-154. I’m not sure if I’m doing something wrong, or if this is unexpected and I should try to reduce this.

I use Jison in linter-csound. On macOS, if I enter in Terminal

git clone https://github.com/nwhetsell/linter-csound.git
cd linter-csound/lib/csound-parser
npm install https://github.com/GerHobbelt/jison/archive/0.4.18-153.tar.gz
../../node_modules/jison-lex/cli.js preprocessor.jison-lex --outfile preprocessor.js
../../node_modules/jison/lib/cli.js orchestra.jison orchestra.jison-lex --outfile orchestra-parser.js
npm install csound-api
npm --global install jasmine
jasmine

to generate parsers and run some tests with Jison 0.4.18-153, all the tests pass. (If you want to run this, note that the csound-api package requires Boost and Csound. Here are installation instructions.)

If I then run

npm uninstall jison
npm install https://github.com/GerHobbelt/jison/archive/0.4.18-154.tar.gz
../../node_modules/jison-lex/cli.js preprocessor.jison-lex --outfile preprocessor.js
../../node_modules/jison/lib/cli.js orchestra.jison orchestra.jison-lex --outfile orchestra-parser.js
jasmine

to generate parsers and run some tests with Jison 0.4.18-154, I get one failure.

The failure appears to have something to do with using this grammar rule—

then_statement
  : THEN NEWLINE statements
    {
      $$ = new Then(@$, {children: $3});
    }
  | THEN NEWLINE
    {
      $$ = new Then(@$);
    }
  | THEN error
    {
      $$ = new Then(@$);
      parser.messages.push({
        type: 'Error',
        text: 'Expected newline',
        range: parser.lexer.rangeFromPosition(@1.last_line, @1.last_column)
      });
    }
  ;

—to parse this snippet:

      if 1 == 1 then + -
      endif

In Jison 0.4.18-154, it seems like an extra exception is thrown due to the error token in this grammar rule:

primary_expression
  : identifier
  | global_value_identifier
  | constant
  | '(' conditional_expression ')'
    {
      $$ = $2;
    }
  | error
    {
      parser.parseError('', {}, {
        type: 'Error',
        text: 'Expected expression',
        range: parser.lexer.rangeFromPosition(@$.first_line, @$.first_column)
      });
    }
  ;

Any ideas?

GerHobbelt commented 7 years ago

No solid ideas now, off the top of my head.

You might want to check behaviour of the latest release as I /do/ recall I had bugs in the error recovery code a while back and this sounds very much like the bug I had in error recovery code.

nwhetsell commented 7 years ago

I should’ve mentioned that I get the same behavior (that is, the same test failure) in Jison 0.4.18-160. I’ll try to reduce this.

GerHobbelt commented 7 years ago

Maybe an odd question but isn't this correct behaviour: first error is missing a newline after 'then', from which we recover. Then we get a statement '+ -', that is itself an invalid statement, hence a second error report coming from the other 'error' recovery rule. Hence an expected total of two errors.

This precise behaviour (due to the nearness of the two errors) may subtly differ from the way GNU bison might behave (haven't checked); at closer inspection now this however would be expected/desired behaviour for me.

At least /if/ the two errors are caused the way I describe here, i.e. both errors should point at different positions in the parsed input (different tokens which trigger the errir recovery process in the parser).

On Dec 22, 2016 8:50 PM, "Nate Whetsell" notifications@github.com wrote:

I should’ve mentioned that I get the same behavior (that is, the same test failure) in Jison 0.4.18-160. I’ll try to reduce this.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/GerHobbelt/jison/issues/3#issuecomment-268876603, or mute the thread https://github.com/notifications/unsubscribe-auth/AAYkHnrAxrdnuChm8wuByXTMFrxVrDRsks5rKtRpgaJpZM4LSMpM .

GerHobbelt commented 7 years ago

Did a compare between b153 and b154 and looks like your grammar triggered a bugfix which was long overdue AFAICT:

SHA-1: 39886a4525895b2351e3ff59a735a8b9c02daaf3

BUGFIX: further work done on findingDefaults() as it turns out some grammars are produced with multiple symbols having the same action+newState combo listed in some rows:

Another 'default' is when all listed terminals all point to the exact same reduce state; only this time we are careful about the TERROR symbol as a state carrying that one is an explicitly encoded error recovery rule and should remain as-is.

The story is a tad complicated, as this is not a straight fix for vanilla jison, but rather came about after this next commit of mine did work on the calculation of 'default action' rows in the LR parser table:

SHA-1: c10a384014f98fc965b2481feebb974d27cf8b3a

BUGFIX/FEATURE: delay fetching look-ahead as long as possible, just like bison. This means that 'epsilon' rules from now on don't need to fetch a lexer token to reduce if the parser core didn't fetch a new symbol yet.

Bottom line: the test-epsilon-rules-early-reduce.jison example will pass muster as now lexer invocation will wait until after the initial epsilon rules have executed, allowing us to code the same type of 'lexer hacks' in jison as were previously possible in yacc/bison, where the 'lexer hack' would modify the lexer before it would be invoked to produce further tokens.

Only tested for epsilon rules, but I expect it will work across the board for all rules which don't need any look-ahead to reduce as those rules all should land in the 'defaultAction' table now!

This change had a bug of its own, which was also fixed in the commit mentioned further above.

The key (for your grammar) is that the top commit did fix an (yet unregistered) issue in jison where error recovery rules in the same LR state as an otherwise default action (i.e. in a state where all valid look-ahead tokens lead to the same next state would 'disappear' a.k.a. 'become unreachable': your statement error recovery rule would therefor never fire -- assuming I analyzed both your grammar and my own code correctly tonight. ;-)

Sooooo.... the new behaviour is correct while the old behaviour is wrong and due to a long-standing bug in jison. Not 100% sure whether it exists in vanilla jison as well, since quite a bit of code has changed, but I'm at least 99.9% sure it does exist there as well as this is a bug in the 'find default action rows' code in jison itself and that stuff hasn't changed significantly from vanilla before I did the work in the commits listed above.

GerHobbelt commented 7 years ago

A bit more on this:

The 0.1% chance this doesn't exist in vanilla jison is hinging on the yes-or-no whether vanilla jison would see that particular parser state as 'default' as vanilla and me differ due to this line:

gotos[go] = true;

as I look for 'default action' for any state where all valid look-ahead tokens for a given state all point to the same subsequent parser state (while 'reducing' a grammar rule), ...

... while vanilla looks for 'default action' for any state where only /one/ valid look-ahead token exists for a given state, which points to a subsequent parser state (while 'reducing' a grammar rule). This is due to the way findDefaults() increments the internal counter i for vanilla, compared to me.

Both vanilla and me have the same goal: finding states which do not need any look-ahead token inspection to know which state we'll go next. I'm casting my nets wider because I found jison couldn't give me the behaviour I was used to from bison/yacc, where you can code 'lexer hacks' in your grammar rules provided you know when a grammar rule X does not need any look-ahead to reduce, so that your 'lexer hack'coded in your rule action code chunk executes before the lexer is requested to provide the next token. Bison and yacc (and btyacc and byacc and ...) have this capability while jison had not, at least until the aforementioned changed were made.

The wicked bit in that 0.1% is: does your grammar, at this point in the parse, have multiple look-ahead terminal tokens which all point to the next, same, state, or is there only one? I haven't looked that closely at your grammar, but my hunch is that there are many instead of just one, in which case the bug wouldn't surface in vanilla jison for your grammar! (Since vanilla counts every individual token, while I could every individual goto state in findDefaults().)

For verification of what I'm saying here and deeper inspection, compare the implementation of findDefaults() in jison.js in both vanilla jison and mine (b154 or later).

Long story short: I find more 'default action' states and also check whether error recovery is active for the given state, in which case the 'default action' cannot be -- since the error recovery state is inherently different from the state any valid FOLLOW token would GOTO.

One can debate the "0.1%" but since your grammar changed its behaviour from b153 to b154, where b153 still essentially has the vanilla-style findDefaults(), my expectation is the b153 and vanilla behave the same for you (and are buggy regarding the error recovery rule for your statement errors like your example input "+ -"). [Edit: this contradicts my hunch from 3 paragraphs earlier. So that hunch is most probably wrong, at least it is in my mind by the time I was writing this paragraph and should have said as much to prevent confusion.]

Conclusion: The earlier analysis holds AFAICT: two errors in your input, and both end up firing the two different error recovery rules in your grammar, correctly since b154.

nwhetsell commented 7 years ago

Many, many thanks for your detailed response. I’m closing this, because I think the main issue at this point is that I have significant gaps in my understanding of how error recovery in Bison/Jison actually works.

GerHobbelt commented 7 years ago

FWIW: as you have written tests for your grammar (:+1: :+1:) make sure every error recovery rule in your grammar gets hit by at least one test case, while such test input should ideally at least travel through some (preferably all) other rules of your grammar which also have error recovery alternatives, which are or are not fired in the test case.

Your original input example is a fine test as it tests grammar behaviour post the first error recovery, not just whether initial error recovery occurs. (Such test cases are hard to construct intentionally and are often more complicated than the simplest unit tests. They are more like 'system tests', in a way.)

You may also want to check out yyerrok and yyclearin in the yacc/bison documentation when you go digging deeper into error recovery behaviour. My jison fork also supports those, but they're pretty hairy to use, so I would suggest to develop any error recovery coding in small steps (and with lots of tests; I screw up almost daily myself as yacc error recovery coding is almost an art). 👿

There's several error recovery test examples in the jison/examples/ directory, BTW.

Take care and good luck!

GerHobbelt / jison

Change in error handling between 0.4.18-153 and 0.4.18-154 #3

SHA-1: 39886a4525895b2351e3ff59a735a8b9c02daaf3

SHA-1: c10a384014f98fc965b2481feebb974d27cf8b3a