dino-lang / dino

The programming language DINO
GNU General Public License v2.0

MSTA fails to generate correct LR(k) parsers for k>=2 #8

Open teshields opened 8 years ago

teshields commented 8 years ago

I'm (trying to) develop a grammar for Algol68 (long story), and several areas of that grammar now require lookahead of 2 and 3 tokens. I was surprised to find that msta generates parser code that fails to process correct inputs.

My developmental Algol68 grammar is way too large to debug, so I set up 4 small LR(2) & LR(3) grammar files (from literature). The parsers msta generates for them also fail to process correct inputs, and when msta is run with the command-line option '-minimal-error-recovery', the generated parser code fails to even compile.
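To illustrate what "requires 2 tokens of lookahead" means, here is a minimal sketch in Python. It is not one of the 4 attached grammars and has nothing to do with msta's internals; the toy grammar and the names `FOLLOW_2`, `candidates` and `parse` are hypothetical, chosen only for this example. After the token 'e' is shifted, the parser must reduce it to either `a` or `c`, and the first following token ('x') is the same in both cases, so only the second token decides.

```python
# Hypothetical toy grammar (NOT one of the grammars attached to this issue):
#   s : a 'x' 'b' | c 'x' 'd' ;
#   a : 'e' ;
#   c : 'e' ;
# It is LR(2) but not LR(1): after 'e', one token of lookahead is always 'x'.

# The two tokens that may follow each possible reduction of 'e'.
FOLLOW_2 = {"a": ("x", "b"), "c": ("x", "d")}


def candidates(lookahead, k):
    """Return the reductions of 'e' compatible with the first k lookahead tokens."""
    window = tuple(lookahead[:k])
    return sorted(nt for nt, follow in FOLLOW_2.items() if follow[:k] == window)


def parse(tokens, k):
    """Recognize a sentence of the toy grammar using k tokens of lookahead.

    Returns the nonterminal chosen for 'e', or raises if the choice is
    impossible (syntax error) or not unique (reduce/reduce conflict).
    """
    if len(tokens) != 3 or tokens[0] != "e":
        raise SyntaxError("expected 'e' followed by two tokens")
    options = candidates(tokens[1:], k)
    if not options:
        raise SyntaxError(f"unexpected input after 'e': {tokens[1:]}")
    if len(options) > 1:
        raise ValueError(f"reduce/reduce conflict with k={k}: {options}")
    return options[0]
```

With `k=2` the call `parse(["e", "x", "b"], 2)` picks `a`, while `k=1` cannot separate the two reductions and reports a conflict. A parser generator that handles LR(k) for k>=2 has to make that same distinction in its generated tables.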

I've managed to "fix" msta so that the 4 small grammars, as well as my much larger grammar, now successfully process correct inputs, and msta successfully passes the distribution tests. I don't claim that my "fixes" are what you should apply to the baseline.

Here is the patch (to file parser.c): parser.c.patch.txt

Here is the source for the 4 small grammars and the scripts I used to generate and compile 3 versions (default error recovery, local error recovery and minimal error recovery): samples.zip

My explanation of what I found & how I "fixed" the problems is rather long, so I've included that as an attachment as well: MSTA-BUG-NOTES.txt

My environment: Mac OS X El Capitan Version 10.11.6; MacPorts Version 2.3.4; dino Version of 2016-06-03, compiled with MacPorts gcc 5.4.0; parser code compiled with Apple LLVM clang Version 703.0.31.

teshields commented 8 years ago

Somehow, my entire issue was deleted. I'll try to recreate it.

teshields commented 8 years ago

I was surprised to find that MSTA generates parser code that fails on (some) correct inputs and, with the command-line option "-minimal-error-recovery", generates parser code that fails to even compile.

I've included notes on my analysis of the multiple problems, and my "fixes", as well as the 4 small LR(2) and LR(3) grammars that I used (from literature) along with the scripts to generate & compile 3 variants of each for "msta-orig" (distribution version) and "msta" (my patched version).

Environment: Mac OS X El Capitan Version 10.11.6; MacPorts Version 2.3.4; MacPorts gcc Version 5.4.0, used to compile dino Version 2016-06-03; Xcode clang Version 703.0.31, used to compile samples.

MSTA-BUG-NOTES.txt parser.c.patch.txt samples.zip

vnmakarov commented 8 years ago

Thank you for reporting this issue. I see all three of your messages. I'll work on the issue later (probably in a month or two), as MSTA is a low priority application for me. I have not touched MSTA for a very long time. I can say that a long time ago MSTA was tested on a non-public set of tens of LR(k) grammars with k > 1. Probably some later change resulted in the current failures. Thanks for the tests. I'll add them to the MSTA testsuite when the issue is resolved.

difranco commented 7 years ago

Regarding your comment above @vnmakarov, "MSTA is a low priority application for me," is this because it has been superseded by YAEP, or for another reason?

vnmakarov commented 7 years ago

> Regarding your comment above @vnmakarov, "MSTA is a low priority application for me," is this because it has been superseded by YAEP, or for another reason?

Yes, YAEP is more interesting to me than MSTA. But even for YAEP I have no time. I am too busy with my job these days (I have a lot of code in GCC which I have to maintain). Maybe after April (when GCC 7 is released) I'll have more time to address your issue. Sorry.