Closed gapag closed 5 years ago
I think the problem is not so much the order of the rules but the entry points (wich, if unspecified, will be the first category). Many backends don't seem to support having a list as an entry point.
Maybe entry points could be handled in the general layer rather in the backends.
In the generated cup file I find:
start with [Exp];
ListExp ::= ...
It seems that the start non-terminals are not translated.
The botched entrypoint in the cup file is easily fixed. (Use identCat
instead of show
.)
For the OCaml backend, just the generated Test program was faulty.
The Haskell and C backends print a parsed top-level list backwards. (Not an issue with Java and CPP.) The reason is that e.g. Haskell generates left-recursive Happy rules for lists, generating snoc-lists. These are reversed when plugged into other AST nodes, but not at the top-level. A more principled solution may be to have rules for snoc-lists and then rules for the corresponding cons-lists that just apply the reversal function.
Reversed list printing in Haskell with this test case:
terminator Exp "";
Lit. Exp ::= Integer ;
I could not reproduce the problem in the simpler setting:
terminator Integer "";
It seems that then the right-to-left recursion transformation does not kick in.
UPDATE: fixed for Haskell by removing the right-to-left recursion transformation, but still open for C.
That left recursion for LR parsing is strictly more performant than right recursion is not a categorical truth. It is true that left recursion uses O(1) stack and right recursion O(n). However, the AST/parsetree stored in the single cell in case of left recursion is of size O(n) where each stack entry in case of right recursion has size O(1). So it is O(n) no matter which one is uses.
Happy uses a heap-allocated parser stack and BNFC-generated parsers produce ASTs, thus, left recursion does not save anything in comparison to right recursion.
Bison has hard stack limits (10000 if not set otherwise), thus, left-recursion may be safer. For instance, the bnfc -c
generated bison parser for input separator [Integer] "";
reports
error: 9999,0: memory exhausted at 9999
We might thus remove the optimization for the Haskell backend, yet keep it for C.
See also section "A few words on performance" of blog http://gallium.inria.fr/blog/lr-lists/
C++ gets this list reversed:
separator Integer "";
ANTLR seems to have a bug blocking this issue for the Java/ANTLR backend: https://github.com/antlr/antlr4/issues/2689
The .g4
parser specification produced by BNFC looks right.
Since #272 is fixed this issue should be resolved completely.
BNFC release 2.8, and up to the latest commit. The Java backend is sensitive to the order of the macros/productions. For instance the following gives errors, both using CUP and ANTLR4:
Instead
seems to work fine. I did not try all the other backends, but Haskell seems to work without problem with both formulations.