antlr / antlr3

antlr v3 repository (pulled from p4 with history from //depot/code/antlr/antlr3-main
http://www.antlr.org
241 stars 170 forks source link

Using named subrule invocations and rewrite=true in tree rewriting grammar causes RewriteEmptyStreamException #128

Open picomancer opened 10 years ago

picomancer commented 10 years ago

I encountered this bug writing a compiler with Antlr 3.5. My compiler consists of three grammars:

  1. A parser with output=AST to read the input
  2. A tree grammar to reorder the nodes such that the walker does everything in the correct order
  3. A walker to visit all the nodes and emit code

For example in the assignment statement a=b, phase 2 reorders the operands of a node like (= a b) to (= b a). This way phase 3 will be able to "produce b" then "store it in a".

Whether this is a good design is not the topic of this post. Rather, I discovered that a very minimal implementation of this design reveals a bug in ANTLR!

I think the compilation of the following tree grammar is somehow wrong:

tree grammar bugTreeRewriter;

options
{
    ASTLabelType=CommonTree;
    tokenVocab=bug;
    output=AST;
    rewrite=true;
}

prgm : ^(PRGM stmt*) ;

stmt : ^(OP_ASSIGN a=expr b=expr) -> ^(OP_ASSIGN $b $a) ;

expr : ID | INT ;

To aid in reproducing, I've uploaded complete code (including a minimal driver program) to https://github.com/picomancer/antlr-bugs

parrt commented 10 years ago

how is it wrong? compile error? wrong tree?

On Mon, Sep 30, 2013 at 7:03 PM, picomancer notifications@github.comwrote:

I encountered this bug writing a compiler with Antlr 3.5. My compiler consists of three grammars:

(1) A parser with output=AST to read the input

(2) A tree grammar to reorder the nodes such that the walker does everything in the correct order

(3) A walker to visit all the nodes and emit code

For example in the assignment statement a=b, phase (2) reorders the operands of a node like (= a b) to (= b a). This way phase (3) will be able to "produce b" then "store it in a".

Whether this is a good design is not the topic of this post. Rather, I discovered that a very minimal implementation of this design reveals a bug in Antlr!

I think the compilation of the following tree grammar is somehow wrong:

tree grammar bugTreeRewriter;

options { ASTLabelType=CommonTree; tokenVocab=bug; output=AST; rewrite=true; }

prgm : ^(PRGM stmt*) ;

stmt : ^(OP_ASSIGN a=expr b=expr) -> ^(OP_ASSIGN $b $a) ;

expr : ID | INT ;

To aid in reproducing, I've uploaded complete code (including a minimal driver program) to https://github.com/picomancer/antlr-bugs

— Reply to this email directly or view it on GitHubhttps://github.com/antlr/antlr3/issues/128 .

Dictation in use. Please excuse homophones, malapropisms, and nonsense.

picomancer commented 10 years ago

how is it wrong?

Running the bugTreeRewriter with rewrite=true causes it to throw an exception. Here's the output when using rewrite=true in bugTreeRewriter.g:

+ rm -Rf output
+ java -jar antlr.jar -o output bug.g
+ java -jar antlr.jar -o output bugTreeRewriter.g
+ ln -s ../Main.java output/Main.java
+ ln -s ../program.txt output/program.txt
+ cd output
+ javac -cp ../antlr.jar bugLexer.java bugParser.java bugTreeRewriter.java Main.java
+ java -cp ../antlr.jar:. Main
[@0,0:0='\n',<8>,channel=99,1:0]
[@1,1:1='a',<4>,2:0]
[@2,2:2=' ',<8>,channel=99,2:1]
[@3,3:3='=',<6>,2:2]
[@4,4:4=' ',<8>,channel=99,2:3]
[@5,5:5='b',<4>,2:4]
[@6,6:6=';',<9>,2:5]
[@7,7:7='\n',<8>,channel=99,2:6]
[@8,8:8='c',<4>,3:0]
[@9,9:9=' ',<8>,channel=99,3:1]
[@10,10:10='=',<6>,3:2]
[@11,11:11=' ',<8>,channel=99,3:3]
[@12,12:12='d',<4>,3:4]
[@13,13:13=';',<9>,3:5]
[@14,14:15='\n\n',<8>,channel=99,3:6]
[@15,16:16='<EOF>',<-1>,5:0]
ast: (PRGM (= a b) (= c d))
Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule b
    at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:158)
    at org.antlr.runtime.tree.RewriteRuleElementStream.nextTree(RewriteRuleElementStream.java:145)
    at bugTreeRewriter.stmt(bugTreeRewriter.java:227)
    at bugTreeRewriter.prgm(bugTreeRewriter.java:104)
    at Main.main(Main.java:30)

When rewrite=true is commented out, the output becomes what I expect:

+ rm -Rf output
+ java -jar antlr.jar -o output bug.g
+ java -jar antlr.jar -o output bugTreeRewriter.g
+ ln -s ../Main.java output/Main.java
+ ln -s ../program.txt output/program.txt
+ cd output
+ javac -cp ../antlr.jar bugLexer.java bugParser.java bugTreeRewriter.java Main.java
+ java -cp ../antlr.jar:. Main
[@0,0:0='\n',<8>,channel=99,1:0]
[@1,1:1='a',<4>,2:0]
[@2,2:2=' ',<8>,channel=99,2:1]
[@3,3:3='=',<6>,2:2]
[@4,4:4=' ',<8>,channel=99,2:3]
[@5,5:5='b',<4>,2:4]
[@6,6:6=';',<9>,2:5]
[@7,7:7='\n',<8>,channel=99,2:6]
[@8,8:8='c',<4>,3:0]
[@9,9:9=' ',<8>,channel=99,3:1]
[@10,10:10='=',<6>,3:2]
[@11,11:11=' ',<8>,channel=99,3:3]
[@12,12:12='d',<4>,3:4]
[@13,13:13=';',<9>,3:5]
[@14,14:15='\n\n',<8>,channel=99,3:6]
[@15,16:16='<EOF>',<-1>,5:0]
ast: (PRGM (= a b) (= c d))
rewritten_ast: (PRGM (= b a) (= d c))