JuPfu / sparkle-g

Automatically exported from code.google.com/p/sparkle-g
Apache License 2.0

85M lexer when using the C binding #4

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
* What steps will reproduce the problem?

Generate the C lexer and parser for the Sparql.g file using the following 
options:

options {
    language = C;
    output = AST;
    ASTLabelType = pANTLR3_BASE_TREE;
}

* What is the expected output? What do you see instead?

The size of the generated files is as follows:

2.4K Sparql.tokens
85M SparqlLexer.c
30K SparqlLexer.h
1.5M SparqlParser.c
69K SparqlParser.h

If you generate the Java lexer and parser instead, the files are much smaller:

2.4K Sparql.tokens
582K SparqlLexer.java
876K SparqlParser.java

I'm not sure whether this is a problem with the grammar or with the ANTLR 
code generator for the C target.

* What version of the product are you using? On what operating system?

ANTLR 3.4 and ANTLR 3.3 on OS X Lion 10.7.1

Original issue reported on code.google.com by todor.di...@me.com on 20 Aug 2011 at 1:25

GoogleCodeExporter commented 9 years ago
The increase in size of the C lexer seems to be caused by the large number of 
case-insensitive keywords. The Sparql lexer was designed to be independent of 
the target language: only a few Java-specific snippets are embedded in the 
ANTLR lexer and grammar rules.
The attached file moves keyword recognition into a few methods written in 
Java. You will notice a drastic reduction in the size of the generated files 
when compiling the grammar with ANTLR. The generated C lexer is then only 
about twice the size of the lexer generated for the Java target. The warnings 
"no lexer rule corresponding to token:" can safely be ignored!
What has to be done:
Replace the Java methods with C functions in the Sparql.g grammar.
If you have any questions, don't hesitate to contact us.
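
As a rough illustration of what such a C replacement could look like, here is a minimal sketch of a case-insensitive keyword lookup. It is not taken from the attached file: the function names, the keyword table and the TOK_* values are hypothetical (the real token types come from the generated Sparql.tokens / SparqlParser.h), and wiring the lookup into an ANTLR lexer action is not shown.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for illustration only; the real values
 * are defined by the generated parser header. */
enum { TOK_NONE = 0, TOK_SELECT = 1, TOK_WHERE = 2, TOK_FILTER = 3 };

struct keyword {
    const char *text; /* keyword spelled in upper case */
    int type;         /* token type to assign on a match */
};

/* Small sample table; a real grammar would list every SPARQL keyword. */
static const struct keyword KEYWORDS[] = {
    { "SELECT", TOK_SELECT },
    { "WHERE",  TOK_WHERE  },
    { "FILTER", TOK_FILTER },
};

/* Returns 1 if both strings are equal ignoring ASCII case, else 0. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}

/* Maps the text of a matched identifier to a keyword token type,
 * or TOK_NONE if the text is not a keyword. */
int keyword_type(const char *text)
{
    size_t i;
    for (i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (equals_ignore_case(text, KEYWORDS[i].text)) {
            return KEYWORDS[i].type;
        }
    }
    return TOK_NONE;
}
```

The idea is that the lexer matches one generic identifier rule and asks a function like keyword_type() for the token type afterwards, instead of encoding every upper/lower case spelling of every keyword in the generated lexer tables. A linear scan is used here only for brevity.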

Original comment by Juergen....@gmail.com on 17 Dec 2011 at 1:24
