85M lexer when using the C binding

JuPfu / sparkle-g

Automatically exported from code.google.com/p/sparkle-g

Apache License 2.0

* What steps will reproduce the problem?

Generate the C lexer and parser for the Sparql.g file using the following options:

    options {
        language = C;
        output = AST;
        ASTLabelType = pANTLR3_BASE_TREE;
    }

* What is the expected output? What do you see instead?

The size of the generated files is as follows:

    2.4K  Sparql.tokens
    85M   SparqlLexer.c
    30K   SparqlLexer.h
    1.5M  SparqlParser.c
    69K   SparqlParser.h

If you generate the Java lexer and parser instead, the files are much smaller:

    2.4K  Sparql.tokens
    582K  SparqlLexer.java
    876K  SparqlParser.java

I'm not sure whether this is a problem with the grammar or with the ANTLR generators for the C language.

* What version of the product are you using? On what operating system?

ANTLR 3.4 and ANTLR 3.3 on OS X Lion 10.7.1

The increase in size of the C lexer seems to be caused by the large number of
case-insensitive keywords. The Sparql lexer was designed to be as
target-language independent as possible. Only a few Java-specific snippets are
embedded in the ANTLR lexer and grammar rules.

The attached file moves keyword recognition out of the lexer rules and into a
few methods written in Java. You will notice a drastic reduction in the size of
the generated files when compiling the grammar with ANTLR. The generated C
lexer will be only about twice the size of the lexer generated for the Java
target. The warnings "no lexer rule corresponding to token:" can safely be
ignored!

What has to be done:
Replace the Java methods in the Sparql.g grammar with C functions.

If you have questions, don't hesitate to contact us.
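
As a rough illustration of the kind of C function that could stand in for one of the Java keyword methods, here is a minimal sketch of a case-insensitive keyword lookup. It is not taken from the attached archive. The token type constants, the keyword table, and the function names `equals_ignore_case` and `keyword_type` are all assumptions made for this example; in a real build the token types would come from the generated headers, and the function would be called from a lexer action with the text of the matched identifier.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for illustration only; the real values would
   come from the files generated by ANTLR (e.g. Sparql.tokens). */
enum { TOK_NOT_KEYWORD = 0, TOK_SELECT, TOK_WHERE, TOK_PREFIX, TOK_BASE };

struct keyword {
    const char *text;   /* keyword spelled in upper case */
    int         type;   /* token type to report for it  */
};

/* A small excerpt of SPARQL keywords; a full table would list all of them. */
static const struct keyword KEYWORDS[] = {
    { "SELECT", TOK_SELECT },
    { "WHERE",  TOK_WHERE  },
    { "PREFIX", TOK_PREFIX },
    { "BASE",   TOK_BASE   },
};

/* Compare two NUL-terminated ASCII strings, ignoring letter case.
   Returns 1 if they are equal, 0 otherwise. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}

/* Return the token type for `text` if it is a keyword in any letter case,
   or TOK_NOT_KEYWORD if it is an ordinary identifier. */
int keyword_type(const char *text)
{
    size_t i;
    for (i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; ++i) {
        if (equals_ignore_case(text, KEYWORDS[i].text))
            return KEYWORDS[i].type;
    }
    return TOK_NOT_KEYWORD;
}
```

With a function like this, the grammar needs only one generic lexer rule for identifier-like words instead of a separate case-insensitive rule per keyword, which is presumably where the size reduction comes from. A linear scan is enough for a sketch; a sorted table with binary search or a hash would be a reasonable refinement for the full keyword set.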

Original comment by Juergen....@gmail.com on 17 Dec 2011 at 1:24

Changed state: Fixed

Attachments:

Sparql-C-Version-Starting-Point.zip

85M lexer when using the C binding #4