I haven't run the performance analysis on it, but I did notice that on the following line, it appears that 404.0 would not be recognized as a FLOAT.
I believe the rule was meant to be structured as follows (note the new location of the * character):
FLOAT
: {!inComment}? F_PositiveDigit F_Digit* '.' F_Digit+
;
Performance of the lexer would be improved if the following predicate and character were reversed. As written, the predicate will be evaluated for every letter, digit, or symbol appearing in a VARIABLE. If you instead write ':' {!enableIPV6_ADDRESS}? , the predicate will only be evaluated if/when a ':' character appears in the variable.
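The suggested rewrite might look like the sketch below. The rule shape and the predicate name follow the discussion above but are illustrative, not copied from the actual Batfish grammar:

```antlr
// Before: the predicate is checked before every character of a VARIABLE
// VARIABLE : {!enableIPV6_ADDRESS}? ( F_Letter | F_Digit | ':' )+ ;

// After: the predicate is only checked once a ':' has actually been matched
VARIABLE
:
   ( F_Letter | F_Digit | ':' {!enableIPV6_ADDRESS}? )+
;
```

The effect is that for inputs containing no ':' at all, the predicate is never evaluated.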
The other semantic predicates in the lexer are not as intrusive, but they could still be improved. For example, {foo}? F_Digit+ is more efficiently written as either F_Digit+ {foo}? or F_Digit {foo}? F_Digit* .
When performance is a concern, avoid using non-greedy operators, especially in parser rules.
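For instance, a block comment written with a non-greedy loop can be restated as an explicit greedy loop. This is a generic illustration of the pattern, not a rule from the grammar under discussion:

```antlr
// Non-greedy version: concise, but the non-greedy subrule can cost performance
// BLOCK_COMMENT : '/*' .*? '*/' -> skip ;

// Equivalent greedy formulation: consume non-'*' characters, or runs of '*'
// not followed by '/', until the closing '*/'
BLOCK_COMMENT
:
   '/*' ( ~'*' | '*'+ ~[*/] )* '*'+ '/' -> skip
;
```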
First of all, I wanted to thank you for your quick response, and to apologize for taking so long to get back to you. I implemented your suggested changes (except for the FLOAT thing), and did not notice any improvement in performance. I should point out that lexing is not the bottleneck, so I doubt any changes to the lexer will make much difference, unless they serve to reduce the number of tokens in the resulting stream. At this point I would like to try more complex methods for analyzing the performance issues, e.g. profiling, but I do not know where to start. Can you suggest next steps?
I was able to build the project in your GitHub repository. Can you provide a sample method which I can use to run a test which demonstrates the slow behavior?
The build instructions in the README cover far more than is necessary just to test the parser. Try the following steps:
If you want to use an IDE (e.g. Eclipse) to debug, the 'main' function is in 'Driver.java'. Use the same command line options as in step 4.
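As a first profiling step, simple wall-clock timing around each stage can show where the time goes. Below is a minimal JDK-only sketch; the workload in main is a placeholder for the real lexing or parsing call, not code from the project:

```java
// Minimal stage-timing harness (JDK only). Wrap each phase you want to
// measure (lexing, parsing, post-processing) in a Runnable and compare.
public class StageTimer {

    // Runs the given stage once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable stage) {
        long start = System.nanoTime();
        stage.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the actual parse call here.
        long elapsed = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("stage took " + elapsed + " ms");
    }
}
```

A dedicated profiler (for example, the one bundled with your IDE) gives per-method detail, but a harness like this is often enough to confirm which stage dominates.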
Edit: To build in Eclipse you must import the project located in
Oh wow, I totally misread that as you saying you were UNABLE to build my project :P
I ran the test as you described, and observed that the parser is reporting a large number of syntax errors while parsing the files. Is this what you are observing as well?
Never mind; I needed to update the lexer to support Windows line endings.
fragment
F_Newline
:
'\n'
| '\r' '\n'?
;
I recorded a 24.828 sec execution of Batfish.parseVendorConfigurations (I added a loop around the code which parses input files). Of that, 24.406 sec was spent in Lexer.nextToken. I would say you most certainly have a lexer problem.
If you get rid of semantic predicates in your lexer, and replace them with use of lexer modes, your performance problem is completely resolved. Prior to that, you can get an approximate 10:1 performance improvement by adjusting each lexer rule which contains a semantic predicate at the beginning of the rule to instead contain the predicate after a character has already been matched. For example, the following rule:
FOO
: {stuff}? DIGIT+
;
Would instead be written like this:
FOO
: DIGIT {stuff}? DIGIT*
;
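A mode-based version of the same idea might look like the sketch below. Note that modes are only available in a split lexer grammar (one declared with 'lexer grammar'), and the '!' comment marker and rule names here are illustrative, not taken from the actual grammar:

```antlr
// Default mode: no {!inComment}? predicates needed anywhere
COMMENT_LINE
:
   '!' -> channel ( HIDDEN ) , pushMode ( M_Comment )
;

mode M_Comment;

M_Comment_NEWLINE
:
   F_Newline -> channel ( HIDDEN ) , popMode
;

M_Comment_TEXT
:
   ~[\r\n]+ -> channel ( HIDDEN )
;
```

Once the comment body is handled by its own mode, every rule in the default mode matches unconditionally, so the lexer's DFA cache works at full effectiveness.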
I followed your suggestions, and the speed improved greatly. Thank you for your help.
I started with an ANTLR 3 grammar (with several subordinate grammars) for parsing Cisco configuration files. It usually gets through all of my test files instantaneously from a user perspective. In the last couple of weeks I ported to ANTLR 4, and am now using 4.3 specifically. I did end up making some structural changes to the grammar(s), but nothing too crazy. Now it takes from 0.5s to 1s per file for the parsing stage (calling the <parser>.<start_rule>() function). I tried setting prediction mode to SLL, but it makes no difference. I'd be happy to provide more information if requested, but I don't know where to start. Please help me to optimize parsing performance for my grammar. The various versions of the grammar are in my GitHub repository (arifogel/batfish). You can look at the .g files in the batfish.grammar.cisco package in the initial commit of the master branch for the v3 version. The .g4 files in that package in the latest commit contain my v4 grammars.
Thanks for any help you can provide.