tangxing806 opened this issue 2 years ago
Have you tried running the parse several times? As far as I know, C++ has a very long warmup time (maybe the longest of all targets), but after that it generally works very fast.
Is there any way to avoid warmup time?
I don't think so because ANTLR uses an internal cache for effective parsing. Maybe @mike-lischke could help.
Yes, measuring a warm parser is the best option. The C++ target is orders of magnitude faster than any other target, except on warmup.
BTW, out of curiosity, why is warmup so slow in C++?
C++ warmup currently occurs during static initialization of the program, before main is invoked.
I would be curious to see what gperftools or valgrind show for the C++ runtime and where most of the time is spent.
> C++ warmup occurs during static initialization of the program currently, before main is invoked.
To be more accurate, warmup occurs as the parser encounters text that matches rules not yet encountered. A good performance test is to parse the same text twice, the second pass is guaranteed to be as fast as it can be (since the first pass performed all necessary warmups).
After tweaking the cmake files for my driver generator for 4.9.3 C++ with this absolutely critical tip, I ran profiling for grammars-v4/html/examples/abc.com.html for a side-by-side comparison of Java vs C++.
Attached are the hot-spot profiles from gprof (for the g++ build) and the IntelliJ profiler (for Java): gprof.txt, jprof.txt
Not surprisingly, ParserATNSimulator.closure_() takes up most of the time. Eyeballing the source, I do see some differences: for example, there is `continue` control flow in the C++ version that has no counterpart at the corresponding place in the Java version (the original links to the exact lines were lost here). I could go on, but clearly it's comparing apples and oranges: the code is not doing the same computation.
An accurate way of detecting differences is by counting (with the same grammar and input) the number of calls to closure_ in Java and another target (logged by enabling some debug flags). They should be exactly the same.
Given what Eric wrote: there's not really a full warm up of a parser or lexer. As soon as you feed different input the DFA has to be updated, which is what takes so long (producing the ATN configurations, modifying the synchronized DFA cache etc.).
@KvanTTT Because the DFA cache is shared among all parser instances for a given grammar (and the same holds true for their lexers), access to it must be synchronized to avoid thread races (this is, btw., the only place in the runtime that is thread safe). Additionally, because of the mesh-like structure of ATN configurations, the C++ target needs to use std::shared_ptr. Those are relatively slow to create, and in the warmup phase many of them are created and destroyed. All this together makes the warmup phase pretty slow, and the more complex the first input is, the longer it takes.
@kaby76 The closure function is part of a heavily recursive set of methods, used for the prediction algorithm (which also creates the ATN configs). And it uses shared_ptr a lot. I had figured a long time ago this to be a good candidate for optimization, but it's not easy to avoid memory leaks there and you need to know the ALL(*) algorithm pretty well to avoid breaking the prediction implementation, when you rework this.
FYI, before https://github.com/antlr/antlr4/commit/18f24717c81cc48856d332e2c9492ceaaf17eb1a ParserATNSimulator did some unnecessary std::shared_ptr copying in a for loop, as did some other places. I do not think that commit is in 4.9.3.
@mike-lischke I am definitely not planning on changing any of this code, but I am interested in performance issues more generally, at the grammar level. The ATN code is tricky, as I found out when writing my own backtracking parser that uses ANTLR ATNs to compute a set of lookahead tokens, analogous to your antlr4-c3 code. I am writing a transformational system for parser-generator grammars, including ANTLR's, for analyzing and extending grammars (e.g., automatic removal of mutual left recursion, re-adding the ^/! tree-construction operators in a superset of ANTLR4 syntax) and implementing a layer for specifying semantics.
version: ANTLR 4.9.2
file size: 113 MB
C++ runtime: 156 s
Java runtime: 39 s
I am currently using ANTLR 4.9.2 to develop a parser; the runtime language I chose is C++.
When I switched to the Java platform (parser and runtime), I found that the speed increased by 4x.
Why is the C++ runtime 4x slower than the Java runtime? It seems strange to me.
Finally, I want to know how to improve parsing speed in C++.
Thanks!