antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Performance across targets #3111

Open kaby76 opened 3 years ago

kaby76 commented 3 years ago

Using dotnet-antlr, I'm in the process of cleaning up the grammars-v4 repo. Along the way, I decided to test the performance of the Java, C#, Dart, Go, JavaScript, and Python3 parsers generated by ANTLR 4.9.1 against the first six examples in https://github.com/antlr/grammars-v4/tree/master/html/examples (AMD Ryzen 2200G, 16G, B450M DS3H-CF, SATA-III SSD, Win10, Msys2). Here are the results ("minutes : seconds . fraction-of-seconds").

| Test name | Dart (dart compile exe cli.dart) | C# (Release) | Java | JavaScript (Node.js) | Go (go build) | Python3 |
| --- | --- | --- | --- | --- | --- | --- |
| abc.com.html | 00:22.0 | 00:11.2 | 00:07.3 | 01:03.2 | 00:08.8 | 04:25.2 |
| aljazeera.com.html | 01:44.3 | 00:47.9 | 00:27.5 | 04:33.3 | 01:52.8 | 19:14.6 |
| antlr.html | 00:02.5 | 00:01.4 | 00:02.2 | 00:07.3 | 00:06.2 | 00:28.8 |
| attvalues.html | 00:00.1 | 00:00.2 | 00:00.3 | 00:00.3 | 00:00.1 | 00:00.2 |
| bbc.html | 02:10.3 | 01:03.2 | 00:35.6 | 06:09.0 | 00:31.9 | 26:29.9 |
| cnn1.html | 04:04.2 | 02:09.3 | 01:09.5 | 12:26.7 | 28:11.7 | 52:27.6 |

One takeaway is that for some targets, the ratio of parse time to Java's parse time is not constant across inputs; Java is generally the fastest target.

--Ken

ericvergnaud commented 3 years ago

Hey, great, very useful. Quick question: does your test go through a warmup?

kaby76 commented 3 years ago

No, it doesn't do that. That would be a good thing to add. I will do that today or tomorrow and give a new table. I am so glad I wrote this tool--it's turning out to be very useful.

mike-lischke commented 3 years ago

The results are interesting and a bit surprising. Hard to believe that Java is faster than C#. You should definitely add a warmup step! Results will differ greatly, I promise.

What would be interesting here also is C++. I have compared only Java and C++ in the past, and Java was always orders of magnitude slower than C++.

KvanTTT commented 3 years ago

Yes, Java is faster than C#; see also my Twitter thread comparing the runtimes: https://twitter.com/KvanTTT/status/1304095033919000576 and the results: https://github.com/KvanTTT/AntlrBenchmarks. A warmup step is also included.

mike-lischke commented 3 years ago

OK, just for fun I created a test project (C++17) to parse the example HTML files with the HTML grammar from the grammar directory and found some interesting results. Here are the files I actually could parse:

| Name | Cold | Warm 1 | Warm 2 |
| --- | --- | --- | --- |
| abc.com.html | 16.292 ms | 0.058 ms | 0.01 ms |
| attvalues.html | 42.022 ms | 0.013 ms | 0.009 ms |
| gnu.html | 6.329 ms | 0.011 ms | 0.009 ms |
| uglylink.html | 6.467 ms | 0.01 ms | 0.009 ms |
| bbc.html | 27.834 ms | 0.018 ms | 0.023 ms |
| freebsd.html | 9.433 ms | 0.011 ms | 0.009 ms |
| google.html | 36611.6 ms | 0.011 ms | 0.009 ms |
| script1.html | 0.146 ms | 0.01 ms | 0.009 ms |
| wikipedia.html | 20.003 ms | 0.01 ms | 0.009 ms |
| antlr.html | 10951.4 ms | 0.012 ms | 0.028 ms |
| html4.html | 0.217 ms | 0.01 ms | 0.009 ms |
| style1.html | 0.136 ms | 0.01 ms | 0.009 ms |

All the other files in that folder (e.g. reddit, digg) could not be parsed; I stopped a run if the warmup phase took longer than 1-2 minutes.

Some key takeaways:

1. C++ is probably the most extreme example of the difference between warm and cold runs. For example, google.html took 36 s for the cold run and then only ~9 µs for repeated parsing.
2. The C++ runtime is usually at least 100 times faster than any other runtime. However, that claim needs to be checked again once we have numbers from warm parser runs in the other runtimes. And even then it might only apply to the HTML grammar.
3. The HTML grammar seems to be an extreme case. I have not seen cold runs in the minutes range for C++ before, and the files are pretty small (166KB and < 4000 lines). But HTML is also special in that it actually includes at least 3 languages (HTML, JS, CSS). Still, it might be possible to improve it.

Update

My HW specs: Mac Pro (late 2013), 32GB RAM DDR3 ECC, 3.5GHz 6-core Intel Xeon E5, 500GB SSD 5.0 GT/s, macOS 11.2 Big Sur.

kaby76 commented 3 years ago

> OK, just for fun I created a test project (C++17) to parse the example HTML files

Mike, could you attach a copy of the zip of the directory containing the source and the build files (I assume for cmake)? First, I can use it to model what I need to generate for the C++ driver code from the .pom files. It will also give me an idea of what "warm-up" means. I implemented a program that does something like the following, but I didn't see any improvement for the second parse. I have seen something like this work in other situations, but I seem to be getting it wrong now.

parser.entry_rule();
parser.reset(); lexer.reset();
parser.entry_rule();

Thanks, Ken

kaby76 commented 3 years ago

Also, I chose this grammar and these examples because I noticed the large parse times. I think the grammar contains errors; I noticed a similar issue when extending the powerbuilder grammar with bad rules. K

mike-lischke commented 3 years ago

I'm afraid I cannot give you a cmake project, as I don't use cmake. Instead I created an Xcode project (terminal app, pure C++); however, it should be easy for you to create your own project. It's just the main.cpp and the generated files: put them all in a terminal project, link against the antlr4 C++ runtime, and that's it.

About the warm-up differences: ANTLR4 internally uses a DFA structure with special "ATN configurations" (as they are called) to speed up lookahead. This structure is built on the fly the first time a specific ATN path is accessed. The next time that path is taken, the parser can do the lookup much faster. This obviously makes the first run slower (sometimes much slower, as we see here). So, in order to measure properly, you should parse a specific input twice. In the first round the DFA for this particular input is created, and it will take effect in all following rounds (I used 3 in my list above). If you parse different input you will again see the delay, because of the creation of new configurations for the changed ATN paths.

Because the DFA is shared, even between threads, it needs to be protected against race conditions. In fact, it's the only part of the runtime which is actually thread safe. However, that also costs time when manipulating the DFA. In C++ we additionally have the problem that we have to use shared pointers, which are also thread safe (and hence relatively slow). During DFA creation, many of these are created and destroyed, which makes the first run especially slow. This should explain why ANTLR4 runtimes have such a warm-up phase.
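As an aside, that thread safety is what makes it OK to parse many files concurrently with per-thread parser instances sharing one DFA. A minimal Java sketch of that pattern (HTMLLexer, HTMLParser, and the htmlDocument() entry rule are hypothetical stand-ins for the generated names):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.antlr.v4.runtime.*;

public class ParallelParse {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String file : args) {
            pool.submit(() -> {
                try {
                    // Per-thread lexer/parser instances; the DFA cache they
                    // populate is shared between threads and internally synchronized.
                    HTMLLexer lexer = new HTMLLexer(CharStreams.fromFileName(file));
                    HTMLParser parser = new HTMLParser(new CommonTokenStream(lexer));
                    parser.htmlDocument();
                } catch (java.io.IOException e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}
```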

The calls to parser.reset() and lexer.reset() don't do what you expect btw. They only reset some states in these classes and do not touch the generated DFA. If you really want to fully return to the "cold" state then use ATNSimulator::clearDFA() instead which removes all the cached DFA states.
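Putting the pieces together, a minimal Java sketch of a warm-up-aware measurement loop (again with hypothetical generated names; fresh lexer/parser objects are created per round, but the DFA cache is static and survives between them):

```java
import org.antlr.v4.runtime.*;

public class WarmupBench {
    public static void main(String[] args) throws Exception {
        for (int round = 0; round < 3; round++) {   // round 0 is cold, rounds 1 and 2 are warm
            HTMLLexer lexer = new HTMLLexer(CharStreams.fromFileName(args[0]));
            HTMLParser parser = new HTMLParser(new CommonTokenStream(lexer));
            long start = System.nanoTime();
            parser.htmlDocument();
            System.out.printf("round %d: %.3f ms%n", round, (System.nanoTime() - start) / 1e6);
        }
        // To force a truly cold state again, clear the cached DFA states via
        // getInterpreter().clearDFA() on both the parser and the lexer
        // (the Java counterpart of ATNSimulator::clearDFA()).
    }
}
```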

Here are the source files for the C++ target: https://github.com/antlr/antlr4/files/6097616/html.zip. You will have to adjust file paths, as I used absolute ones for simplicity.

mike-lischke commented 3 years ago

@ftomassetti, this discussion might be interesting for you.

ericvergnaud commented 3 years ago

To measure performance after warmup, you just need to create a new lexer and a new parser. The DFA populated during the first lexing/parsing phase survives until the program exits, or until you explicitly get rid of it via clearDFA.


kaby76 commented 3 years ago

Sorry for the delay... I had to clone a Win NTFS drive with a partition resize, which took a day because most programs are just junk. And I had to resurrect StringTemplates for C#--it hasn't been updated in many years, doesn't build, and is all wound up with other things I don't want.

Here are some new results for three grammars (Java, Java9, Html) against three targets (Dart, CSharp, Java). I tried to run the grammars against JavaScript, but had a problem with the runtime. But, the results for JavaScript are similar to C# and Java, only slower.

Methods

perf.zip (dotnet-antlr version 3.0.7). Note: the "templates" directory in the zip file contains the driver code.

Same computer as above: AMD Ryzen 2200G, 16G, B450M DS3H-CF, SATA-III SSD, Win10, Msys2

Results

Entries with a colon are mm:ss; bare numbers are seconds.

| Grammar/Input file | Dart warm-up | Dart post warm-up | CSharp warm-up | CSharp post warm-up | Java warm-up | Java post warm-up |
| --- | --- | --- | --- | --- | --- | --- |
| Java9/AllInOne8.java | 5.6 | 1.0 | 6.9 | 0.9 | 3.4 | 0.2 |
| Java9/helloworld.java | 0.6 | 0.04 | 0.56 | 0.03 | 0.4 | 0.04 |
| Java9/HSDB.java | 3:51.3 | 01:04.7 | 4:33 | 1:20 | 1:40 | 24.7 |
| Java9/JavaParser.java | 01:37.6 | 48.2 | 1:44 | 45.3 | 33.5 | 9.9 |
| Java9/ManyStringsConcat.java | 1.1 | 0.76 | 0.17 | 0.03 | 0.2 | 0.02 |
| Java9/PKIXCertPathReviewer.java | 02:44.6 | 00:11.2 | 3:02 | 13.3 | 1:11 | 2.5 |
| Java/AllInOne8.java | 0.7 | 0.3 | 0.18 | 0.02 | 0.15 | 0.03 |
| Java/helloworld.java | 0.18 | 0.01 | 0.08 | 0.0001 | 0.04 | 0.001 |
| Java/HSDB.java | 3.3 | 2.8 | 0.20 | 0.08 | 0.24 | 0.1 |
| Java/JavaParser.java | 16.0 | 15.3 | 0.48 | 0.33 | 0.7 | 0.7 |
| Java/ManyStringsConcat.java | 1.0 | 0.7 | 0.21 | 0.10 | 0.26 | 0.16 |
| Java/PKIXCertPathReviewer.java | 3.5 | 3.0 | 0.21 | 0.08 | 0.21 | 0.10 |
| Html/abc.com.html | 13.9 | 13.6 | 11. | 11. | 6.9 | 6.2 |
| Html/aljazeera.com.html | 1:07 | 1:07 | 48. | 48. | 27. | 26. |
| Html/antlr.html | 1.8 | 1.6 | 1.3 | 1.2 | 2.1 | 0.65 |
| Html/attvalues.html | 0.2 | 0.02 | 0.06 | 0.007 | 0.04 | 0.01 |
| Html/bbc.html | 1:23 | 1:23 | 1:03 | 1:03 | 35. | 35. |
| Html/cnn1.html | 2:24 | 2:24 | 2:10 | 2:10 | 1:08 | 1:09 |

Discussion

The results indicate that the improvement in parse times from warm-up is grammar-specific. For the Html grammar, no warm-up improvement was observed, but for the Java and Java9 grammars, performance in later parses improves significantly with warm-up. It might be worthwhile to investigate why the Html grammar is such a bad performer overall, and why warm-up does not improve its performance whatsoever.
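One way to investigate would be ANTLR's built-in decision profiler, which reports where prediction time goes per decision. A minimal Java sketch (HTMLLexer, HTMLParser, and htmlDocument() are hypothetical stand-ins for the generated names):

```java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.DecisionInfo;

public class ProfileHtml {
    public static void main(String[] args) throws Exception {
        HTMLLexer lexer = new HTMLLexer(CharStreams.fromFileName(args[0]));
        HTMLParser parser = new HTMLParser(new CommonTokenStream(lexer));
        parser.setProfile(true);            // collect per-decision statistics
        parser.htmlDocument();

        // Print the decisions that actually cost prediction time.
        for (DecisionInfo d : parser.getParseInfo().getDecisionInfo()) {
            if (d.timeInPrediction > 0) {
                System.out.printf("decision %d: %.2f ms prediction, %d LL fallbacks, %d ambiguities%n",
                        d.decision, d.timeInPrediction / 1e6, d.LL_Fallback, d.ambiguities.size());
            }
        }
    }
}
```

Decisions with many LL fallbacks or ambiguities would usually be the ones worth reworking in the grammar.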

In terms of which target is the better performer, the results indicate Java is the fastest. But it seems to depend on the grammar: Dart is generally faster than C# for Java9, but slower than C# for the other grammars.

--Ken

mike-lischke commented 3 years ago

Ken, many thanks for the update! Aside from very trivial input we can conclude that:

gabriele-tomassetti commented 3 years ago

@mike-lischke the performance article has a short summary on the performance of the different runtimes. The main objective of that section was to show somebody who never thought about this particular aspect that you can actually get significant differences between runtimes. Some people will read the article, or this discussion, and search for the best results. Somebody else might just want to check whether the expected performance of ANTLR is good enough.

We decided to avoid going into too much detail (although we link this discussion for people interested) because performance of specific programs can be complicated by personal expertise and needs.

For some people there is no difference between 0.1 seconds and 0.5 seconds because they just need a good enough level of performance. These discussions are quite interesting to us, but most people that just read the article or use ANTLR might not care for this level of precision.

I have also seen people get better performance than me from essentially the same program, because they knew the language better and they tweaked just the right configuration, or function, to get better results. So, maybe their ANTLR Python parser will have worse performance, but they can tweak the rest of their code to beat a C# program because they know Python very well, but not C#.

I think that, given that ANTLR developers are all quite competent, the end result is that the runtimes mostly adhere to the performance characteristics of each language. So I expect C++ to be quicker because ultimately the fundamental C++ building blocks are quicker.

kaby76 commented 3 years ago

@gabriele-tomassetti Your points are well taken. It's true that one can optimize the performance per target. I am not an expert in any target language or toolchain environment. In fact, I could barely get most targets working, because I don't use all 6 languages and toolchains daily. And despite all the résumé cliché padding out there, I don't know anyone who does. Anyone who claims to is full of BS.

What I am more interested in looking at is the grammar itself. What behavior do I see when varying the input while holding the rest of the variables constant? What does that tell me about the grammar? Why does the Html grammar not perform better with warm-up? Why is the ratio of timings from one runtime to another not constant over inputs? Is it a grammar-related issue? Can one devise an oracle to give advice on optimizing the grammar?
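(One documented lever relevant to these questions is the two-stage parsing strategy from the ANTLR runtime documentation: parse with the cheap SLL prediction mode first and fall back to full LL only when that fails. Inputs that trigger the fallback are exactly the ones whose timing ratios blow up. A sketch, again with hypothetical generated names:)

```java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.PredictionMode;
import org.antlr.v4.runtime.misc.ParseCancellationException;

public class TwoStageParse {
    static ParserRuleContext parse(String file) throws Exception {
        HTMLLexer lexer = new HTMLLexer(CharStreams.fromFileName(file));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        HTMLParser parser = new HTMLParser(tokens);

        // Stage 1: fast SLL prediction; bail out on the first error.
        parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
        parser.removeErrorListeners();
        parser.setErrorHandler(new BailErrorStrategy());
        try {
            return parser.htmlDocument();
        } catch (ParseCancellationException e) {
            // Stage 2: rewind and reparse with full LL prediction.
            tokens.seek(0);
            parser.reset();
            parser.addErrorListener(ConsoleErrorListener.INSTANCE);
            parser.setErrorHandler(new DefaultErrorStrategy());
            parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            return parser.htmlDocument();
        }
    }
}
```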

--Ken

gabriele-tomassetti commented 3 years ago

@kaby76 Thanks for your work on this topic. Your research is interesting and useful. I think it could help figure out whether there is a margin of improvement in a specific runtime, or specific issues to be aware of. For instance, looking at your data there seem to be some cases in which the performance of the Go runtime worsens significantly. An expert in Go might explain to us why.

I am also quite interested in the same questions you pose, because personally I am not an expert in performance for any specific language, so it is easier for me to think in terms of how to optimize a grammar. Which is essentially the main point I was making before: for many people, improving performance means focusing on the things they know about. If I were an expert in Python I might improve my Python code, but since I am not, I might just switch to the C# runtime.

I find this question particularly insightful:

> Can one devise an oracle to give advice on optimizing the grammar?

I think that figuring this one out would be of great help.

kaby76 commented 3 years ago

And not only whether an oracle exists (which I think it does), but whether one could have a tool that automatically optimizes the grammar for you.

kaby76 commented 3 years ago

@mike-lischke It took a little while, but I duplicated your runtime results--Cpp is several orders of magnitude faster than the next fastest target. I now have templates for Cpp for possible use in testing grammars-v4. It uses CMake for building, but unfortunately builds its own private copy of the Antlr runtime. So, it's not practical yet.

mike-lischke commented 3 years ago

Great to see you could repeat the results and may have another language for grammar testing at hand!

canastro commented 3 years ago

@kaby76 I was having performance issues with the Dart runtime (in a Flutter app) and turning the logs off made a huge impact, so you might want to try this: (screenshot of the change attached in the original comment)

Hopefully https://github.com/antlr/antlr4/pull/3128 will be merged soon, turning logs off by default and allowing the user to override that via build declarations.

kaby76 commented 3 years ago

@canastro My tests use straight out-of-the-box packages from the standard servers, so whatever has been published to the public package libraries is what I use. I haven't seen much of a performance hit for Dart even with logging; the runtimes you can see above. Likely I would have to do something else to turn on the logging. But having a non-debug, non-logging version of the Dart runtime should, I would think, be near the top of the "todo" list for the next release of Antlr.

I'm still looking to find differences in how the parsers work across targets, extending the templates for parser drivers towards all the targets--7 now working very well, 8 if you count Harwell's old Antlr4cs which I present as another target--and cleaning up the grammars-v4 grammars to be cross-target and cross-platform testable. For testing, I do filter out a number of grammars and parsing examples using regular expressions on grammar and example file names in order to bypass problems. For some, I do not test anything (e.g., ^(?!.*)$).

On the Cpp target, I found some errors in my scripts. I corrected them and ran them for Debug and Release, GNU C++ and VS2019 C++, on Linux (WSL2) and Windows, and now see runtimes more or less similar to the other targets. In fact, I'm seeing much slower runtimes for some tests for Cpp than any other target, even in Release mode. Everything is built using CMake to generate the build files. I will continue to review and duplicate Mike's code to see what I'm doing differently.

--Ken