Closed pieter3d closed 1 year ago
Changing all the dynamic casting to simpler/less expensive type checking would help. The fork of Antlr I made does some of it; it would be good to finish the removal of all the dynamic casts. There is another thread on removing all RTTI support to save code memory and runtime. The first compile is slow, but everything is incremental after that; all files are cached. You can also try parallel compiles with the -mt or -mp options.
Yes, ANTLR performance is a known pain point and we have not spent time yet to improve it; also see #273 and #117.
Antlr churns through a lot of small-object memory, uses std::shared_ptr, and does dynamic_cast a lot, as the C++ library is unfortunately modelled after the Java runtime. Ideally, we would just rewrite the Antlr runtime :)
Short of that, pinpointing where a lot of time is spent and then specifically targeting that is probably a good first step (I looked at one aspect in #604, for instance). I have mostly used pprof and linux perf for profiling so far.
It is a known issue but currently nobody is working on it, so any contribution would be welcome.
Avoiding shared_ptr, dynamic_cast, and string copies (by using string_view as much as possible, backed by the original buffer) will probably bring the most bang for the buck. Avoiding RTTI is probably the easiest win, by employing tricks similar to what @alaindargelas already did in his branch for some of the types; changing the code generation to emit code that avoids it in the other places might be doable (and will also help #1718).
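To make the string_view idea concrete, here is a minimal sketch (the names `Token` and `makeToken` are hypothetical, not Surelog or Antlr API): the token stores a `std::string_view` into the original file buffer, so creating a token involves no per-token heap allocation or copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical token type: 'text' is a view into the original buffer,
// so constructing a Token never copies or allocates.
struct Token {
  std::string_view text;
  std::size_t line;
};

// The buffer must outlive the tokens, since they only borrow from it.
inline Token makeToken(std::string_view buffer, std::size_t begin,
                       std::size_t len, std::size_t line) {
  return Token{buffer.substr(begin, len), line};
}
```

The main caveat is lifetime: the views are only valid as long as the original file buffer stays alive, which fits a parse-then-build-tree workflow.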
Unless someone else is volunteering for it, I am taking charge of swapping the default RTTI for something custom and "hopefully" faster. I have already started the work and should have something to showcase in the next day or two. Tracking #1718.
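As a rough illustration of what "custom RTTI" can look like (a generic sketch with made-up names like `NodeKind` and `asTerminal`, not the actual implementation in the branch): each node carries a small enum tag, and a down-cast becomes a one-byte compare plus a `static_cast` instead of a `dynamic_cast` walk through RTTI metadata.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds; the real runtime has many more types.
enum class NodeKind : uint8_t { Rule, Terminal, Error };

struct ParseNode {
  explicit ParseNode(NodeKind k) : kind(k) {}
  virtual ~ParseNode() = default;
  NodeKind kind;  // one byte of storage instead of full RTTI metadata
};

struct TerminalNode : ParseNode {
  TerminalNode() : ParseNode(NodeKind::Terminal) {}
  const char* text = "tok";
};

// Cheap replacement for dynamic_cast<TerminalNode*>: a tag compare
// followed by an unchecked static_cast.
inline TerminalNode* asTerminal(ParseNode* n) {
  return n->kind == NodeKind::Terminal ? static_cast<TerminalNode*>(n)
                                       : nullptr;
}
```

A side benefit is that the library can then be compiled with -fno-rtti, which saves the code memory mentioned earlier in the thread.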
@pieter3d Could you please revisit the performance metrics and compare them against your previous ones? Thanks!
Tried parsing a design with the latest vs from 5 days ago: 19.4s -> 18.5s (about 1 second faster).
@hs-apotell I'm sure @pieter3d is looking for something more drastic like 2x to 5x faster. Let's keep this open.
For reference, I was parsing and elaborating the swerv-eh1 core.
Let's keep this issue open and close #273.
Any more data points on the improvements?
The biggest improvements can certainly be achieved by reducing ambiguities in the grammar (one example is sketched in https://github.com/chipsalliance/Surelog/pull/2756 by @QuantamHD); ambiguities result in exponential parsing cost because a lot of back-tracking might be needed. Right now, the grammar is written so that it is easy to understand and close to the spec, but that is of course not necessarily optimal, and there is a lot of room for someone with experience in Antlr grammars to reduce ambiguities and look-ahead. Depending on the particular code to be parsed, this can probably improve some parsing examples by 10x-100x. Someone with Antlr expertise, please help out :)
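As a generic illustration of the look-ahead point (this is the classic dangling-else shape, not taken from the actual Surelog grammar; `expr` is assumed defined elsewhere), factoring a common prefix out of two alternatives turns a long prediction into a cheap local decision:

```antlr
// Two alternatives sharing a long common prefix: the parser has to
// predict across the whole prefix before it can choose one.
stmt_ambiguous
    : 'if' '(' expr ')' stmt_ambiguous
    | 'if' '(' expr ')' stmt_ambiguous 'else' stmt_ambiguous
    ;

// Left-factored: the prefix is matched once, and the decision becomes
// a cheap optional tail.
stmt_factored
    : 'if' '(' expr ')' stmt_factored ('else' stmt_factored)?
    ;
```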
There is also some smaller room for improvement in the runtime itself. The current development branch of Antlr has some expected improvements that will reduce parsing time (reduced RTTI, less shared_ptr impact, improved initialization time...), but unlike the potential of grammar improvements, here we're looking at a more moderate 1.5x-2x speedup.
The testcase and the procedure to reproduce it are not very clear to me (source code testcase, command line). I notice a huge improvement for the testcase in the following issue: https://github.com/chipsalliance/Surelog/issues/2035
I suggest giving it another try with the current project head.
Just tried the same source code (swerv eh1) on the same machine as before. Takes about 9 seconds now, so ~2x improvement. Pretty nice!
Issue https://github.com/chipsalliance/Surelog/issues/2035 is also closed. I assume the Surelog code change was dropped. @hzeller mentions in https://github.com/chipsalliance/Surelog/issues/1727#issuecomment-1091985197 that the target speed-up is 1.5x-2x.
I understand that no one is currently working on this subject; performance has improved over time, mainly due to Antlr C++ runtime improvements.
Could we conclude that this target is now achieved and close this issue?
It's still pretty slow compared to something like Verdi, but I guess for now closing this won't take away the general goal of improving performance.
No, but at least the performance issue on your specific use-case is somewhat addressed?
On parsing problems, more speed is always desirable, but at some point the effort outweighs the benefit. You are comparing a tool built on a parser generator, which targets general-purpose parsing, against a commercial tool that probably uses a super-optimized, handcrafted, very specific parser.
Commercial-grade performance has a cost. If you consider replacing Antlr in Surelog, it would be better to look for another solution to fill the UHDM db...
Yeah, I think for now it's fine to close this issue, like I said. I can always reopen it, or file a new one if there is a very specific case with an accompanying profile.
For example, parsing the Swerv-EH1 core on an i5-9600K @ 3.7GHz takes about 18 seconds. I tried generating valgrind callgrind data, but I gave up waiting after a few minutes. Profiling a smaller design looks like below in kcachegrind. It seems like the time is mostly spent in the Antlr parser. Is there anything that can be done here?