Closed pieter3d closed 1 year ago
Changing all the dynamic casting to simpler/less expensive type checking would help. The fork of Antlr I made does some of it; it would be good to finish the removal of all the dynamic casts. There is another thread on removing all RTTI support to save code memory and runtime. The first compile is slow, but everything is incremental after that; all files are cached. You can also try parallel compiles with the -mt or -mp options.
Yes, ANTLR performance is a known pain point and we have not spent time yet to improve it; also see #273 and #117.
Antlr churns through a lot of small-object memory, uses std::shared_ptr, and does dynamic_cast a lot, as the C++ library is unfortunately modelled after the Java runtime. Ideally, we would just rewrite the Antlr runtime :)
Short of that, pinpointing where a lot of time is spent and then specifically targeting that is probably a good first step (I looked at one aspect in #604, for instance). I have mostly used pprof and linux perf for profiling so far.
It is a known issue but currently nobody is working on it, so any contribution would be welcome.
Avoiding shared_ptr, dynamic_cast, and string copies (by using string_view as much as possible, backed by the original buffer) will probably bring the most bang for the buck. Avoiding RTTI is probably the easiest win, by employing tricks similar to what @alaindargelas already did in his branch for some of the types; changing the code generation to emit code that avoids it in the other places might be doable (and will also help #1718).
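To make the string_view idea concrete, here is a minimal sketch (the names `Token` and `makeToken` are hypothetical, not Surelog or Antlr API): the token stores a `std::string_view` into the original file buffer, so creating a token involves no per-token heap allocation or copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical token type: 'text' is a view into the original buffer,
// so constructing a Token never copies or allocates.
struct Token {
  std::string_view text;
  std::size_t line;
};

// The buffer must outlive the tokens, since they only borrow from it.
inline Token makeToken(std::string_view buffer, std::size_t begin,
                       std::size_t len, std::size_t line) {
  return Token{buffer.substr(begin, len), line};
}
```

The main caveat is lifetime: the views are only valid as long as the original file buffer stays alive, which fits a parse-then-build-tree workflow.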
Unless someone else is volunteering for it, I am taking charge of swapping the default RTTI for something custom and "hopefully" faster. I have already started the work and should have something to showcase in the next day or two. Tracking #1718.
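As a rough illustration of what "custom RTTI" can look like (a generic sketch with made-up names like `NodeKind` and `asTerminal`, not the actual implementation in the branch): each node carries a small enum tag, and a down-cast becomes a one-byte compare plus a `static_cast` instead of a `dynamic_cast` walk through RTTI metadata.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds; the real runtime has many more types.
enum class NodeKind : uint8_t { Rule, Terminal, Error };

struct ParseNode {
  explicit ParseNode(NodeKind k) : kind(k) {}
  virtual ~ParseNode() = default;
  NodeKind kind;  // one byte of storage instead of full RTTI metadata
};

struct TerminalNode : ParseNode {
  TerminalNode() : ParseNode(NodeKind::Terminal) {}
  const char* text = "tok";
};

// Cheap replacement for dynamic_cast<TerminalNode*>: a tag compare
// followed by an unchecked static_cast.
inline TerminalNode* asTerminal(ParseNode* n) {
  return n->kind == NodeKind::Terminal ? static_cast<TerminalNode*>(n)
                                       : nullptr;
}
```

A side benefit is that the library can then be compiled with -fno-rtti, which saves the code memory mentioned earlier in the thread.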
@pieter3d Could you please revisit the performance metrics and compare them against your previous ones? Thanks!
Tried parsing a design with the latest vs from 5 days ago: 19.4s -> 18.5s (about 1 second faster).
@hs-apotell I'm sure @pieter3d is looking for something more drastic like 2x to 5x faster. Let's keep this open.
For reference, I was parsing and elaborating the swerv-eh1 core.
Let's keep this issue open and close #273.
Any more data points on the improvements?
The biggest improvements can certainly be achieved by reducing ambiguities in the grammar (one example is sketched in https://github.com/chipsalliance/Surelog/pull/2756 by @QuantamHD); ambiguities result in exponential parsing cost because a lot of back-tracking might be needed. Right now, the grammar is written so that it is easy to understand and close to the spec, but that is of course not necessarily optimal, and there is a lot of room for someone with experience in Antlr grammars to reduce ambiguities and look-ahead. Depending on the particular code to be parsed, this can probably improve some parsing examples by 10x-100x. Someone with Antlr expertise, please help out :)
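As a generic illustration of the look-ahead point (this is the classic dangling-else shape, not taken from the actual Surelog grammar; `expr` is assumed defined elsewhere), factoring a common prefix out of two alternatives turns a long prediction into a cheap local decision:

```antlr
// Two alternatives sharing a long common prefix: the parser has to
// predict across the whole prefix before it can choose one.
stmt_ambiguous
    : 'if' '(' expr ')' stmt_ambiguous
    | 'if' '(' expr ')' stmt_ambiguous 'else' stmt_ambiguous
    ;

// Left-factored: the prefix is matched once, and the decision becomes
// a cheap optional tail.
stmt_factored
    : 'if' '(' expr ')' stmt_factored ('else' stmt_factored)?
    ;
```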
There is also some smaller room for improvement in the runtime itself. The current development branch of Antlr has some expected improvements that will reduce parsing time (reduced RTTI, less shared_ptr impact, improved initialization time...), but unlike the potential of grammar improvements, here we're looking at a more moderate 1.5x-2x speedup.
The testcase and the procedure to reproduce it are not very clear to me (source code testcase, command line). I notice a huge improvement for the testcase in the following issue: https://github.com/chipsalliance/Surelog/issues/2035
I suggest giving it another try with the current project head.
Just tried the same source code (swerv eh1) on the same machine as before. Takes about 9 seconds now, so ~2x improvement. Pretty nice!
Issue https://github.com/chipsalliance/Surelog/issues/2035 is also closed. I assume the Surelog code change was dropped. @hzeller mentions in https://github.com/chipsalliance/Surelog/issues/1727#issuecomment-1091985197 that the target speed-up is 1.5x-2x.
I understand that no one is currently working on this subject; performance has improved over time, mainly due to Antlr C++ runtime improvements.
Could we conclude that this target is now achieved and close this issue?
It's still pretty slow compared to something like Verdi, but I guess for now closing this won't take away the general goal of improving performance.
No, but at least the performance issue on your specific use-case is somewhat addressed?
On parsing problems, more speed is always desirable, but at some point the effort outweighs the benefit. You are comparing a tool built on a parser generator, which targets general-purpose parsing, against a commercial tool that probably uses a super-optimized, handcrafted, very specific parser.
Commercial-grade performance has a cost. If you consider replacing Antlr in Surelog, it would be better to look for another solution to fill the UHDM db...
Yeah, I think for now it's fine to close this issue, like I said. I can always reopen it, or file a new one if there is a very specific case with an accompanying profile.
For example, parsing the Swerv-EH1 core on an i5-9600K @ 3.7GHz takes about 18 seconds. I tried generating valgrind callgrind data, but I gave up waiting after a few minutes. Profiling a smaller design looks like below in kcachegrind. It seems like the time is mostly spent in the Antlr parser. Is there anything that can be done here?