chipsalliance / Surelog

SystemVerilog 2017 Pre-processor, Parser, Elaborator, UHDM Compiler. Provides IEEE Design/TB C/C++ VPI and Python AST & UHDM APIs. Compiles on Linux gcc, Windows msys2-gcc & msvc, OsX
Apache License 2.0
368 stars 69 forks source link

Performance: Example of a particularly slow parsing file. Does Antlr recursively explores all possible trees ? #646

Closed hzeller closed 4 years ago

hzeller commented 4 years ago

Example files

To reproduce:

wget -O/tmp/comp1001.v https://raw.githubusercontent.com/steveicarus/ivtest/master/ivltests/comp1001.v
/usr/bin/time -v build/bin/surelog -nopython -nobuiltin -parse -noelab /tmp/comp1001.v
<...>
        Command being timed: "build/bin/surelog -nopython -nobuiltin -parse -noelab /tmp/comp1001.v"
        User time (seconds): 254.77
        System time (seconds): 2.27
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:17.26
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 8171332
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2010332
        Voluntary context switches: 1
        Involuntary context switches: 23904
        Swaps: 0
        File system inputs: 0
        File system outputs: 8048
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

This takes > 4 minutes to parse (on my machine) and about 8GiB of RAM. It does look like Antlr goes into some tangent here...

Maybe there is a simple grammar tweak that help untie the knot it is getting itself into ?

(In comparison that this is not a particularly hard-to-parse file, verible takes about 25ms, Yosys about 200ms to parse this file, including invocation of the binary)

alaindargelas commented 4 years ago

One or more of these expressions are taking the bulk of the time. Can you try to generate a small testcase per line of expression (or binary search) to narrow it down to the offender? That would help to run the particular expression through the Antlr Grammar Java Profiler.

Example:

module top() assign r35 = (r155 ((r190 > (((((((r127 % ( | ( r81))) & r150) | r11) !== (((r207 <= (22'h4d2b ? ((r158 == ( - ( r219))) | 13'hdae) : r43)) != ((r97 > ((r240 != ((26'h259a 13'h465) >= (5'h2 * 11'h7ed))) % (((12'h279 < 28'h3026) || 4'h1) && r111))) & ( ! ( ( ! ( (16'h7a99 <= 4'h6))))))) & 26'h5fa4)) && (((r15 | (( ( | ( ( ~ ( (r235 && 31'h2eeb))))) / r90) !== r56)) <= ((22'h640f !== r182) <= (31'h1b37 !== ( ( | ( (((9'hc9 / 32'h7c3b) - (8'ha6 - 3'h2)) == ((12'h36b - 9'h171) < ( + ( 23'h425)))))) % r72)))) + $time)) > r1) !== 18'h6d7b)) >= 28'h6e25)); endmodule

hzeller commented 4 years ago

sounds good, will try to simplify

alaindargelas commented 4 years ago

I think we need to enroll the "Antlr guy" to see what is wrong with the expression/primary rules.

On Thu, Jul 16, 2020 at 3:45 PM Henner Zeller notifications@github.com wrote:

sounds good, will try to simplify

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/alainmarcel/Surelog/issues/646#issuecomment-659716295, or unsubscribe https://github.com/notifications/unsubscribe-auth/APFYJ5FZEPYUEJTQBFPY4BLR357J5ANCNFSM4O44AMOA .

Thomasb81 commented 4 years ago

Have you look how https://github.com/Nic30/hdlConvertor runtime behave on the pointed example ?

alaindargelas commented 4 years ago

No, can you run it?I see the grammar for expression is very similar, verbatim front the standard. I'd expect a similar runtime?

Sent from Yahoo Mail on Android

On Fri, Jul 17, 2020 at 0:09, Thomasb81notifications@github.com wrote:

Have you look how https://github.com/Nic30/hdlConvertor runtime behave on the pointed example ?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub, or unsubscribe.

alaindargelas commented 4 years ago

I think I cut the time in half and the memory requirement in half too by making parenthesis be dealt with the expression rule itself, without having to go through the primary rule. I'll work on this some more. I'll remove the rules for operator tokens.

alaindargelas commented 4 years ago

See https://github.com/alainmarcel/Surelog/pull/651

alaindargelas commented 4 years ago

That is all I can do in the grammar. Rest of the opimization has to come from the generated code or runtime library.