JulianKemmerer / PipelineC

A C-like hardware description language (HDL) adding high-level synthesis (HLS)-like automatic pipelining as a language construct/compiler feature.
https://github.com/JulianKemmerer/PipelineC/wiki
GNU General Public License v3.0
607 stars, 50 forks

Make path delay runs arbitrarily parallelizable #74

Closed by JulianKemmerer 1 month ago

JulianKemmerer commented 2 years ago

The first half of tool execution is getting path delays for modules.

This should be able to occur in any order, or in parallel.

Currently, however, the tool traverses the module hierarchy up the tree, synthesizing bigger and bigger modules as it goes.

This behaves very poorly if there are lots of modules near the top level of the design (e.g. if the top level has a few sub-wrapper levels, you are essentially synthesizing the same module sequentially, over and over...)
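To make the idea concrete, here is a minimal Python sketch of treating each module's path delay run as an independent job handed to a worker pool, rather than walking the hierarchy in order. It is not PipelineC's actual code: `HIERARCHY`, `fake_path_delay_ns`, and `all_path_delays` are hypothetical names, the delay values are made up, and a thread pool stands in for whatever worker mechanism the tool really uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical module hierarchy: parent -> list of submodules.
# The child lists are deliberately ignored below: no job waits on another.
HIERARCHY = {
    "top": ["wrapper"],
    "wrapper": ["adder", "mult"],
    "adder": [],
    "mult": [],
}


def fake_path_delay_ns(module_name):
    """Stand-in for one synthesis run (assumption: a real run calls the synthesis tool)."""
    # Made-up, deterministic delay: 1.0 ns per character of the module name.
    return float(len(module_name))


def all_path_delays(hierarchy, num_workers):
    """Submit one independent job per module and collect results as they finish."""
    delays = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {pool.submit(fake_path_delay_ns, name): name for name in hierarchy}
        for future in as_completed(futures):  # completion order, not hierarchy order
            delays[futures[future]] = future.result()
    return delays


if __name__ == "__main__":
    print(all_path_delays(HIERARCHY, num_workers=4))
```

Since no job depends on another, adding workers should shorten wall-clock time until the number of modules or cores becomes the limit.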

Who's got time to wait around? Not @suarezvictor :-p

suarezvictor commented 2 years ago

Considering that the aim of the tool is "faster development", it seems a bit important.

JulianKemmerer commented 1 year ago

Also important to note that the second half of tool runtime, iterating on pipelines, could also be parallelized...

^good next step after this easier issue is resolved

JulianKemmerer commented 1 month ago

Fix in https://github.com/JulianKemmerer/PipelineC/commit/ba7ffc2da4549ba7a8b750a73b05365ec64d2b6a

suarezvictor commented 1 month ago

Hi Julian, were you able to measure some gain factors with any specific example?

JulianKemmerer commented 1 month ago

If your computer has a bunch of cores, you can raise the number of processes to use in https://github.com/JulianKemmerer/PipelineC/blob/master/config/num_processes.cfg. Now that this issue has been fixed, you will see those threads actually used better (no longer waiting to do runs in a specific hierarchy order).

If your design has a lot of not-pipelined logic, then you will also see savings from that logic not being synthesized (since it is not pipelined). This was a big savings for me working on a RISC-V CPU: only parts of it are pipelined, so instead of hundreds of modules to run synthesis for, it was reduced to the handful that will be pipelined (hour(s) of runtime saved).
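As a rough illustration of that second saving, the sketch below keeps only the modules that will be pipelined before any synthesis jobs are queued. The `MODULES` table and `modules_needing_synthesis` are hypothetical names for illustration only; how PipelineC actually tracks which modules are pipelined is not shown in this thread.

```python
# Hypothetical module table: name -> whether the module will be pipelined.
MODULES = {
    "alu": True,
    "decoder": False,
    "regfile": False,
    "mul": True,
}


def modules_needing_synthesis(modules):
    """Return, sorted by name, only the modules that will be pipelined."""
    return sorted(name for name, pipelined in modules.items() if pipelined)


if __name__ == "__main__":
    # Only these modules would be handed to the path delay runs.
    print(modules_needing_synthesis(MODULES))
```

With hundreds of modules and only a handful pipelined, the list of runs shrinks accordingly.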

:+1: