The-OpenROAD-Project / OpenROAD-flow-scripts

OpenROAD's scripts implementing an RTL-to-GDS Flow. Documentation at https://openroad-flow-scripts.readthedocs.io/en/latest/
https://theopenroadproject.org/

reduce complexity of megaboom design #1400

Closed oharboe closed 10 months ago

oharboe commented 10 months ago

Description

I wanted to write down some information that I got when running megaboom through Quartus, and I'm filing it as a feature request. We can mark this issue as "help wanted" or perhaps close it; the main thing is that I wanted somewhere to file this info for discussion.

I ran the megaboom design through Quartus to get some information about possible lingering srams. Quartus takes 5 minutes to synthesize megaboom, whereas Yosys takes many hours.

If ORFS could be changed to detect all srams and create macros for them, implementing the innards of these srams with flip-flops, leaving refinement to the author, then it would be easier to run designs such as megaboom through the ORFS flow.

Most SRAMs detected by Quartus are tiny, much smaller than 1024 bits, and are thus probably equally well implemented in flip-flops.

However, some are large enough to be considered for SRAM implementation:

image

This one is on the small side:

image

This one is implemented as an SRAM in the megaboom design:

image

Suggested Solution

We can see from synthesis resource usage that megaboom isn't a very big design. The biggest FPGAs can fit roughly 8-10x this in terms of ALMs:

image

Additional Context

No response

oharboe commented 10 months ago

@louiic FYI

oharboe commented 10 months ago

@louiic @maliberty This report contains all sorts of details about what is inside the megaboom design...

It can easily be seen that there are 42 SRAMs of 1024 bits or more. There are 243 inferred SRAMs in total. In synthesis, a number of SRAMs are inferred but can't be implemented with hard macros, because they use asynchronous reads or violate some other behavioral limitation of the FPGA SRAMs.
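The size split described above (memories of 1024 bits or more versus smaller ones) can be sketched roughly as follows. This is only an illustration: the `InferredMemory` type, the `split_by_size` helper, and the example memory names and shapes are all hypothetical and are not taken from the Quartus report or from ORFS.

```python
from dataclasses import dataclass

# Threshold taken from the discussion in this thread; anything at or above it
# is treated as a candidate for a macro, anything below as flip-flops.
SRAM_THRESHOLD_BITS = 1024


@dataclass(frozen=True)
class InferredMemory:
    """Hypothetical record for one inferred memory (name, depth, width)."""
    name: str
    words: int
    width: int

    @property
    def bits(self) -> int:
        return self.words * self.width


def split_by_size(memories, threshold_bits=SRAM_THRESHOLD_BITS):
    """Return (macro_candidates, flop_candidates) based on total bit count."""
    macro_candidates = [m for m in memories if m.bits >= threshold_bits]
    flop_candidates = [m for m in memories if m.bits < threshold_bits]
    return macro_candidates, flop_candidates


# Made-up example entries, purely to exercise the helper.
example = [
    InferredMemory("hypothetical_btb", words=128, width=56),
    InferredMemory("hypothetical_queue", words=8, width=16),
    InferredMemory("hypothetical_table", words=64, width=16),
]
macros, flops = split_by_size(example)
print("macro candidates:", [m.name for m in macros])
print("flip-flop candidates:", [m.name for m in flops])
```

A real flow would also have to check read/write port behavior (for example asynchronous reads), which a pure size threshold ignores.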

There are ca. 25 multipliers of 27x27 bits.

place.zip

I think for this design to have any chance with Yosys and ORFS, ORFS needs to create macros on the fly for whatever Yosys can infer in terms of multipliers, SRAMs, etc., and then leave it to the user to refine the creation of these macros. This would allow the user to run the design through the entire flow and attack it piecemeal.

It is still a LOT of work, but attacking megaboom as a big black box with long turnaround times can only end in tears.

oharboe commented 10 months ago

Another thing that can be easily seen in the report: a synchronous reset with a high fanout can never give a short clock period.

On an FPGA, a synchronous reset with a high fanout is handled by adding pipeline stages until the fanout at the leaf nodes of the resulting tree is tolerable.

image
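As a rough illustration of the reset pipelining described above, the number of register levels needed can be estimated from the number of reset sinks and the fanout each register is allowed to drive. This is a back-of-the-envelope sketch: the `reset_pipeline_levels` helper is hypothetical, and the sink count and fanout limit used below are made-up values, not numbers from the megaboom report.

```python
def reset_pipeline_levels(sinks: int, max_fanout: int) -> int:
    """Estimate the register levels in a balanced reset tree.

    Each register drives at most `max_fanout` loads, so a tree with
    `levels` register levels can reach max_fanout ** levels sinks.
    Every level adds one clock cycle of reset latency.
    """
    if sinks < 1:
        raise ValueError("sinks must be at least 1")
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = 1
    reach = max_fanout
    while reach < sinks:
        reach *= max_fanout
        levels += 1
    return levels


# Illustrative numbers only: 100,000 reset sinks, fanout limit of 32.
print(reset_pipeline_levels(100_000, 32))
```

The estimate assumes a perfectly balanced tree; placement-aware replication in a real tool will generally not be this regular.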

oharboe commented 10 months ago

Looking at the longest setup path:

image

oharboe commented 10 months ago

We can keep discussing while it is closed and reopen it if we want to take action on it...

rovinski commented 10 months ago

I believe the Chisel compiler actually produces a list of RAMs as an artifact when compiling, at least the last time I used it a few years ago. It shouldn't be that difficult to look at the list and have the user implement the RAMs.

One of the major differences between FPGA and ASIC is that FPGA only has 1 or 2 types of RAMs it can map memories onto. ASICs can have multiple types (RF, SRAM, ROM, eDRAM...) of arbitrary size and defined on a process-by-process basis. It's easy to choose the right RAM on FPGA, it's hard to do it for ASIC. There are actually some HLS tools that do it though, based on a library of macros which the user provides.

The major limiter to doing this in Yosys will still be connecting it to a plugin which can generate the RAM, such as DFFRAM or OpenRAM, and then also meeting the requirements for those tools.

oharboe commented 10 months ago

@rovinski I think you are onto something: this has to be sorted out on the Chisel/RTL side by breaking macros out into modules, rather than relying on Yosys inference.

What I have in mind w.r.t. ORFS is that a list of modules to be made into macros can be provided to ORFS, and that ORFS can lay out these macros automatically.

This would provide a first implementation/flow that can serve as a starting point for users to refine. The RTL would be behavioral logic, so replacing that with some sort of SRAM, multiplier, etc. is more PDK and ORFS specific.

oharboe commented 10 months ago

Still running; detailed route took 3h30 on the 0th iteration. Will update the below unless I have to stop the build.

$ make DESIGN_CONFIG=designs/asap7/megaboom/config.mk elapsed
[INFO][FLOW] Using platform directory ./platforms/asap7
[INFO-FLOW] ASU ASAP7 - version 2
Default PVT selection: BC
Log                       Elapsed seconds
2_1_floorplan                    629
2_2_floorplan_io                   8
2_4_floorplan_macro               10
2_5_floorplan_tapcell            153
2_6_floorplan_pdn                205
3_1_place_gp_skip_io             666
3_2_place_iop                     18
3_3_place_gp                    7090
3_4_place_resized               1476
3_5_place_dp                    1624
4_1_cts                        14230
4_2_cts_fillcell                  60
5_1_grt                         4973
5_2_route                      43793
6_1_merge                        285
6_report                        7278
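To see where the time goes, the elapsed table above can be summed and ranked with a few lines of Python. This is a minimal sketch that takes the numbers exactly as posted; the `parse_elapsed` and `summarize` helpers are hypothetical and are not part of ORFS.

```python
# Elapsed times copied verbatim from the table above.
ELAPSED_LOG = """\
Log                       Elapsed seconds
2_1_floorplan                    629
2_2_floorplan_io                   8
2_4_floorplan_macro               10
2_5_floorplan_tapcell            153
2_6_floorplan_pdn                205
3_1_place_gp_skip_io             666
3_2_place_iop                     18
3_3_place_gp                    7090
3_4_place_resized               1476
3_5_place_dp                    1624
4_1_cts                        14230
4_2_cts_fillcell                  60
5_1_grt                         4973
5_2_route                      43793
6_1_merge                        285
6_report                        7278
"""


def parse_elapsed(text):
    """Map stage name to elapsed seconds, skipping header and other lines."""
    stages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stages[parts[0]] = int(parts[1])
    return stages


def summarize(stages):
    """Return (total seconds, slowest stage name, slowest stage share)."""
    total = sum(stages.values())
    slowest = max(stages, key=stages.get)
    return total, slowest, stages[slowest] / total


stages = parse_elapsed(ELAPSED_LOG)
total, slowest, share = summarize(stages)
print(f"total: {total} s ({total / 3600:.1f} h)")
print(f"slowest stage: {slowest} ({share:.0%} of total)")
```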