Closed oharboe closed 10 months ago
@louiic FYI
@louiic @maliberty This report contains all sorts of details about what is inside the megaboom design...
It can easily be seen that there are 42 SRAMs of 1024 bits or more. There are 243 inferred SRAMs in total. In synthesis, a number of SRAMs are inferred but can't be implemented with hard macros, because they use asynchronous reads or break some other behavioral limitation of the FPGA SRAMs.
There are ca. 25 multipliers of 27x27 bits.
I think for this design to have any chance with Yosys and ORFS, ORFS needs to create macros on the fly for whatever Yosys can infer in terms of multipliers, SRAMs, etc., and then leave it to the user to refine the creation of these macros. This would allow the user to run the design through the entire flow and attack it piecemeal.
It is still a LOT of work, but attacking megaboom as a big black box with long turnaround times can only end in tears.
Another thing that can be easily seen in the report: a synchronous reset with a high fanout can never give a short clock period.
On an FPGA, the way a synchronous reset with a high fanout is handled is to add a lot of pipelining stages, until fan-out at the leaf nodes of this tree is tolerable.
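To put a rough number on the pipelining described above, here is a minimal sketch that estimates how many register levels a reset tree needs before the fanout at each node is tolerable. The function name and the example figures (100,000 reset loads, a fanout limit of 32) are assumptions for illustration only; they are not taken from the Quartus report.

```python
def reset_pipeline_stages(total_fanout: int, max_fanout: int) -> int:
    """Return how many extra register levels a reset tree needs so that
    no node (the reset source or any pipeline register) drives more than
    `max_fanout` loads.

    With k extra levels, one source can reach max_fanout ** (k + 1) loads.
    """
    if total_fanout < 1:
        raise ValueError("total_fanout must be at least 1")
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")

    stages = 0
    reach = 1  # equals max_fanout ** stages
    # Keep adding a level until the source can reach every load.
    while reach * max_fanout < total_fanout:
        reach *= max_fanout
        stages += 1
    return stages


# Hypothetical numbers, for illustration only.
print(reset_pipeline_stages(total_fanout=100_000, max_fanout=32))
```

Each extra level delays reset release by one clock cycle, which is usually acceptable for a synchronous reset as long as all leaves are released at the same depth.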
Looking at the longest setup path:
We can discuss when it is closed and reopen if we want to take action on it...
I believe the Chisel compiler actually produces a list of RAMs as an artifact when compiling; at least it did the last time I used it a few years ago. It shouldn't be that difficult to look at the list and have the user implement the RAMs.
One of the major differences between FPGA and ASIC is that FPGA only has 1 or 2 types of RAMs it can map memories onto. ASICs can have multiple types (RF, SRAM, ROM, eDRAM...) of arbitrary size and defined on a process-by-process basis. It's easy to choose the right RAM on FPGA, it's hard to do it for ASIC. There are actually some HLS tools that do it though, based on a library of macros which the user provides.
The major limiter to doing this in Yosys will still be connecting it to a plugin which can generate the RAM, such as DFFRAM or OpenRAM, and then also meeting the requirements for those tools.
@rovinski I think you are onto something: this has to be sorted out on the Chisel/RTL side by breaking macros into modules, rather than relying on Yosys inference.
What I have in mind w.r.t. ORFS is that a list of modules to be made into macros can be provided to ORFS, and that ORFS can lay out these macros automatically.
This would provide a first implementation/flow that can serve as a starting point for users to refine. The RTL would be behavioral logic, so replacing that with some sort of SRAM, multiplier, etc. is more PDK and ORFS specific.
Still running; detailed route took 3h30 on the 0th iteration. Will update the below unless I have to stop the build.
$ make DESIGN_CONFIG=designs/asap7/megaboom/config.mk elapsed
[INFO][FLOW] Using platform directory ./platforms/asap7
[INFO-FLOW] ASU ASAP7 - version 2
Default PVT selection: BC
Log Elapsed seconds
2_1_floorplan 629
2_2_floorplan_io 8
2_4_floorplan_macro 10
2_5_floorplan_tapcell 153
2_6_floorplan_pdn 205
3_1_place_gp_skip_io 666
3_2_place_iop 18
3_3_place_gp 7090
3_4_place_resized 1476
3_5_place_dp 1624
4_1_cts 14230
4_2_cts_fillcell 60
5_1_grt 4973
5_2_route 43793
6_1_merge 285
6_report 7278
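To see where the time goes, the elapsed table above can be summed with a small script. This is only a sketch: it parses the table as pasted in this comment, and the helper name `parse_elapsed` is made up for the example and is not part of ORFS.

```python
# Elapsed-time table copied from the comment above.
ELAPSED_LOG = """\
Log Elapsed seconds
2_1_floorplan 629
2_2_floorplan_io 8
2_4_floorplan_macro 10
2_5_floorplan_tapcell 153
2_6_floorplan_pdn 205
3_1_place_gp_skip_io 666
3_2_place_iop 18
3_3_place_gp 7090
3_4_place_resized 1476
3_5_place_dp 1624
4_1_cts 14230
4_2_cts_fillcell 60
5_1_grt 4973
5_2_route 43793
6_1_merge 285
6_report 7278
"""


def parse_elapsed(text: str) -> dict[str, int]:
    """Map each stage name to its elapsed seconds, skipping the header."""
    stages = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are exactly "<stage> <integer seconds>".
        if len(parts) == 2 and parts[1].isdigit():
            stages[parts[0]] = int(parts[1])
    return stages


stages = parse_elapsed(ELAPSED_LOG)
total = sum(stages.values())
slowest = max(stages, key=stages.get)

print(f"total: {total} s ({total / 3600:.1f} h)")
print(f"slowest: {slowest}, {stages[slowest] / total:.0%} of total")
```

For the numbers above this comes to 82498 seconds, roughly 22.9 hours, with 5_2_route accounting for a little over half of the run.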
Description
I wanted to write down some information that I got when running megaboom through Quartus and I'm filing this as a feature request. We can mark this issue as "help wanted" or perhaps close it, the main thing is that I wanted somewhere to file this info for discussion.
I ran the megaboom design through Quartus to get some information about possible lingering SRAMs. Quartus takes 5 minutes to synthesize megaboom, whereas Yosys takes many hours.
If ORFS could be changed to detect all SRAMs and create macros for them, implementing the innards of these SRAMs with flip-flops and leaving refinement to the author, then it would be easier to run designs such as megaboom through the ORFS flow.
Most SRAMs detected by Quartus are tiny, much smaller than 1024 bits, and thus are probably implemented equally well in flip-flops.
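As a sketch of the triage this implies, the snippet below splits a memory list into macro candidates and flip-flop candidates using the 1024-bit figure mentioned in this issue. The memory names and dimensions are hypothetical placeholders, not entries from the Quartus report, and the threshold is a rule of thumb rather than a PDK-derived limit.

```python
from dataclasses import dataclass

# Rule-of-thumb threshold taken from the discussion in this issue.
SRAM_THRESHOLD_BITS = 1024


@dataclass(frozen=True)
class Memory:
    name: str
    depth: int  # number of words
    width: int  # bits per word

    @property
    def bits(self) -> int:
        return self.depth * self.width


def partition(memories, threshold=SRAM_THRESHOLD_BITS):
    """Split memories into (macro candidates, flip-flop candidates)."""
    macros = [m for m in memories if m.bits >= threshold]
    flops = [m for m in memories if m.bits < threshold]
    return macros, flops


# Hypothetical memories, for illustration only.
memories = [
    Memory("tiny_queue", depth=8, width=16),
    Memory("tag_array", depth=64, width=20),
    Memory("data_array", depth=512, width=64),
]

macros, flops = partition(memories)
for m in macros:
    print(f"macro candidate: {m.name} ({m.bits} bits)")
for m in flops:
    print(f"flip-flops:      {m.name} ({m.bits} bits)")
```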
However, some are large enough to be considered for SRAM implementation:
This one is on the small side:
This one is implemented as an SRAM in the megaboom design:
Suggested Solution
We can see from synthesis resource usage that megaboom isn't a very big design. The biggest FPGAs can fit roughly 8-10x this in terms of ALMs:
Additional Context
No response