The-OpenROAD-Project / OpenLane

OpenLane is an automated RTL to GDSII flow based on several components including OpenROAD, Yosys, Magic, Netgen and custom methodology scripts for design exploration and optimization.
https://openlane.readthedocs.io/
Apache License 2.0

Unbalanced buffer insertion on high fanout designs #2090

Open Dolu1990 opened 4 months ago

Dolu1990 commented 4 months ago

Description

Hi,

My setup: I have a design with a high-fanout part, where I read from a few register-based memory arrays (~2530 muxes to drive from one address).

Symptoms: In this case, the buffer insertion done by the flow seems very unbalanced. The critical path goes through a chain of 13 buffers (with a typical fanout of 10), while an ideally balanced fanout tree of that depth would be able to reach 10^13 gates.
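To make the arithmetic concrete, here is a minimal sketch of the idealized model behind that estimate. It assumes every driver can feed exactly `fanout` loads and ignores wire capacitance, placement and slew limits, so it is only a lower bound on the number of stages, not a prediction of what the flow should produce. The function name `balanced_levels` is made up for illustration.

```python
def balanced_levels(sinks: int, fanout: int) -> int:
    """Smallest number of driving stages d such that fanout**d >= sinks.

    Idealized model: every stage multiplies the reachable load count by
    `fanout`. Real buffering also has to respect wire length, max
    capacitance and max slew, which this sketch ignores.
    """
    levels = 0
    reach = 1  # a single driver with no stages reaches one load
    while reach < sinks:
        reach *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # ~2530 mux select inputs, fanout of 10 per stage
    print(balanced_levels(2530, 10))  # 4
    # loads reachable by a perfectly balanced 13-stage tree
    print(10 ** 13)
```

Under these assumptions, about 4 stages would be enough for ~2530 loads, which is why a 13-deep chain looks suspicious.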

Here is an image to illustrate; the buffer chain for the high-fanout net is shown in pink: image

For reference, here is the critical path:

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock clk (rise edge)
   0.00    0.00   clock network delay (ideal)
   0.00    0.00 ^ EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/CLK (sky130_fd_sc_hd__dfxtp_4)
   0.46    0.46 v EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/Q (sky130_fd_sc_hd__dfxtp_4)
   0.28    0.75 v EU0_ExecutionUnitBase_pipeline_execute_0_SrcStageables_SRC2[1]_sky130_fd_sc_hd__or2_2_B/X (sky130_fd_sc_hd__or2_1)
   0.24    0.98 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__and4_2_A/X (sky130_fd_sc_hd__and4_1)
   0.21    1.19 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__a31oi_2_B1_Y_sky130_fd_sc_hd__o21bai_2_A1/Y (sky130_fd_sc_hd__o21bai_4)
   0.26    1.45 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[2]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.23    1.69 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[3]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.25    1.94 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[4]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.23 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[5]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_2)
   0.25    2.48 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[6]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.77 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[7]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21oi_2_B1/Y (sky130_fd_sc_hd__a21oi_4)
   0.22    2.99 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[8]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o211a_2_C1/X (sky130_fd_sc_hd__o211a_1)
   0.24    3.23 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[10]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    3.43 ^ Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A2_sky130_fd_sc_hd__and3_2_X_B_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.15    3.59 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A3_sky130_fd_sc_hd__a21oi_2_Y/Y (sky130_fd_sc_hd__a21oi_2)
   0.29    3.88 v wire486/X (sky130_fd_sc_hd__buf_4)
   0.43    4.31 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    4.51 v wire479/X (sky130_fd_sc_hd__buf_12)
   0.44    4.95 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__inv_2_Y/Y (sky130_fd_sc_hd__inv_2)
######### chain start here
   0.30    5.24 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_4)
   0.32    5.57 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][5]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.31    5.87 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][7]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__clkbuf_8)
   0.32    6.19 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_6)
   0.36    6.55 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_8)
   0.37    6.92 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_8)
   0.31    7.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__clkbuf_8)
   0.35    7.58 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__buf_6)
   0.33    7.91 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.32    8.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[4][17]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__buf_12)
   0.45    8.68 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_16)
   0.39    9.07 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_A_4/X (sky130_fd_sc_hd__buf_12)
   0.44    9.51 ^ Lsu2Plugin_logic_sharedPip_stages_0_ADDRESS_PRE_TRANSLATION[13]_sky130_fd_sc_hd__buf_2_X/X (sky130_fd_sc_hd__buf_12)
######### chain end here
   0.66   10.17 v Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][19]_sky130_fd_sc_hd__mux4_2_A0/X (sky130_fd_sc_hd__mux4_2)
   0.42   10.59 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X_A2_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.21   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X/X (sky130_fd_sc_hd__a31o_1)
   0.00   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__dfxtp_2_D/D (sky130_fd_sc_hd__dfxtp_1)
          10.80   data arrival time
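From the report above, the marked chain runs from 4.95 ns to 9.51 ns, so roughly 4.56 ns of the 10.80 ns arrival time is spent in those 13 buffers. A rough way to extract that kind of number automatically is sketched below. It assumes the report lines keep the `delay time edge pin (cell)` layout shown above and that buffer cells are the ones whose cell type ends in `buf_<n>` or `clkbuf_<n>`; the function name `buffer_delay` is hypothetical, and the sample only contains a few lines copied from the report.

```python
import re

# delay, arrival time, edge (^ or v), pin name, cell type in parentheses
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+[\^v]\s+(\S+)\s+\((\S+)\)\s*$")
# match on the cell type only: instance names may also contain "buf"
BUF_CELL_RE = re.compile(r"__(?:clk)?buf_\d+$")


def buffer_delay(report: str):
    """Return (number of buffer stages, summed buffer delay) for a path report."""
    count = 0
    total = 0.0
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, separator or annotation line
        delay, _arrival, _pin, cell = match.groups()
        if BUF_CELL_RE.search(cell):
            count += 1
            total += float(delay)
    return count, round(total, 2)


SAMPLE = """\
   0.28    0.75 v EU0_ExecutionUnitBase_pipeline_execute_0_SrcStageables_SRC2[1]_sky130_fd_sc_hd__or2_2_B/X (sky130_fd_sc_hd__or2_1)
   0.29    3.88 v wire486/X (sky130_fd_sc_hd__buf_4)
   0.20    4.51 v wire479/X (sky130_fd_sc_hd__buf_12)
   0.44    9.51 ^ Lsu2Plugin_logic_sharedPip_stages_0_ADDRESS_PRE_TRANSLATION[13]_sky130_fd_sc_hd__buf_2_X/X (sky130_fd_sc_hd__buf_12)
   0.66   10.17 v Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][19]_sky130_fd_sc_hd__mux4_2_A0/X (sky130_fd_sc_hd__mux4_2)
"""

if __name__ == "__main__":
    print(buffer_delay(SAMPLE))  # (3, 0.93)
```

The cell-type match is deliberate: several non-buffer instances in this report have `buf` inside their instance name, so matching on the name would overcount.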

So, I don't know whether you have seen similar issues or cases, e.g. clock enable, reset tree, ...?

Proposal

No response

Dolu1990 commented 4 months ago

I got another case, with a very long chain of buffers in another part of the design (16 buffers), but this time the average fanout is quite different. Below, the buffer chain path is shown in pink. image

I would say there is no good reason for so many buffers, but I don't know much about it. Let me know if you have an idea :)

mo-hosni commented 4 months ago

Hi, in the second example, most of the buffers are either for long wires or for max cap. Typically, fanout buffers have names starting with fanout. Also, in the first example, all the buffer names are different. Are you sure these buffers are inserted by OpenROAD in the design optimizations? Also, can you send a reproducible, or the configuration and timing constraints used? Thanks
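A quick way to check the naming hint above is to split the buffer instance names by prefix. The sketch below only encodes the one rule stated in this comment (names starting with `fanout` are fanout buffers) and lumps everything else into `other`; it does not try to guess what other prefixes mean. `classify_buffers` is a made-up helper, and `fanout123` is a hypothetical name added only to exercise the first branch.

```python
from collections import Counter


def classify_buffers(instance_names):
    """Count buffer instances whose name starts with 'fanout' versus all others."""
    counts = Counter()
    for name in instance_names:
        kind = "fanout" if name.startswith("fanout") else "other"
        counts[kind] += 1
    return dict(counts)


if __name__ == "__main__":
    names = [
        "wire486",    # from the critical path report
        "wire479",    # from the critical path report
        "fanout123",  # hypothetical example name, not from the report
        "Lsu2Plugin_logic_sharedPip_stages_0_ADDRESS_PRE_TRANSLATION[13]_sky130_fd_sc_hd__buf_2_X",
    ]
    print(classify_buffers(names))
```

On the critical path shown in the first comment, none of the 13 chain buffers would land in the `fanout` group under this rule.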

Dolu1990 commented 4 months ago

Hi,

> In the second example, most of the buffers are either for long wires or for max cap. Typically, fanout buffers have names starting with fanout.

I lost the exact setup I used for the screenshot, sorry, but I still have the flow output. I took a video where I show the paths a bit. https://drive.google.com/file/d/1WWhCPqjMZksxn_hHWuh5DWztjdk9ewGj/view?usp=drive_link It seems to me that the first part of the buffer chain is achieving very little (neither traveling far nor driving much, especially compared to other paths in the design).

> Also, in the first example, all the buffer names are different. Are you sure these buffers are inserted by OpenROAD in the design optimizations?

In the Verilog I feed OpenLane with, there are no handwritten buffer insertions, so those buffers come from somewhere in the OpenLane flow; I don't know more than that.

> Also, can you send a reproducible or the configuration and timing constraints used?

Here is a design which can be used to recreate issues similar to the first case: design_nax.zip

And a video of case 1: case1.mkv.zip Also, one thing to note in case 1: I dug a bit more to see what was connected to the buffers, and basically, in that long chain of buffers, each layer generally drives "~8 gates + ~2 buffers". So each layer scales the path very little.

For case 2, I do not have the original Verilog file, but here is the synthesized netlist (just in case; it may be fine to simply swap it into design_nax.zip): nax.v.zip

Thanks :)