There is no call to init_floorplan, so why would it have worked earlier?
I was referring to the sedsm commands used to run this iccad04 testcase back in the day.
This design has 90+% placement density so it seems unlikely it was ever routable.
I don't know what sedsm commands are, but I don't think we've ever had this in our regressions.
sedsm stands for "silicon ensemble deep sub micron", the name of Cadence's P&R tool at the time. The iccad04 testcase contains a script that runs qplace and wroute.
This is a 6-layer testcase where the rams are only blocked up to M3, and there is no power routing in the die area.
Yes, density looks high by today's standards, but back then this must have been routable. It was part of the iccad04 testcases. I didn't make this thing up.
There is a huge discrepancy between the RUDY and global route heat maps. The ram placement could very likely be improved upon; macro_placement needs to use halos.
There is a general question whether heat maps should display the usage of "overall" routing resources or "available" routing resources. When you have rams blocking M1/M2/M3 but M4/M5/M6 available, there are two fewer layers of resources than in the stdcell areas, but if the M4/M5/M6 routing resources are only used sparingly, the ram areas shouldn't show up in dark red, no?
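For reference, RUDY in the usual formulation (Spindler & Johannes, DATE 2007) spreads each net's estimated wire volume uniformly over its bounding box and knows nothing about which layers are blocked. A rough sketch, with $B_n$ the bounding box of net $n$ of width $w_n$ and height $h_n$, and $p$ a wire pitch:

$$\mathrm{RUDY}(x,y) \;=\; \sum_{n:\,(x,y)\in B_n} \frac{p\,(w_n+h_n)}{w_n\,h_n}$$

If that matches OpenROAD's implementation, the ram areas would only dim out if the map were additionally normalized by the capacity actually left in each tile.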
Using the OpenROAD GUI to visualize the routing tracks, it seems as if the metal2/metal3 tracks are missing.
Looking at def/risc2.def.gz, they're there. Maybe OpenROAD doesn't read this "old" notation?
285 TRACKS Y -479400 DO 2399 STEP 400 LAYER metal1 metal2 ;
286 TRACKS Y -479400 DO 2399 STEP 400 LAYER metal2 metal3 metal4 ;
287 TRACKS Y -479400 DO 1200 STEP 800 LAYER metal4 metal5 metal6 ;
288 TRACKS X -479400 DO 2399 STEP 400 LAYER metal1 metal2 metal3 ;
289 TRACKS X -479400 DO 2399 STEP 400 LAYER metal3 metal4 metal5 ;
290 TRACKS X -479400 DO 1200 STEP 800 LAYER metal5 metal6 ;
291 GCELLGRID Y -479600 DO 240 STEP 4000 ;
292 GCELLGRID Y 480000 DO 1 STEP 0 ;
293 GCELLGRID X -479600 DO 240 STEP 4000 ;
294 GCELLGRID X 480000 DO 1 STEP 0 ;
I stand corrected.
a) OpenROAD reads the tracks just fine; I just didn't zoom in far enough to see them. The M2/M3 tracks are not visible unless I zoom in to less than the height of a single stdcell, whereas M1/M4/M5/M6 are brighter and visible at larger zoom factors.
b) The 4 large rams are blocked up to M5, so only vertical M6 routing is available over them. Which means the RUDY map looks as if it might correctly show the usage of available resources. But the discrepancy between the RUDY and groute maps is still suspicious to me.
And lastly, the sedsm script reads lef/def, does macro placement, stdcell placement, routing and defout.
global_route -allow_congestion
Yup, if global_route is correct (>150% on all layers), this won't route at all.
But then why does RUDY show the stdcell areas as being routable?
[WARNING GRT-0300] Timing is not available, setting critical nets percentage to 0.
[INFO GRT-0020] Min routing layer: metal1
[INFO GRT-0021] Max routing layer: metal6
[INFO GRT-0022] Global adjustment: 0%
[INFO GRT-0023] Grid origin: (0, 0)
[INFO GRT-0043] No OR_DEFAULT vias defined.
[INFO GRT-0088] Layer metal1 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal2 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal3 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal4 Track-Pitch = 0.4000 line-2-Via Pitch: 2.2200
[INFO GRT-0088] Layer metal5 Track-Pitch = 0.8000 line-2-Via Pitch: 2.3200
[INFO GRT-0088] Layer metal6 Track-Pitch = 0.8000 line-2-Via Pitch: 2.3200
[INFO GRT-0019] Found 0 clock nets.
[INFO GRT-0001] Minimum degree: 2
[INFO GRT-0002] Maximum degree: 6963
[INFO GRT-0003] Macros: 7
[INFO GRT-0043] No OR_DEFAULT vias defined.
[INFO GRT-0004] Blockages: 776492
[INFO GRT-0053] Routing resources analysis:
Routing Original Derated Resource
Layer Direction Resources Resources Reduction (%)
---------------------------------------------------------------
metal1 Horizontal 54855 5081 90.74%
metal2 Vertical 54855 34481 37.14%
metal3 Horizontal 54855 34710 36.72%
metal4 Vertical 54855 35947 34.47%
metal5 Horizontal 52470 35388 32.56%
metal6 Vertical 52470 35170 32.97%
---------------------------------------------------------------
[INFO GRT-0101] Running extra iterations to remove overflow.
[WARNING GRT-0170] Net lx1/lbc1/LBC1/data/n85: Invalid index for position (-362600, 129400). Net degree: 104.
[WARNING GRT-0153] Net lx1/lbc1/LBC1/data/n85 has errors during updateRouteType2.
[INFO GRT-0103] Extra Run for hard benchmark.
[INFO GRT-0197] Via related to pin nodes: 331871
[INFO GRT-0198] Via related Steiner nodes: 17164
[INFO GRT-0199] Via filling finished.
[INFO GRT-0111] Final number of vias: 511783
[INFO GRT-0112] Final usage 3D: 1886470
[WARNING GRT-0115] Global routing finished with overflow.
[INFO GRT-0096] Final congestion report:
Layer Resource Demand Usage (%) Max H / Max V / Total Overflow
---------------------------------------------------------------------------------------
metal1 5081 53033 1043.75% 12 / 5 / 49443
metal2 34481 84001 243.62% 4 / 12 / 51518
metal3 34710 52834 152.22% 10 / 1 / 19799
metal4 35947 57522 160.02% 2 / 6 / 23561
metal5 35388 55738 157.51% 7 / 2 / 21266
metal6 35170 47993 136.46% 2 / 6 / 14417
---------------------------------------------------------------------------------------
Total 180777 351121 194.23% 37 / 32 / 180004
[INFO GRT-0018] Total wirelength: 3024996 um
[INFO GRT-0014] Routed nets: 33760
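For clarity, the Usage column in this report appears to be simply demand over derated resources; e.g. for metal1:

$$\mathrm{Usage} = \frac{\mathrm{Demand}}{\mathrm{Resource}} \times 100\% = \frac{53033}{5081} \times 100\% \approx 1043.75\%$$

so every layer is reported well above its derated capacity.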
Let's start at the beginning. If I just load the lef & def provided, I see things are already placed somewhat strangely:
Is this intended? If the design is already placed, why run placement again?
The web pages regarding the 2004 ICCAD contest seem to be lost to time.
All I can say is that this testcase came with lef/def and a sedsm command file. No result/log files, so I can only assume that the die area/pin placement worked.
More than 90% utilization for a 9-track, M1-rail stdcell library with 6 layers of metal is definitely pushing what's doable, or just not possible. In that case the def is bogus and one needs a different die size.
The def stdcell placement is obviously illegal, and the default macro placement is clearly not good either. An mpl placement puts the large M5-OBS rams on the right edge, but they need a halo so as not to block the M3 IO access and to keep stdcells out from underneath their power rings, plus a lot of spacing to allow access to the pins on their bottom edge.
My main question is why, even with bad macro placement, RUDY looks routable yet completely different from the groute map, which shows the placement as completely unroutable.
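Something along these lines might express the halo idea from above, assuming mpl's -halo/-channel options; the values are guesses, not tuned for this testcase:

# hedged sketch: keep-out halo and routing channel around the rams (microns, illustrative)
macro_placement -halo {10 10} -channel {20 20}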
The sedsm script does run:
read lef
read def
automatic macro placement
stdcell placement
routing
defout
Where did you get this data?
I agree the rudy diff needs looking at.
I probably downloaded the data at the time of the contest.
Hi Augusto,
I had to modify lef/risc2.lef.gz, changing the 3 rams' CLASS RING to CLASS BLOCK.
15531: CLASS BLOCK ;
19167: CLASS BLOCK ;
21165: CLASS BLOCK ;
The first pictures are from "macro_placement", which moved the rams to the right.
global_placement will move the rams too, so I've changed their status to + FIXED in def/risc2.def.gz to keep their original placement. This is shown in the second set of pictures.
32912:- ICACHE_INST0/SRAM tsyncram_512x32 + FIXED ( -304400 148000 ) N ;
32913:- ICACHE_TAG0/SRAM rf_128x22 + FIXED ( -119240 -42800 ) N ;
32914:- DCACHE_DATA/SRAM tsyncram_512x32 + FIXED ( -169000 -467600 ) N ;
32915:- DCACHE_TAG/SRAM rf_128x22 + FIXED ( -15720 -42800 ) N ;
32916:- DRAM_DATA/SRAM tsyncram_512x32 + FIXED ( 87800 148000 ) N ;
32917:- IRAM_DATA/SRAM tsyncram_512x32 + FIXED ( 87800 -39200 ) N ;
32918:- IRAM_VALID/SRAM rf_8x32 + FIXED ( -201220 32800 ) N ;
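As an aside, a minimal sketch of doing the same from the OpenROAD shell instead of hand-editing the DEF, assuming the odb TCL bindings behave as I expect (ord::get_db_block, findInst, setPlacementStatus):

# mark the SRAM macros FIRM so global_placement leaves them alone
set block [ord::get_db_block]
foreach name {ICACHE_INST0/SRAM ICACHE_TAG0/SRAM DCACHE_DATA/SRAM DCACHE_TAG/SRAM DRAM_DATA/SRAM IRAM_DATA/SRAM IRAM_VALID/SRAM} {
  set inst [$block findInst $name]
  if {$inst != "NULL"} {
    $inst setPlacementStatus FIRM
  }
}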
read_lef lef/risc2.lef.gz
read_def def/risc2.def.gz
# the rams are + FIXED in the def above, so global_placement won't move them
#rtl_macro_placer
#macro_placement
global_placement -density 0.95
#detailed_placement
#global_route -verbose
#detailed_route
I started detailed route 10 hours ago with the + FIXED original ram placement, and it's now at the 34th iteration with 1024 mostly metal4 violations. It might be that RUDY is more indicative of the routability of this design than groute.
Is there any way to interrupt detailed route and keep the routing/error markers for inspection?
If not, this leads to two more feature requests; I'll file them if you agree with them ;-) 1) update the view after each detailed-route iteration with error markers to show the progress, and 2) add a big "stop" button in the GUI to stop detailed routing while keeping the progress made so far.
With the original + PLACED macros I've also tried a) rtl_macro_placer, which coredumps (likely due to the missing Verilog netlist), and b) macro_placement, which places all rams to the right (first two pics).
It looks to me as if global_placement might actually provide the best hint at macro placement, even though it placed two of the large rams at the bottom with the pins facing down; one should flip those rams to get the pins to the top. Well, that failed with: [ERROR DRT-0416] Term A[2] contains offgrid pin shape. Pin shape ( 118121 -468168 ) ( 118721 -467568 ) is not a multiple of the manufacturing grid 10. [ERROR GUI-0070] DRT-0416
detailed_route has -drc_report_iter_step to report periodically; ORFS sets it to 5 by default. Stop in the GUI is something I've planned to do for a long time, so feel free to open an issue.
Well, the groute map is completely wrong; the rudy map looks valid. To look at where detailed route will fail, I limited the number of iterations to 5.
Looking at the errors after the 5th iteration, groute rather goes through the M4 OBS, with both horizontal and vertical routing, than use the available M6 (the large rams have OBS from M1-M5).
If the global router sees everything as equally congested, it has limited choices of what to do.
1) The global placer must have some notion of routing congestion; is there any way to visualize that? 2) Is there a way to increase the macro OBS violation cost for groute?
Keep in mind that after ~10 hours of detailed route, the number of M4 shorts was down to ~1000.
global_route -allow_congestion
# write a DRC report every iteration and stop drt after 5 iterations
detailed_route -drc_report_iter_step 1 -droute_end_iter 5
[INFO GRT-0096] Final congestion report:
Layer Resource Demand Usage (%) Max H / Max V / Total Overflow
---------------------------------------------------------------------------------------
metal1 4954 61139 1234.13% 15 / 4 / 57518
metal2 34480 92459 268.15% 4 / 12 / 59581
metal3 34704 53684 154.69% 8 / 2 / 20594
metal4 35947 59531 165.61% 2 / 10 / 25335
metal5 35388 57378 162.14% 7 / 2 / 22762
metal6 35170 48978 139.26% 2 / 5 / 15318
---------------------------------------------------------------------------------------
Total 180643 373169 206.58% 38 / 35 / 201108
[INFO DRT-0195] Start 5th optimization iteration.
...
[INFO DRT-0199] Number of violations = 33443.
Viol/Layer metal1 V1 metal2 V2 metal3 V3 metal4 VL metal5 VQ metal6
Cut Spacing 0 223 0 38 0 13 0 22 0 0 0
Metal Spacing 15 0 443 0 35 0 71 0 15 0 13
NS Metal 0 0 0 0 1 0 4 0 0 0 0
Recheck 0 0 0 0 0 0 10 0 0 0 6
Short 106 1 1424 8 320 3 29499 98 329 10 729
SpacingRange 0 0 2 0 2 0 0 0 3 0 0
[INFO DRT-0267] cpu time = 00:26:09, elapsed time = 00:05:30, memory = 4918.47 (MB), peak = 4998.53 (MB)
Total wire length = 2334478 um.
Total wire length on LAYER metal1 = 33168 um.
Total wire length on LAYER metal2 = 546802 um.
Total wire length on LAYER metal3 = 671212 um.
Total wire length on LAYER metal4 = 592058 um.
Total wire length on LAYER metal5 = 269319 um.
Total wire length on LAYER metal6 = 221917 um.
Total number of vias = 367893.
This starts with a bad macro placement, which in turn gives the global placement problems, and there doesn't seem to be a way to visualize its view of congestion. The global router's congestion calculation seems off, especially with regard to the large rams obstructing M1-M5. And the detailed router then gives its all to fix the mess the previous steps caused.
Ok, ... let's take the macros out of the equation. (It's still a good mpl/global_placement testcase.)
This way it's mostly a "why is the groute map off by so much?" question.
Let's also move the metal2 pins to metal4, since the global placement puts stdcells right beside the pins and this can't be routed later. (I still would want gpl's view of congestion to be visualized, a "placement congestion routing map".)
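A hedged alternative to hand-editing the pins in the DEF would be to let ppl re-place the IOs on higher layers; the layer choice here is illustrative for this stack:

# move the IOs off metal2 (layer names assumed)
place_pins -hor_layers metal3 -ver_layers metal4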
[ERROR GRT-0118] Routing congestion too high. Check the congestion heatmap in the GUI.
[ERROR GUI-0070] Error: risc2.or, 9 GRT-0118
Rudy doesn't seem to correctly display congestion at the top three rows and on the right at a similar distance.
[INFO DRT-0195] Start 4th optimization iteration.
Completing 10% with 275 violations.
...
[INFO DRT-0199] Number of violations = 160.
Viol/Layer V1 metal2
Cut Spacing 159 0
Short 0 1
It took detailed route another 30 iterations to get rid of these V1 cut spacing errors, half of the time iterating on the remaining 8 violations. Better pin access should be able to avoid all these iterations.
So the groute map is wrong; without macros this should be easily routable.
Hi @stefanottili, I have been examining the internal rudy variables used during the calculations for your first test case, with macros, and I couldn't find anything unexpected.
These are the test cases I was having a look at:
What do you mean exactly by "placement congestion routing map"? Do you want the exact routing congestion during the placement stage? I understand that the idea is to have RUDY estimate it, so we do not have to run the costly grt (as we did previously), and not have an exact routing congestion. You should be able to get the exact congestion using the "routability_use_grt" parameter during global_placement; using grt instead of rudy for routability in gpl enables grt's exact routing congestion.
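If it helps, that suggestion would look something like this (flag names taken from the comment above; not verified against the current gpl help):

# routability-driven gpl using grt's exact congestion instead of RUDY
global_placement -density 0.8 -routability_driven -routability_use_grt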
On your new test case without macros you said: "Rudy doesn't seem to correctly display congestion at the top three rows and on the right at a similar distance." Do you mean the region close to the pins? If you could please circle the region with an image editor (gimp), to be clear, I would really appreciate it. Either way, this might be only due to RUDY's expected imprecision.
The difference between grt and rudy still looks awkward. @eder-matheus, do you think this might be because of the die coordinate origin with negative values, from #5284? I see nothing extraordinary during the rudy calculations.
The RUDY map has a gap of 2 "blocks" on the right/top, whereas on left/bottom it goes right up to the die area.
The last line/row of the grid is greater than the other ones for die areas that aren't multiples of the grid size. So the gap you see is because these regions have more resources than the others, leading to less congestion.
Hmm, something doesn't quite look right to me.
Next to the pins on the top there is a pretty uniform stdcell density, and you have the additional resource requirement of the metal4 pin connections, but the rudy map goes from rows of green, yellow, orange, and red congestion indicators to, by the looks of it, a two-row-high band of "no congestion to see here at all".
DIEAREA ( -479600 -479600 ) ( 480000 480000 ) ; GCELLGRID Y -479600 DO 240 STEP 4000 ;
240 * 4000 - 479600 = 480400, so the last Y GCELL is slightly too tall for the top die area. Is it correct to assume that the GCELLs on top are of size 4000x7600?
So this box is bigger than the one below, but it will have "the same density of routing requirements".
Is there a way to dump the rudy boxes + their density info into a text file, to be able to check the coordinates and requirements?
If anything should be red, it should be the area right below the pins at the top.
There is way more routing in that area than in the red areas right below.
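One possibility for the dump request above, assuming the GUI's heatmap dump command covers this map and that it is registered under the name "RUDY" (both assumptions):

# dump the RUDY heatmap values to a file for inspection
gui::dump_heatmap RUDY rudy.csv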
I understand your point, and I think RUDY might have a bug. You can check the density value in the GUI; double-click on the "Estimated Congestion (RUDY)" option and it will show some settings. "Show Numbers" will draw the density values in each grid tile, and in these larger tiles, the density is zero. Some further investigation of the RUDY code will be required to understand why this is happening.
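A scriptable version of those dialog steps might be (map name and option spelling are assumptions on my part):

# enable the per-tile density numbers from the shell
gui::set_heatmap RUDY ShowNumbers true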
Here's a figure with the density data. The zeros there are definitely wrong.
Thanks for confirming.
By the looks of it, groute suffers from something similar.
And now one more datapoint with regard to rudy/groute maps:
I've saved routed.def, started a new openroad session, ran read_lef/read_def/global_route on it, and then displayed both the groute and rudy maps. They look very different from the groute/rudy maps at the end of routing ...
and they also look very different when reading routed.db and displaying the rudy/groute maps.
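A minimal sketch of that comparison, for anyone reproducing it (paths assumed):

write_def routed.def
# ... then, in a fresh openroad session:
read_lef lef/risc2.lef.gz
read_def routed.def
global_route -allow_congestion
# compare the rudy and routing-congestion heatmaps with those from the
# session that produced routed.def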
And for the last experiment of the day, I've moved the centered floorplan to have 0,0 at the lower left: risc2.first_quadrant.noram.def.gz
1) The rudy map gaps on the right/top don't look so pronounced any more, but are still clearly visible. 2) The weird vertical rudy congestion below the top pins is gone. 3) Global route spreads out the routing much more, but its congestion map is still dark red. 4) Detailed route still wastes 30 iterations on fixing V1 cut spacing errors.
a) Both global route and rudy behave differently when the floorplan is centered vs. when the left/bottom is at 0,0. b) The rudy and global route maps violently disagree about the amount of congestion.
risc2.first_quadrant.noram.def.gz takes out the issue of macro placement + 95% utilization and the issue of OR not handling centered floorplans.
As of 10 Aug 2024, running the RISC2 FARADAY_ICCAD04Bench led to a couple of bug fixes, but the main problems remain.
This TSMC 180, 4+2-layer testcase routes in 10 iterations, despite GRT's resource computation being completely off and showing an unroutable design.
ispd24 with a metal stack using different layer pitches doesn't show this behavior, so why is OR grt struggling with this tech?
[INFO GRT-0088] Layer metal1 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal2 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal3 Track-Pitch = 0.4000 line-2-Via Pitch: 2.1200
[INFO GRT-0088] Layer metal4 Track-Pitch = 0.4000 line-2-Via Pitch: 2.2200
[INFO GRT-0088] Layer metal5 Track-Pitch = 0.8000 line-2-Via Pitch: 2.3200
[INFO GRT-0088] Layer metal6 Track-Pitch = 0.8000 line-2-Via Pitch: 2.3200
Using only 5 layers, a placement density of 0.8, and a 30% reduction of the routing resources routes this design much quicker, in just 6 drt iterations. The global route resource map remains off.
[INFO GRT-0096] Final congestion report:
Layer Resource Demand Usage (%) Max H / Max V / Total Overflow
---------------------------------------------------------------------------------------
metal1 10338 60369 583.95% 12 / 2 / 54780
metal2 28124 96723 343.92% 2 / 13 / 71862
metal3 28124 36713 130.54% 6 / 2 / 12886
metal4 28124 41323 146.93% 2 / 6 / 16726
metal5 26386 31055 117.69% 4 / 1 / 8855
---------------------------------------------------------------------------------------
Total 121096 266183 219.81% 26 / 24 / 165109
read_lef lef/risc2.lef.gz
#read_def def/risc2.def.gz
read_def def/risc2.first_quadrant.noram.def.gz
# route signals on 5 layers only and derate every layer's capacity by 30%
set_routing_layers -signal metal1-metal5
set_global_routing_layer_adjustment * 0.3
#macro_placement
#global_placement -density 0.95
global_placement -density 0.8
detailed_placement
global_route -allow_congestion -verbose
detailed_route -drc_report_iter_step 1
The centered die use case is a very low priority, as OR never generates one.
This is not about the centered case any more; the last pictures/groute overflow numbers are with the "...moved the centered floorplan to have 0,0 at the lower left" DIEAREA ( 0 0 ) ( 959600 959600 ) ;
It's hard to follow the new problem with so many comments here. Could you attach a new test case with all the files and scripts needed to reproduce the problem? If the problem is not about the centered case, perhaps creating a separate issue would make sense.
see https://github.com/The-OpenROAD-Project/OpenROAD/issues/5548 for the "first_quadrant.noram" testcase.
Closed and replaced by https://github.com/The-OpenROAD-Project/OpenROAD/issues/5557.
Describe the bug
More fallout from https://github.com/The-OpenROAD-Project/OpenROAD/issues/5284
The RUDY map looks different from the global_route map.
And then there are ERRORs from detailed placement and a manually started global_route ...
I'm assuming that this testcase was P&R-able in 2004, when init_floorplan would always center the die area.
Expected Behavior
Matching rudy and global route maps.
Environment
To Reproduce
Relevant log output
No response
Screenshots
No response
Additional Context
No response