The-OpenROAD-Project / OpenROAD

OpenROAD's unified application implementing an RTL-to-GDS Flow. Documentation at https://openroad.readthedocs.io/en/latest/
https://theopenroadproject.org/
BSD 3-Clause "New" or "Revised" License

detailed route spends 30 iterations on fixing trivial pin access. #5307

Open stefanottili opened 3 days ago

stefanottili commented 3 days ago

Describe the bug

Fallout from https://github.com/The-OpenROAD-Project/OpenROAD/issues/5300

All but V1 cut spacing violations were solved after the 4th iteration.

It took detailed routing 30 iterations to put all V1 vias on the metal1/metal2 track intersection. Why aren't the vias placed at this violation-free location to start with?

Is this an issue with the pin access computation?

Expected Behavior

Put V1 on the violation-free metal1/metal2 track intersection.

Environment

M1 Mac
OpenROAD v2.0-14343-g6dfa697a1  (Jun 29 2024)
Features included (+) or not (-): +Charts +GPU +GUI +Python

To Reproduce

use risc2.noram.def.gz from https://github.com/The-OpenROAD-Project/OpenROAD/issues/5300

Relevant log output

No response

Screenshots

Screenshot 2024-06-29 at 5 08 48 PM

Additional Context

No response

osamahammad21 commented 2 days ago

Could you please provide all the steps to reproduce, along with the files? I see you've attached the DEF file but not the guide file produced by global routing. I also see via rules that DRT does not support, such as:

VIARULE TURNmetal1 GENERATE
    LAYER metal1 ;
        DIRECTION vertical ;

    LAYER metal1 ;
        DIRECTION horizontal ;
END TURNmetal1

DRT errors out when reading such a rule, so it doesn't even start routing. I assume you have a different version of the LEF, since you got past that step.

stefanottili commented 2 days ago

I'm sorry, my bad. I forgot that I had to comment out the TURN rules in the LEF. risc2.lef.gz

I ran the script mentioned in #5300, which doesn't generate a guide file by default.

read_lef lef/risc2.lef.gz
read_def def/risc2.noram.def.gz
global_placement
detailed_placement -disallow_one_site_gap
global_route -verbose
detailed_route -drc_report_iter_step 1

I just tried to generate a guide file by using "global_route -guide_file guide", but this didn't write a file. Maybe because it errors out?

global_route -guide_file guide
[WARNING GRT-0300] Timing is not available, setting critical nets percentage to 0.
[ERROR GRT-0118] Routing congestion too high. Check the congestion heatmap in the GUI.
[ERROR GUI-0070] GRT-0118

I have to start detailed_route "by hand" because OpenROAD stops after the global_route error.

osamahammad21 commented 1 day ago

Then that means you're using the detailed router's global routing. We are not currently working on supporting TritonRoute's WXL global router, so I would recommend fixing the issues with the global routing step before starting detailed routing. I will close this issue now. After https://github.com/The-OpenROAD-Project/OpenROAD/issues/5300 is resolved and you are able to run global routing correctly, try detailed routing. If the issue persists, please feel free to reopen.

maliberty commented 1 day ago

@osamahammad21 He mentions that he is running global_route in his script, so I think the guides will be in the db.

osamahammad21 commented 1 day ago

@maliberty Does GRT write the guides to the db even if it produces an error? Edit: After looking, it doesn't. saveGuides is called after checkOverflow, which produces the error. So in this case, he is using DRT's global routing.

maliberty commented 1 day ago

No, if grt errors out then this isn't a drt issue. -guide_file guide shouldn't cause grt to fail if it passes otherwise. @stefanottili are you able to pass grt in any case?

eder-matheus commented 1 day ago

@maliberty @stefanottili @osamahammad21 You can use the -allow_congestion flag to generate route guides even when groute ends with congestion. The error is not raised; instead, a warning about the congestion is issued.
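The ordering described above (the overflow check runs before the guides are saved, and -allow_congestion turns the error into a warning) can be sketched as a toy model. This is only an illustration of the control flow as described in this thread, not OpenROAD's actual code: the Python functions, the `db` dictionary, and the `overflow` number are all hypothetical stand-ins.

```python
class CongestionError(RuntimeError):
    """Stands in for the GRT-0118 error in this toy model."""


def check_overflow(overflow, allow_congestion):
    """Raise on overflow unless congestion is explicitly allowed."""
    if overflow > 0 and not allow_congestion:
        raise CongestionError("Routing congestion too high.")
    if overflow > 0:
        return ["warning: routing congestion present"]
    return []


def global_route(db, overflow, allow_congestion=False):
    """Toy flow: the guides are only saved if check_overflow returns."""
    db["messages"] = check_overflow(overflow, allow_congestion)
    db["guides_saved"] = True  # models the saveGuides step mentioned above


db = {"guides_saved": False}
try:
    global_route(db, overflow=5)  # hypothetical overflow value
except CongestionError:
    pass
print(db["guides_saved"])  # False: the error fired before the save step

global_route(db, overflow=5, allow_congestion=True)
print(db["guides_saved"])  # True: a warning is recorded, guides are saved
```

In this model, a later detailed-route step that finds no saved guides would have nothing from global routing to work with, which matches the behavior discussed above.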

stefanottili commented 1 day ago

Thanks for the hint, I was not aware that when global route errors out I would be running detailed route with its "internal" global router.

global_route -verbose -guide_file guide -allow_congestion

keeps the script running and continues from global route to detailed route.

I would argue that -allow_congestion should be the default behavior. I've seen detail routers resolve groute overflow issues, so I always run detailed route after groute.

a) previously I had 2 of these drt warnings, now there are 305 of them

[WARNING DRT-0225] CE1_SEL_E_R 1 pin not visited, fall back to feedthrough mode.
[WARNING DRT-0225] JTAG_TMS 1 pin not visited, fall back to feedthrough mode.
[WARNING DRT-0225] RESET_D1_R_N 23 pin not visited, fall back to feedthrough mode.
[WARNING DRT-0225] BUSCLKF 2 pin not visited, fall back to feedthrough mode.

b) after the 4th detailed route iteration, there are mostly V1 cut spacing violations left, more than before.

 834 [INFO DRT-0195] Start 4th optimization iteration.
...
 856 Viol/Layer      metal1     V1 metal2     V2 metal3 metal4
 857 Cut Spacing          0    191      0      1      0      0
 858 Metal Spacing        1      0      4      0      0      0
 859 Short                0      0     21      0      6      1
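To put a number on "mostly V1 cut spacing", the table above can be tallied with a few lines of Python. This is a minimal sketch: the parser is a hypothetical helper written for this post, and it assumes the whitespace-separated layout shown above (the log line numbers are stripped if present).

```python
REPORT = """\
Viol/Layer      metal1     V1 metal2     V2 metal3 metal4
Cut Spacing          0    191      0      1      0      0
Metal Spacing        1      0      4      0      0      0
Short                0      0     21      0      6      1
"""


def parse_violation_table(text):
    """Return {violation_type: {layer: count}} from a table like the one above."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    layers = header[header.index("Viol/Layer") + 1:]
    table = {}
    for tokens in lines[1:]:
        counts = [int(t) for t in tokens[-len(layers):]]
        name_tokens = tokens[:-len(layers)]
        # Drop a leading line number such as "857" if the log was numbered.
        if name_tokens and name_tokens[0].isdigit():
            name_tokens = name_tokens[1:]
        table[" ".join(name_tokens)] = dict(zip(layers, counts))
    return table


table = parse_violation_table(REPORT)
total = sum(sum(row.values()) for row in table.values())
v1_cut = table["Cut Spacing"]["V1"]
print(f"{v1_cut} of {total} violations are V1 cut spacing ({v1_cut / total:.0%})")
# prints: 191 of 225 violations are V1 cut spacing (85%)
```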

After the 36th iteration, all pin access errors are solved by putting the V1 on the pin's metal1/metal2 track intersection. So this issue didn't change by going from "internal detailed router global route" to using global route guides.

The error markers from the 5th iteration show that the V1 didn't start on pin access points. It seems to me putting them there to start with could save a lot of lengthy routing iterations.

Screenshot 2024-07-01 at 12 24 26 PM

c) the groute congestion report and the groute map display with a centered die area are completely wrong.

This design is 100% routable. (It was routable even with macros in 2004)

d) don't assume die area left/bottom to be 0,0, it could be negative or positive

Having the chip centered was a tapeout requirement, and it was the default behavior of "init floorplan" to center the die area. I've also done blocks with the left/bottom corner at a positive offset, because we were "pressing down" full-chip power and pins when doing floorplans for hierarchical chips. We just used global coordinates in the hierarchy below.

I would argue that assuming that the left/bottom of the die area is 0,0 is a bug.
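To illustrate the kind of problem meant here, consider mapping a coordinate to a routing grid cell. This is a generic sketch with made-up numbers and hypothetical function names; it makes no claim about how grt or the congestion heatmap actually compute their grids.

```python
def gcell_index(coord, die_lo, gcell_size):
    """Index of the grid cell containing coord, measured from the die edge."""
    return (coord - die_lo) // gcell_size


def gcell_index_assuming_zero_origin(coord, gcell_size):
    """Same mapping, but assuming the die's lower edge sits at 0."""
    return coord // gcell_size


# Hypothetical centered die: x runs from -1000 to 1000, cells are 200 wide,
# so valid indices are 0..9.
DIE_LO, GCELL = -1000, 200

print(gcell_index(-900, DIE_LO, GCELL))               # 0, the first cell
print(gcell_index_assuming_zero_origin(-900, GCELL))  # -5, outside the grid
print(gcell_index(900, DIE_LO, GCELL))                # 9, the last cell
print(gcell_index_assuming_zero_origin(900, GCELL))   # 4, the wrong cell
```

With a die whose lower-left corner is at 0,0 both functions agree, which is why such an assumption can go unnoticed.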

stefanottili commented 1 day ago

By the looks of the error markers, it seems as if the V1 for pin access would be initially placed into the "center" of the pin, not on the pin access point.

In all cases I checked, the V1 cut spacing errors were eventually resolved by putting the vias on the pin access point that the GUI displays as an X.
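Geometrically, the difference between "center of the pin" and "track intersection" looks like the sketch below. The track origins, pitches, routing directions, and the pin center are invented for illustration, and the helper names are hypothetical. drt's real pin access computation has to consider much more than this (for example, whether the via is DRC-clean), so this only shows the snapping idea.

```python
def nearest_track(coord, track_origin, pitch):
    """Coordinate of the track closest to coord on a uniform track grid.

    Uses integer arithmetic, so it also works for negative coordinates.
    """
    steps = (coord - track_origin + pitch // 2) // pitch
    return track_origin + steps * pitch


def nearest_track_intersection(point, m1_tracks, m2_tracks):
    """Snap (x, y) to the closest metal1/metal2 track crossing.

    Assumes horizontal metal1 tracks (they fix y) and vertical metal2
    tracks (they fix x); each track spec is (origin, pitch).
    """
    x, y = point
    return (nearest_track(x, *m2_tracks), nearest_track(y, *m1_tracks))


# Hypothetical track grids: origin 100, pitch 200 on both layers.
M1_TRACKS = (100, 200)
M2_TRACKS = (100, 200)

pin_center = (370, 455)  # made-up pin center, off both track grids
print(nearest_track_intersection(pin_center, M1_TRACKS, M2_TRACKS))
# prints: (300, 500)
```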

Please reopen this case.

maliberty commented 1 day ago

0,0 is not a bug and every design taped out with OR uses it. We should allow negative coordinates (though not a high priority).

Many designs will never finish in drt if they fail in grt so making it the default is not desirable. It just slows down the process of isolating the issue.

This design may have been routable in 2004 (though we have no existence proof) but that doesn't mean the issue is in drt. It could well be that it had a more routable placement. It is necessary to identify where the failure is rooted.

stefanottili commented 1 day ago

drt routes this design despite the fact that the groute map display/congestion warnings seem to have a bug.

By the looks of it, either the guides are correct or the detailed router's groute puts things into place anyway.

This bug report is about the 30 drt iterations of wasted runtime, because the V1 vias placed on pins seem to be at the center of the pin initially. Eventually drt puts them at the metal1/metal2 track intersection that's marked as the pin access point.

maliberty commented 1 day ago

The option is there if you wish to use it. grt isn't a perfect predictor of drt but expecting it to be isn't realistic.

I'll leave the question of V1 to @osamahammad21

stefanottili commented 1 day ago

If drt were quick at "fixing" the V1 cut spacing errors, there wouldn't be a bug report.

But it takes only a couple of minutes of routing time to get to the 4th iteration with 191 V1 cut spacing errors left, and then another 30 painfully slow iterations to fix these.

There is clearly room for a significant runtime improvement.

11:35 -1.rpt
11:38 -2.rpt
11:40 -3.rpt
11:42 -4.rpt

11:44 -5.rpt
...
12:06 -36.rpt
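The split implied by these timestamps can be computed directly. A small sketch, assuming the times are the write times of the per-iteration DRC reports on the same day (they only have minute resolution, and the reports between 5 and 36 are elided above):

```python
from datetime import datetime

# Report number -> timestamp, copied from the listing above.
REPORT_TIMES = {1: "11:35", 2: "11:38", 3: "11:40", 4: "11:42",
                5: "11:44", 36: "12:06"}


def minutes_between(start, end):
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


early = minutes_between(REPORT_TIMES[1], REPORT_TIMES[4])
late = minutes_between(REPORT_TIMES[4], REPORT_TIMES[36])
print(f"report 1 to report 4: {early} min")   # 7 min
print(f"report 4 to report 36: {late} min")   # 24 min
```

So roughly three quarters of the time between the first and last report is spent after the 4th report, when almost only V1 cut spacing violations remain.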

eder-matheus commented 1 day ago

c) the groute congestion report and the groute map display with a centered die area are completely wrong.

@stefanottili Could you detail what is completely wrong with the congestion report?

stefanottili commented 1 day ago

@eder-matheus please have a look at the last two pics in https://github.com/The-OpenROAD-Project/OpenROAD/issues/5300

groute claims an unroutable design with 206.58% routing resource usage, RUDY shows everything is fine, and detailed route finishes without any violations.

It takes ~4min of runtime to get past groute using the lef/def attached to this test case.