Closed oharboe closed 6 days ago
Hi @oharboe, I was unable to execute the ./run-me of your generated issue tar:
./run-me-BoomTile-asap7-base.sh
OpenROAD v2.0-13318-g19635e967
Features included (+) or not (-): +Charts +GPU +GUI +MPL2 +PAR +Python
This program is licensed under the BSD-3 license. See the LICENSE file for details.
Components of this program may be licensed under more restrictive licenses which must be honored.
Error: read_liberty.tcl, 25 cannot read file /home/oyvind/.cache/bazel/_bazel_oyvind/7e6ad621f3f951c3ee6f5b179289b54e/execroot/_main/bazel-out/k8-fastbuild/bin/results/asap7/l2_tlb_ram_0_512x46/base/l2_tlb_ram_0_512x46.lib.
openroad>
It's a packaging error - it should be ./home not /home in var*sh
Furthermore, looking at the logs I see unusual behavior from GPL: it stopped with an overflow of 0.781807, whereas it usually stops at 0.10 or less, and routability mode was not activated. I understand this can happen when density is too high.
how many iterations?
> It's a packaging error - it should be ./home not /home in var*sh
It was created with make global_place_issue, so that script needs a tweak...
It stopped on iteration 280. Here are the last ones:
[NesterovSolve] Iter: 240 overflow: 0.869161 HPWL: 12149035473
[NesterovSolve] Iter: 250 overflow: 0.849265 HPWL: 13253550712
[NesterovSolve] Iter: 260 overflow: 0.827114 HPWL: 14365264704
[NesterovSolve] Iter: 270 overflow: 0.805211 HPWL: 15373704557
[NesterovSolve] Iter: 280 overflow: 0.781807 HPWL: 16242167774
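For anyone wanting to check a log for this symptom automatically, here is a minimal Python sketch that pulls the iteration, overflow and HPWL out of the [NesterovSolve] lines quoted above and flags a run whose last reported overflow is still high. The function names and the 0.10 threshold (taken from the "usually stops with 0.10 or less" remark) are assumptions for illustration, not part of OpenROAD.

```python
import re

# Matches progress lines of the form seen in this thread, e.g.
#   [NesterovSolve] Iter: 280 overflow: 0.781807 HPWL: 16242167774
NESTEROV_RE = re.compile(
    r"\[NesterovSolve\] Iter:\s*(\d+)\s+overflow:\s*([0-9.]+)\s+HPWL:\s*(\d+)"
)


def parse_nesterov(log_text):
    """Return a list of (iteration, overflow, hpwl) tuples found in log_text."""
    return [
        (int(it), float(ov), int(hpwl))
        for it, ov, hpwl in NESTEROV_RE.findall(log_text)
    ]


def stopped_early(log_text, expected_overflow=0.10):
    """True if the last reported overflow is still above expected_overflow.

    expected_overflow is a hypothetical threshold, not an OpenROAD setting.
    """
    rows = parse_nesterov(log_text)
    return bool(rows) and rows[-1][1] > expected_overflow


# Sample taken from the log lines quoted in this thread.
SAMPLE = """\
[NesterovSolve] Iter: 270 overflow: 0.805211 HPWL: 15373704557
[NesterovSolve] Iter: 280 overflow: 0.781807 HPWL: 16242167774
"""

if __name__ == "__main__":
    print(parse_nesterov(SAMPLE)[-1])
    print(stopped_early(SAMPLE))
```

A log that ends at iteration 280 with overflow 0.78 would be flagged, which matches the suspicion that the log came from an incomplete run.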
In my run I see
[NesterovSolve] Iter: 1370 overflow: 0.235532 HPWL: 19759299769
[NesterovSolve] Iter: 1380 overflow: 0.209551 HPWL: 19286141404
[INFO GPL-0075] Routability numCall: 8 inflationIterCnt: 3 bloatIterCnt: 1
and still running. I wonder if the log provided was from an incomplete run.
I suspect the extra time comes from these
[NesterovSolve] Revert back to snapshot coordi
I ran the issue and it also reached iteration 1380 after some hours, it may be stuck there, but it is still running.
I also ran the issue without routability and it finished GPL on iteration 500. So @oharboe, you can consider turning off routability mode in GPL. Or increasing the target RC parameter.
We recently adjusted the default target routing congestion for GPL routability mode (from 1.25 to 1.00). This now causes this design to activate routability mode, where previously it would not. However, this change significantly extends the completion time.
I am curious about the implications for DRT runtime in both scenarios, since routability is actually able to reduce routing congestion from 1.15 to 1.05 at least until iteration 1380.
@gudeh Silly question: what parameters exactly should I adjust in ORFS?
To turn off routability you can comment out lines 32 to 37 in /flow/scripts/global_place.tcl. Or, if you wish to turn off routability mode only on this design, you can put export GPL_ROUTABILITY_DRIVEN = 0 in the config.mk file of the design.
What about the "RC" parameter, what is that?
If routability is activated, the global placer tries to improve routing congestion during placement by inflating the cells. The target RC is the routing congestion it attempts to reach during this process. Every time it does not reach the desired target RC it tries again, starting from the snapshot, as reported by the [NesterovSolve] Revert back to snapshot coordi message. It keeps trying as long as it notices the final RC is decreasing.
Under your situation, I would suggest changing the target RC to 1.10, since it quickly reaches 1.09, right after iteration 580:
[INFO GPL-0074] FinalRC: 1.096847
This way you can improve the routability for DRT without paying too much extra runtime during GPL. You can do so by adding this to your config.mk: export GPL_TARGET_RC = 1.10
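To pick a target RC from an existing log rather than by eye, something like the following Python sketch could help. It collects the FinalRC values from [INFO GPL-0074] lines and reports which routability pass first meets a candidate target. The helper names are made up for illustration, the "less than or equal" comparison is an assumption about how GPL decides it has reached the target, and only the 1.096847 value in the sample comes from this thread; the other two sample values are placeholders.

```python
import re

# Matches lines such as: [INFO GPL-0074] FinalRC: 1.096847
FINAL_RC_RE = re.compile(r"\[INFO GPL-0074\] FinalRC:\s*([0-9.]+)")


def final_rc_values(log_text):
    """Return every FinalRC value in the log, in order of appearance."""
    return [float(v) for v in FINAL_RC_RE.findall(log_text)]


def first_pass_meeting(log_text, target_rc):
    """Return the 0-based index of the first pass with FinalRC <= target_rc.

    Returns None if no pass reaches the target. The <= comparison is an
    assumption for this sketch, not taken from the GPL source.
    """
    for index, rc in enumerate(final_rc_values(log_text)):
        if rc <= target_rc:
            return index
    return None


# Illustrative log: only the middle value is from the thread.
SAMPLE_LOG = """\
[INFO GPL-0074] FinalRC: 1.150000
[INFO GPL-0074] FinalRC: 1.096847
[INFO GPL-0074] FinalRC: 1.050000
"""

if __name__ == "__main__":
    for target in (1.10, 1.00):
        print(target, first_pass_meeting(SAMPLE_LOG, target))
```

With a target of 1.10 the second pass already qualifies, whereas a target of 1.00 is never met in this sample, which is the situation where GPL keeps retrying.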
@gudeh Will try. Perhaps this github issue can be put to bed if the progress messages are improved to include the advice you have above?
The user experience is then that routing takes a long time, the user looks at the logs where some advice on adjusted settings to rein in runtimes is found...
Sorry @oharboe, I did not understand what you mean by:
> The user experience is then that routing takes a long time, the user looks at the logs where some advice on adjusted settings to rein in runtimes is found...
@maliberty @gudeh The problem for the user is that global routing runs "forever" here. The fix is to adjust the parameters to global route. So what would "solve" the feature request in this issue is an improvement to the user experience, not a change to global routing as such. If I understand correctly.
I think that the user experience could be improved to the point that this issue is "fixed" if the progress messages in global route included advice on how to adjust global routing parameters.
The target RC is a GPL parameter actually. With a higher target RC value, the GPL should call the global router less frequently.
Furthermore, we are in the process of substituting the global router used during GPL routability, going from fastroute to rudy, which is much faster.
Either way, I can try to improve the log messages during GPL, if that is your suggestion.
@gudeh do you know where the congestion is that routability isn't able to resolve? It seems we are stuck in a loop that becomes mostly futile after the first few iterations.
> Either way, I can try to improve the log messages during GPL, if that is your suggestion.
That's my idea and understanding. I'd like to hear what others think...
@maliberty @gudeh Do you need any further input from me? It looks like I would just be in the way and create long turnaround times if you try to instruct me to run experiments, I misunderstand and then you try to interpret my slightly off experiments... It is probably easier and faster for you to run your own experiments with options?
@gudeh I see on both this design and ariane/gf12 thin bands of congestion right after the routability iteration, eg
They seem solvable as the surrounding area is not congested. I'm wondering if grouter_->setOverflowIterations(0); may be too conservative. Perhaps you can experiment.
@oharboe could you please send the config.mk file? I could not find it on the make issue tar.
> @oharboe could you please send the config.mk file? I could not find it on the make issue tar.
This is from bazel-orfs, so it is a bit of a work in progress. Could you make do with vars-*.sh file that is in the .tar.gz file for now?
I wanted the file so I could go forward with the flow and check the runtime during DRT. @maliberty is there any way to do that without the config.mk? Or should I just build a config.mk manually?
With GPL_ROUTABILITY_DRIVEN=0, the running times for megaboom are:
Log | Elapsed seconds | Percent Complete |
---|---|---|
1_1_yosys | 3965 | 8 |
1_1_yosys_hier_report | 3615 | 15 |
2_1_floorplan | 132 | 15 |
2_2_floorplan_io | 12 | 15 |
2_4_floorplan_macro | 676 | 16 |
2_5_floorplan_tapcell | 546 | 17 |
2_6_floorplan_pdn | 320 | 17 |
3_1_place_gp_skip_io | 711 | 18 |
3_2_place_iop | 27 | 18 |
3_3_place_gp | 6143 | 32 |
3_4_place_resized | 583 | 32 |
3_5_place_dp | 1177 | 33 |
4_1_cts | 449 | 34 |
5_1_grt | 3249 | 41 |
5_2_fillcell | 77 | 41 |
5_3_route | 17573 | 71 |
6_1_merge | 383 | 72 |
6_report | 6416 | 85 |
generate_abstract | 863 | 86 |
Total | 46917 | 100 |
@oharboe I think the pdn stripe is too close to the macro. Could you modify your pdn strategy to include -halo {2.0 2.0 2.0 2.0} on the define_pdn_grid for the macros and re-run?
I'll look at making this change to asap7 in general.
> @oharboe I think the pdn stripe is too close to the macro. Could you modify your pdn strategy to include -halo {2.0 2.0 2.0 2.0} on the define_pdn_grid for the macros and re-run? I'll look at making this change to asap7 in general.
Is this a case where global routing or pdn could add an actionable warning/error/progress message, or is it a case of "trivial to see after 30 years of ASIC experience"? :-)
I'll give it a go:
$ git diff
diff --git a/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl b/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
index 2a95094e..4f91331e 100644
--- a/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
+++ b/flow/platforms/asap7/openRoad/pdn/BLOCKS_grid_strategy.tcl
@@ -30,6 +30,6 @@ add_pdn_connect -grid {top} -layers {M5 M6}
# Element grid
####################################
# The halo around the macro prevents pdn from blocking pin access
-define_pdn_grid -macro -cells $::env(MACROS) -halo "0.25 0.25 0.25 0.25" -voltage_domains {CORE} -name ElementGrid
+define_pdn_grid -macro -cells $::env(MACROS) -halo "2.0 2.0 2.0 2.0" -voltage_domains {CORE} -name ElementGrid
Once you look at the congestion map you can see all the congestion is right around these stripes. I'm not sure there is a simple way to detect and make a message out of that.
> Once you look at the congestion map you can see all the congestion is right around these stripes. I'm not sure there is a simple way to detect and make a message out of that.
Maybe a more general progress message that this isn't converging in a normal amount of time?
I'm haggling here for what is practical and makes sense in terms of actionable feedback that helps to educate the user...
I'm not sure what action to suggest. I didn't know until I dug into it more.
@maliberty Please try to run this, it should have the updated halo for PDN. Only top level(BoomTile), I didn't redo the macros. https://drive.google.com/file/d/1Ri9YtRqJnGa2zVGIYe1b5KTOQ6Ru7TA_/view?usp=sharing
@maliberty Ran out of memory after a while on my laptop:
[NesterovSolve] Iter: 430 overflow: 0.234661 HPWL: 21465138028
[NesterovSolve] Iter: 440 overflow: 0.212438 HPWL: 20700377471
[INFO GPL-0100] worst slack 5.28e-10
[INFO GPL-0103] Weighted 177853 nets.
[INFO GPL-0075] Routability numCall: 1 inflationIterCnt: 1 bloatIterCnt: 0
./run-me-BoomTile-asap7-base.sh: line 7: 2076448 Killed openroad -no_init ${SCRIPTS_DIR}/global_place.tcl
oyvind@small-cigar:~/megaboom/bar$ echo $?
137
137 is the exit code for running out of memory.
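A short note on where 137 comes from: shells report a process killed by signal N as exit status 128 + N, and 137 = 128 + 9, where 9 is SIGKILL. The Linux out-of-memory killer terminates processes with SIGKILL, so 137 is what an OOM kill looks like, although anything else that sends SIGKILL produces the same status. A minimal Python sketch of the decoding, assuming a POSIX system (the function name is made up for illustration):

```python
import signal


def describe_exit_status(status):
    """Describe a shell exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if status > 128:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:
            # Not a signal number known on this platform.
            return f"exited with code {status}"
        return f"killed by {sig.name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
    print(describe_exit_status(0))
```

Checking the kernel log (for example with dmesg) is the usual way to confirm that the OOM killer, rather than something else, sent the signal.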
@oharboe running the files you sent last, GPL ran to completion and converged, although it still took a lot of iterations (2660):
Your test case packaging still has issues. In var*sh full paths are used:
export OBJECTS_DIR="/home/oyvind/.cache/bazel/_bazel_oyvind/7e6ad621f3f951c3ee6f5b179289b54e/execroot/_main/bazel-out/k8-fastbuild/bin/objects/asap7/BoomTile/base"
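As a stopgap until the packaging is fixed, the absolute paths could be rewritten after unpacking the tar. The Python sketch below turns `export NAME="/..."` lines into `export NAME="./..."`, following the "./home not /home" remark earlier in the thread. It is only an illustration: the function name is made up, the sample path is hypothetical, and it assumes every absolute path in the file should be relativized and that values are double-quoted as in the line shown above.

```python
import re

# Matches the start of lines like: export OBJECTS_DIR="/home/...
# and captures everything up to (not including) the leading slash.
EXPORT_ABS_RE = re.compile(r'^(export\s+\w+=")/', re.MULTILINE)


def relativize_exports(vars_text):
    """Rewrite export NAME="/path" as export NAME="./path".

    Lines whose value does not start with "/" are left untouched.
    """
    return EXPORT_ABS_RE.sub(r"\1./", vars_text)


# Hypothetical vars-*.sh content for illustration.
SAMPLE_VARS = (
    'export OBJECTS_DIR="/home/user/objects/asap7/BoomTile/base"\n'
    'export DESIGN_NAME="BoomTile"\n'
)

if __name__ == "__main__":
    print(relativize_exports(SAMPLE_VARS), end="")
```

Running the run-me script from the directory where the tar was unpacked would then resolve the rewritten paths relative to that directory.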
@maliberty I am aware. I am investigating a fix to bazel-orfs or ORFS. Stay tuned.
Hi @oharboe, we merged RUDY for routability mode today! You should not find yourself stuck on "Routability numCall" messages anymore!
Fantastic! We are also using RUDY for fast turnaround heatmaps. Very nice feature!!!
That's great! Do you think we can close this issue? I see you have other ones about gpl messages also, I will try to modify them a little so messages are more clear.
Yes, I think we can close. I will open a new issue if I observe anything that merits further follow-up work.
Description
22000 seconds to complete. Is there any low hanging fruit here?
make global_place_issue tar file https://drive.google.com/file/d/1KHxaEr9AFHwCJbsMKQFg5nv3Ndb2-bgw/view?usp=sharing
Suggested Solution
Have a look to see if there's anything easy that can be done here?
Additional Context
No response