The-OpenROAD-Project / megaboom

BSD 3-Clause "New" or "Revised" License

Out of Memory Error #65

Open thebyrd opened 3 days ago

thebyrd commented 3 days ago

During CPUHeavyGenrule //:BoomTile_route, openroad is being killed by the OOM killer. Which constraints can I change so that it consumes less memory? I appreciate that this could be fixed by just adding more RAM, but my current machine already has 64 GiB. It's a 20-core i7 running Ubuntu 22.04.4 LTS.

megaboom$ bazel build BoomTile_final --verbose_failures
INFO: Analyzed target //:BoomTile_final (80 packages loaded, 1533 targets configured).
ERROR: /home/davidbyrd/Documents/src/github.com/The-OpenROAD-Project/megaboom/BUILD.bazel:832:15: Executing CPUHeavyGenrule //:BoomTile_route failed: (Exit 2): bash failed: error executing CPUHeavyGenrule command (from target //:BoomTile_route) 
  (cd /home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/execroot/_main && \
  exec env - \
  /bin/bash -c '\
OR_IMAGE=openroad/flow-ubuntu22.04-builder:latest  DESIGN_CONFIG=bazel-out/k8-fastbuild/bin/BoomTile_config.mk  STAGE_CONFIG=bazel-out/k8-fastbuild/bin/BoomTile_route_config.mk  MAKE_PATTERN=external/bazel-orfs~override/route-bazel.mk  RULEDIR=bazel-out/k8-fastbuild/bin  bazel-out/k8-fastbuild/bin/external/bazel-orfs~override/docker_shell  make --silent bazel-route elapsed')
# Configuration: 6220210384caac83f2686a71e0317d4d9a62c6a78f5b323d7478d3ae7e7a8bf9
# Execution platform: @@local_config_platform//:host
make[1]: *** [/OpenROAD-flow-scripts/flow//Makefile:774: do-5_3_route] Error 137
make: *** [/home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/external/bazel-orfs~override/route-bazel.mk:10: bazel-route] Error 2
[INFO-FLOW] ASU ASAP7 - version 2
Default PVT selection: BC
grep -q 1 /home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/execroot/_main/bazel-out/k8-fastbuild/bin/results/asap7/BoomTile/base/grt.ok
Running fillcell.tcl, stage 5_2_fillcell
[WARNING STA-0450] virtual clock clock_vir can not be propagated.
[INFO DPL-0001] Placed 7135933 filler instances.
Elapsed time: 1:00.72[h:]min:sec. CPU time: user 52.00 sys 8.42 (99%). Peak memory: 13242068KB.
Running detail_route.tcl, stage 5_3_route
[WARNING STA-0450] virtual clock clock_vir can not be propagated.
[INFO ORD-0030] Using 20 thread(s).
detailed_route -output_drc /home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/execroot/_main/bazel-out/k8-fastbuild/bin/reports/asap7/BoomTile/base/5_route_drc.rpt -output_maze /home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/execroot/_main/bazel-out/k8-fastbuild/bin/results/asap7/BoomTile/base/maze.log -bottom_routing_layer M2 -top_routing_layer M9 -save_guide_updates -verbose 1 -drc_report_iter_step 5
[INFO DRT-0149] Reading tech and libs.
[WARNING DRT-0340] LEF58_ENCLOSURE EOL is not supported. Skipping for layer V3
[WARNING DRT-0340] LEF58_ENCLOSURE EOL is not supported. Skipping for layer V3
[WARNING DRT-0340] LEF58_ENCLOSURE EOL is not supported. Skipping for layer V4
[WARNING DRT-0340] LEF58_ENCLOSURE EOL is not supported. Skipping for layer V5
[WARNING DRT-0340] LEF58_ENCLOSURE EOL is not supported. Skipping for layer V6

Units:                1000
Number of layers:     21
Number of macros:     229
Number of vias:       93
Number of viarulegen: 11

[INFO DRT-0150] Reading design.

Design:                   BoomTile
Die area:                 ( 0 0 ) ( 1387092 1387092 )
Number of track patterns: 32
Number of DEF vias:       0
Number of components:     9212750
Number of terminals:      388
Number of snets:          2
Number of nets:           1942672

[INFO DRT-0167] List of default vias:
  Layer V2
    default via: VIA23
  Layer V3
    default via: VIA34
  Layer V4
    default via: VIA45
  Layer V5
    default via: VIA56
  Layer V6
    default via: VIA67
  Layer V7
    default via: VIA78
  Layer V8
    default via: VIA89
  Layer V9
    default via: VIA9Pad
[INFO DRT-0162] Library cell analysis.
[INFO DRT-0163] Instance analysis.
  Complete 100000 instances.
  Complete 200000 instances.
  Complete 300000 instances.
  Complete 400000 instances.
  Complete 500000 instances.
  Complete 600000 instances.
  Complete 700000 instances.
  Complete 800000 instances.
  Complete 900000 instances.
  Complete 1000000 instances.
  Complete 2000000 instances.
  Complete 3000000 instances.
  Complete 4000000 instances.
  Complete 5000000 instances.
  Complete 6000000 instances.
  Complete 7000000 instances.
  Complete 8000000 instances.
  Complete 9000000 instances.
[INFO DRT-0164] Number of unique instances = 483.
[INFO DRT-0168] Init region query.
[INFO DRT-0018]   Complete 100000 insts.
[INFO DRT-0018]   Complete 200000 insts.
[INFO DRT-0018]   Complete 300000 insts.
[INFO DRT-0018]   Complete 400000 insts.
[INFO DRT-0018]   Complete 500000 insts.
[INFO DRT-0018]   Complete 600000 insts.
[INFO DRT-0018]   Complete 700000 insts.
[INFO DRT-0018]   Complete 800000 insts.
[INFO DRT-0018]   Complete 900000 insts.
[INFO DRT-0019]   Complete 1000000 insts.
[INFO DRT-0019]   Complete 2000000 insts.
[INFO DRT-0019]   Complete 3000000 insts.
[INFO DRT-0019]   Complete 4000000 insts.
[INFO DRT-0019]   Complete 5000000 insts.
[INFO DRT-0019]   Complete 6000000 insts.
[INFO DRT-0019]   Complete 7000000 insts.
[INFO DRT-0019]   Complete 8000000 insts.
[INFO DRT-0019]   Complete 9000000 insts.
[INFO DRT-0024]   Complete Active.
[INFO DRT-0024]   Complete V0.
[INFO DRT-0024]   Complete M1.
[INFO DRT-0024]   Complete V1.
[INFO DRT-0024]   Complete M2.
[INFO DRT-0024]   Complete V2.
[INFO DRT-0024]   Complete M3.
[INFO DRT-0024]   Complete V3.
[INFO DRT-0024]   Complete M4.
[INFO DRT-0024]   Complete V4.
[INFO DRT-0024]   Complete M5.
[INFO DRT-0024]   Complete V5.
[INFO DRT-0024]   Complete M6.
[INFO DRT-0024]   Complete V6.
[INFO DRT-0024]   Complete M7.
[INFO DRT-0024]   Complete V7.
[INFO DRT-0024]   Complete M8.
[INFO DRT-0024]   Complete V8.
[INFO DRT-0024]   Complete M9.
[INFO DRT-0024]   Complete V9.
[INFO DRT-0024]   Complete Pad.
[INFO DRT-0033] Active shape region query size = 0.
[INFO DRT-0033] V0 shape region query size = 0.
[INFO DRT-0033] M1 shape region query size = 71179657.
[INFO DRT-0033] V1 shape region query size = 181268871.
[INFO DRT-0033] M2 shape region query size = 10138396.
[INFO DRT-0033] V2 shape region query size = 9205197.
[INFO DRT-0033] M3 shape region query size = 18410466.
[INFO DRT-0033] V3 shape region query size = 6136798.
[INFO DRT-0033] M4 shape region query size = 15356680.
[INFO DRT-0033] V4 shape region query size = 6137088.
[INFO DRT-0033] M5 shape region query size = 6536384.
[INFO DRT-0033] V5 shape region query size = 788800.
[INFO DRT-0033] M6 shape region query size = 398622.
[INFO DRT-0033] V6 shape region query size = 0.
[INFO DRT-0033] M7 shape region query size = 0.
[INFO DRT-0033] V7 shape region query size = 0.
[INFO DRT-0033] M8 shape region query size = 0.
[INFO DRT-0033] V8 shape region query size = 0.
[INFO DRT-0033] M9 shape region query size = 0.
[INFO DRT-0033] V9 shape region query size = 0.
[INFO DRT-0033] Pad shape region query size = 0.
[INFO DRT-0165] Start pin access.
[INFO DRT-0076]   Complete 1000 pins.
[INFO DRT-0076]   Complete 2000 pins.
[INFO DRT-0076]   Complete 3000 pins.
[INFO DRT-0076]   Complete 4000 pins.
[INFO DRT-0076]   Complete 5000 pins.
[INFO DRT-0076]   Complete 6000 pins.
[INFO DRT-0076]   Complete 7000 pins.
[INFO DRT-0076]   Complete 8000 pins.
[INFO DRT-0076]   Complete 9000 pins.
[INFO DRT-0077]   Complete 10000 pins.
[INFO DRT-0078]   Complete 11353 pins.
[INFO DRT-0079]   Complete 100 unique inst patterns.
[INFO DRT-0079]   Complete 200 unique inst patterns.
[INFO DRT-0079]   Complete 300 unique inst patterns.
[INFO DRT-0081]   Complete 375 unique inst patterns.
[INFO DRT-0082]   Complete 10000 groups.
[INFO DRT-0082]   Complete 20000 groups.
[INFO DRT-0082]   Complete 30000 groups.
[INFO DRT-0082]   Complete 40000 groups.
[INFO DRT-0082]   Complete 50000 groups.
[INFO DRT-0082]   Complete 60000 groups.
[INFO DRT-0082]   Complete 70000 groups.
[INFO DRT-0082]   Complete 80000 groups.
[INFO DRT-0082]   Complete 90000 groups.
[INFO DRT-0083]   Complete 100000 groups.
[INFO DRT-0083]   Complete 200000 groups.
[INFO DRT-0083]   Complete 300000 groups.
[INFO DRT-0083]   Complete 400000 groups.
[INFO DRT-0083]   Complete 500000 groups.
[INFO DRT-0083]   Complete 600000 groups.
[INFO DRT-0083]   Complete 700000 groups.
[INFO DRT-0083]   Complete 800000 groups.
[INFO DRT-0083]   Complete 900000 groups.
[INFO DRT-0083]   Complete 1000000 groups.
[INFO DRT-0083]   Complete 1100000 groups.
[INFO DRT-0083]   Complete 1200000 groups.
[INFO DRT-0083]   Complete 1300000 groups.
[INFO DRT-0083]   Complete 1400000 groups.
[INFO DRT-0083]   Complete 1500000 groups.
[INFO DRT-0083]   Complete 1600000 groups.
[INFO DRT-0083]   Complete 1700000 groups.
[INFO DRT-0083]   Complete 1800000 groups.
[INFO DRT-0083]   Complete 1900000 groups.
[INFO DRT-0084]   Complete 1911881 groups.
#scanned instances     = 9212750
#unique  instances     = 443
#stdCellGenAp          = 14348
#stdCellValidPlanarAp  = 120
#stdCellValidViaAp     = 11890
#stdCellPinNoAp        = 0
#stdCellPinCnt         = 6875679
#instTermValidViaApCnt = 0
#macroGenAp            = 33839
#macroValidPlanarAp    = 33839
#macroValidViaAp       = 0
#macroNoAp             = 0
[INFO DRT-0166] Complete pin access.
[INFO DRT-0267] cpu time = 00:01:55, elapsed time = 00:00:17, memory = 30707.38 (MB), peak = 35993.86 (MB)
[INFO DRT-0156] guideIn read 100000 guides.
[INFO DRT-0156] guideIn read 200000 guides.
[INFO DRT-0156] guideIn read 300000 guides.
[INFO DRT-0156] guideIn read 400000 guides.
[INFO DRT-0156] guideIn read 500000 guides.
[INFO DRT-0156] guideIn read 600000 guides.
[INFO DRT-0156] guideIn read 700000 guides.
[INFO DRT-0156] guideIn read 800000 guides.
[INFO DRT-0156] guideIn read 900000 guides.
[INFO DRT-0157] guideIn read 1000000 guides.
[INFO DRT-0157] guideIn read 2000000 guides.
[INFO DRT-0157] guideIn read 3000000 guides.
[INFO DRT-0157] guideIn read 4000000 guides.
[INFO DRT-0157] guideIn read 5000000 guides.
[INFO DRT-0157] guideIn read 6000000 guides.
[INFO DRT-0157] guideIn read 7000000 guides.
[INFO DRT-0157] guideIn read 8000000 guides.
[INFO DRT-0157] guideIn read 9000000 guides.
[INFO DRT-0157] guideIn read 10000000 guides.
[INFO DRT-0157] guideIn read 11000000 guides.
[INFO DRT-0157] guideIn read 12000000 guides.
[INFO DRT-0157] guideIn read 13000000 guides.
[INFO DRT-0157] guideIn read 14000000 guides.
[INFO DRT-0157] guideIn read 15000000 guides.
[INFO DRT-0157] guideIn read 16000000 guides.
[INFO DRT-0157] guideIn read 17000000 guides.
[INFO DRT-0157] guideIn read 18000000 guides.
[INFO DRT-0157] guideIn read 19000000 guides.
[INFO DRT-0157] guideIn read 20000000 guides.
[INFO DRT-0157] guideIn read 21000000 guides.
[INFO DRT-0157] guideIn read 22000000 guides.
[INFO DRT-0157] guideIn read 23000000 guides.

Number of guides:     23998525

[INFO DRT-0169] Post process guides.
[INFO DRT-0176] GCELLGRID X 0 DO 2568 STEP 540 ;
[INFO DRT-0177] GCELLGRID Y 0 DO 2568 STEP 540 ;
[INFO DRT-0026]   Complete 100000 origin guides.
[INFO DRT-0026]   Complete 200000 origin guides.
[INFO DRT-0026]   Complete 300000 origin guides.
[INFO DRT-0026]   Complete 400000 origin guides.
[INFO DRT-0026]   Complete 500000 origin guides.
[INFO DRT-0026]   Complete 600000 origin guides.
[INFO DRT-0026]   Complete 700000 origin guides.
[INFO DRT-0026]   Complete 800000 origin guides.
[INFO DRT-0026]   Complete 900000 origin guides.
[INFO DRT-0027]   Complete 1000000 origin guides.
[INFO DRT-0027]   Complete 2000000 origin guides.
[INFO DRT-0027]   Complete 3000000 origin guides.
[INFO DRT-0027]   Complete 4000000 origin guides.
[INFO DRT-0027]   Complete 5000000 origin guides.
[INFO DRT-0027]   Complete 6000000 origin guides.
[INFO DRT-0027]   Complete 7000000 origin guides.
[INFO DRT-0027]   Complete 8000000 origin guides.
[INFO DRT-0027]   Complete 9000000 origin guides.
[INFO DRT-0027]   Complete 10000000 origin guides.
[INFO DRT-0027]   Complete 11000000 origin guides.
[INFO DRT-0027]   Complete 12000000 origin guides.
[INFO DRT-0027]   Complete 13000000 origin guides.
[INFO DRT-0027]   Complete 14000000 origin guides.
[INFO DRT-0027]   Complete 15000000 origin guides.
[INFO DRT-0027]   Complete 16000000 origin guides.
[INFO DRT-0027]   Complete 17000000 origin guides.
[INFO DRT-0027]   Complete 18000000 origin guides.
[INFO DRT-0027]   Complete 19000000 origin guides.
[INFO DRT-0027]   Complete 20000000 origin guides.
[INFO DRT-0027]   Complete 21000000 origin guides.
[INFO DRT-0027]   Complete 22000000 origin guides.
[INFO DRT-0027]   Complete 23000000 origin guides.
[INFO DRT-0028]   Complete Active.
[INFO DRT-0028]   Complete V0.
[INFO DRT-0028]   Complete M1.
[INFO DRT-0028]   Complete V1.
[INFO DRT-0028]   Complete M2.
[INFO DRT-0028]   Complete V2.
[INFO DRT-0028]   Complete M3.
[INFO DRT-0028]   Complete V3.
[INFO DRT-0028]   Complete M4.
[INFO DRT-0028]   Complete V4.
[INFO DRT-0028]   Complete M5.
[INFO DRT-0028]   Complete V5.
[INFO DRT-0028]   Complete M6.
[INFO DRT-0028]   Complete V6.
[INFO DRT-0028]   Complete M7.
[INFO DRT-0028]   Complete V7.
[INFO DRT-0028]   Complete M8.
[INFO DRT-0028]   Complete V8.
[INFO DRT-0028]   Complete M9.
[INFO DRT-0028]   Complete V9.
[INFO DRT-0028]   Complete Pad.
  complete 100000 nets.
  complete 200000 nets.
  complete 300000 nets.
  complete 400000 nets.
  complete 500000 nets.
  complete 600000 nets.
  complete 700000 nets.
  complete 800000 nets.
  complete 900000 nets.
  complete 1000000 nets.
[INFO DRT-0178] Init guide query.
[INFO DRT-0029]   Complete 100000 nets (guide).
[INFO DRT-0029]   Complete 200000 nets (guide).
[INFO DRT-0029]   Complete 300000 nets (guide).
[INFO DRT-0029]   Complete 400000 nets (guide).
[INFO DRT-0029]   Complete 500000 nets (guide).
[INFO DRT-0029]   Complete 600000 nets (guide).
[INFO DRT-0029]   Complete 700000 nets (guide).
[INFO DRT-0029]   Complete 800000 nets (guide).
[INFO DRT-0029]   Complete 900000 nets (guide).
[INFO DRT-0030]   Complete 1000000 nets (guide).
[INFO DRT-0035]   Complete Active (guide).
[INFO DRT-0035]   Complete V0 (guide).
[INFO DRT-0035]   Complete M1 (guide).
[INFO DRT-0035]   Complete V1 (guide).
[INFO DRT-0035]   Complete M2 (guide).
[INFO DRT-0035]   Complete V2 (guide).
[INFO DRT-0035]   Complete M3 (guide).
[INFO DRT-0035]   Complete V3 (guide).
[INFO DRT-0035]   Complete M4 (guide).
[INFO DRT-0035]   Complete V4 (guide).
[INFO DRT-0035]   Complete M5 (guide).
[INFO DRT-0035]   Complete V5 (guide).
[INFO DRT-0035]   Complete M6 (guide).
[INFO DRT-0035]   Complete V6 (guide).
[INFO DRT-0035]   Complete M7 (guide).
[INFO DRT-0035]   Complete V7 (guide).
[INFO DRT-0035]   Complete M8 (guide).
[INFO DRT-0035]   Complete V8 (guide).
[INFO DRT-0035]   Complete M9 (guide).
[INFO DRT-0035]   Complete V9 (guide).
[INFO DRT-0035]   Complete Pad (guide).
[INFO DRT-0036] Active guide region query size = 0.
[INFO DRT-0036] V0 guide region query size = 0.
[INFO DRT-0036] M1 guide region query size = 6358434.
[INFO DRT-0036] V1 guide region query size = 0.
[INFO DRT-0036] M2 guide region query size = 6406898.
[INFO DRT-0036] V2 guide region query size = 0.
[INFO DRT-0036] M3 guide region query size = 4298172.
[INFO DRT-0036] V3 guide region query size = 0.
[INFO DRT-0036] M4 guide region query size = 1094108.
[INFO DRT-0036] V4 guide region query size = 0.
[INFO DRT-0036] M5 guide region query size = 507259.
[INFO DRT-0036] V5 guide region query size = 0.
[INFO DRT-0036] M6 guide region query size = 208666.
[INFO DRT-0036] V6 guide region query size = 0.
[INFO DRT-0036] M7 guide region query size = 69899.
[INFO DRT-0036] V7 guide region query size = 0.
[INFO DRT-0036] M8 guide region query size = 0.
[INFO DRT-0036] V8 guide region query size = 0.
[INFO DRT-0036] M9 guide region query size = 0.
[INFO DRT-0036] V9 guide region query size = 0.
[INFO DRT-0036] Pad guide region query size = 0.
[INFO DRT-0179] Init gr pin query.
[INFO DRT-0185] Post process initialize RPin region query.
[INFO DRT-0181] Start track assignment.
[INFO DRT-0184] Done with 11233764 vertical wires in 52 frboxes and 7709672 horizontal wires in 52 frboxes.
[INFO DRT-0186] Done with 1231459 vertical wires in 52 frboxes and 1528070 horizontal wires in 52 frboxes.
[INFO DRT-0182] Complete track assignment.
[INFO DRT-0267] cpu time = 00:40:49, elapsed time = 00:07:03, memory = 51627.06 (MB), peak = 52074.85 (MB)
[INFO DRT-0187] Start routing data preparation.
Command terminated by signal 9
Elapsed time: 10:13.10[h:]min:sec. CPU time: user 2698.31 sys 37.74 (446%). Peak memory: 58177308KB.
Target //:BoomTile_final failed to build
ERROR: /home/davidbyrd/Documents/src/github.com/The-OpenROAD-Project/megaboom/BUILD.bazel:832:15 Executing genrule //:BoomTile_final failed: (Exit 2): bash failed: error executing CPUHeavyGenrule command (from target //:BoomTile_route) 
  (cd /home/davidbyrd/.cache/bazel/_bazel_davidbyrd/6129d0cbed3c216c13ad15fa0aba969d/execroot/_main && \
  exec env - \
  /bin/bash -c '\
OR_IMAGE=openroad/flow-ubuntu22.04-builder:latest  DESIGN_CONFIG=bazel-out/k8-fastbuild/bin/BoomTile_config.mk  STAGE_CONFIG=bazel-out/k8-fastbuild/bin/BoomTile_route_config.mk  MAKE_PATTERN=external/bazel-orfs~override/route-bazel.mk  RULEDIR=bazel-out/k8-fastbuild/bin  bazel-out/k8-fastbuild/bin/external/bazel-orfs~override/docker_shell  make --silent bazel-route elapsed')
# Configuration: 6220210384caac83f2686a71e0317d4d9a62c6a78f5b323d7478d3ae7e7a8bf9
# Execution platform: @@local_config_platform//:host
INFO: Elapsed time: 685.371s, Critical Path: 683.85s
INFO: 2 processes: 2 internal.
ERROR: Build did NOT complete successfully
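
A note on reading the log above: make's `Error 137` follows the shell convention of 128 plus a signal number, which agrees with the `Command terminated by signal 9` line. The sketch below only illustrates that arithmetic; the helper name `describe_exit_status` is made up for this example. SIGKILL on its own does not prove that the OOM killer was responsible, so the kernel log would still need to be checked to confirm it.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if status > 128:
        sig = signal.Signals(status - 128)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with code {status}"


# 137 is the status make reported for do-5_3_route in the log.
print(describe_exit_status(137))
# 2 is the status make reported for the outer bazel-route target.
print(describe_exit_status(2))
```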

oharboe commented 3 days ago

Try enabling virtual memory. As I recall, reducing the number of cores won't help much.
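
Before changing anything, it may help to see how much RAM and swap the machine currently has. The following is a minimal sketch that parses text in the `/proc/meminfo` format; the function names and the sample values are assumptions made up for illustration, not output from the machine in this issue.

```python
def parse_meminfo(text: str) -> dict:
    """Return {field_name: value_in_KiB} for lines like 'MemTotal:  123 kB'."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def ram_and_swap_gib(text: str) -> tuple:
    """Return (RAM, swap) in GiB from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    kib_per_gib = 1024 ** 2
    return (info.get("MemTotal", 0) / kib_per_gib,
            info.get("SwapTotal", 0) / kib_per_gib)


# Hypothetical sample; on Linux, pass open("/proc/meminfo").read() instead.
SAMPLE = """MemTotal:       67108864 kB
MemFree:         1234567 kB
SwapTotal:             0 kB
"""

ram, swap = ram_and_swap_gib(SAMPLE)
print(f"RAM: {ram:.1f} GiB, swap: {swap:.1f} GiB")
```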

maliberty commented 3 days ago

This is a large design:

Number of components:     9212750
Number of nets:           1942672

though it is a bit odd that you have so many more components than nets. Does this design have a lot of dead space? I doubt your problem relates to constraints.
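
One way to look at the component count is to compare it with the filler count reported earlier in the same log (`[INFO DPL-0001] Placed 7135933 filler instances.`). The sketch below does that arithmetic. It assumes the filler instances are included in the router's "Number of components" figure, which the log does not state explicitly.

```python
# Figures copied from the log in this issue.
components = 9_212_750  # "Number of components" reported when reading the design
fillers = 7_135_933     # "Placed 7135933 filler instances" (DPL-0001)
nets = 1_942_672        # "Number of nets"

# Assumption: fillers are counted among the components.
non_filler = components - fillers
filler_share = fillers / components

print(f"non-filler components: {non_filler}")
print(f"filler share of components: {filler_share:.1%}")
print(f"non-filler components per net: {non_filler / nets:.2f}")
```

Under that assumption, roughly three quarters of the components are fillers, and the remaining count is close to the number of nets, which would be consistent with a design that has a lot of empty area.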

thebyrd commented 2 days ago

I added 30 GB of swap and the same thing happened; it just took longer. I'm also running it on an EC2 instance with 124 GB of RAM, and that's still running. What were the specs of the machine you confirmed it worked on?

Elapsed time: 3:45:22[h:]min:sec. CPU time: user 230047.12 sys 2971.37 (1723%). Peak memory: 63405600KB.
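
To compare the `Peak memory` figures across runs, a small sketch like the one below can pull them out of the log and convert them to GiB. It assumes the `KB` suffix means units of 1024 bytes; the function name `peak_memory_gib` is made up for illustration. The sample lines are the three summaries quoted in this thread.

```python
import re

# Matches the "Peak memory: <digits>KB" part of the per-stage summary line.
PEAK_RE = re.compile(r"Peak memory: (\d+)KB")


def peak_memory_gib(log_text: str) -> list:
    """Return every 'Peak memory' value found in log_text, converted to GiB."""
    return [int(kb) / 1024 ** 2 for kb in PEAK_RE.findall(log_text)]


LOG = """\
Elapsed time: 1:00.72[h:]min:sec. CPU time: user 52.00 sys 8.42 (99%). Peak memory: 13242068KB.
Elapsed time: 10:13.10[h:]min:sec. CPU time: user 2698.31 sys 37.74 (446%). Peak memory: 58177308KB.
Elapsed time: 3:45:22[h:]min:sec. CPU time: user 230047.12 sys 2971.37 (1723%). Peak memory: 63405600KB.
"""

for gib in peak_memory_gib(LOG):
    print(f"{gib:.2f} GiB")
```

If this figure is a peak resident set size, which is an assumption here, it cannot exceed physical RAM. In that case the value from the run with swap would understate the total memory the router actually needed.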

oharboe commented 2 days ago

What you want to know is how much virtual memory is required. That should be possible to measure with https://github.com/The-OpenROAD-Project/OpenROAD/pull/5260

oharboe commented 2 days ago

I checked: I have 256 GB of virtual memory, 64 GB of RAM, and 48 cores (96 threads).

oharboe commented 1 day ago

@thebyrd FYI https://github.com/The-OpenROAD-Project/megaboom/pull/66. Running a build now to test...