OpenROAD GUI crashes when opening ASAP7 design

oharboe commented 1 year ago

Describe the bug

SIGSEGV during load gui_cts

Signal 11 received
Stack trace:
 0# 0x0000000000CCCFA7 in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 1# 0x00007F82109B9400 in /lib64/libc.so.6
 2# 0x00007F821230C399 in /lib64/libQt5Gui.so.5
 3# 0x00007F821230C72E in /lib64/libQt5Gui.so.5
 4# 0x00007F82123067F5 in /lib64/libQt5Gui.so.5
 5# gui::BrowserWidget::makeRowItems(QStandardItem*, std::string const&, gui::BrowserWidget::ModuleStats const&, QStandardItem*, bool) const in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 6# gui::BrowserWidget::addInstanceItem(odb::dbInst*, QStandardItem*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 7# gui::BrowserWidget::addInstanceItems(std::vector<odb::dbInst*, std::allocator<odb::dbInst*> > const&, std::string const&, QStandardItem*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 8# gui::BrowserWidget::updateModel() in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 9# 0x00000000013BC7EC in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
10# QMetaObject::activate(QObject*, int, int, void**) in /lib64/libQt5Core.so.5
11# gui::MainWindow::designLoaded(odb::dbBlock*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
12# ord::OpenRoad::readDb(char const*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
13# 0x0000000000CDCD96 in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
14# 0x00007F82153F8EB2 in /lib64/libtcl8.5.so
15# 0x00007F821543D36C in /lib64/libtcl8.5.so

Expected Behavior

Shouldn't crash.

Environment

Git commit: f6413621cbb4c9db0f4481e32623a7afca6df52f
kernel: Linux 5.15.0-56-generic
os: Ubuntu 22.04.1 LTS (Jammy Jellyfish)
cmake version 3.24.2
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Ubuntu clang version 14.0.0-1ubuntu1

To Reproduce

Uses proprietary data. It is a "big" ASAP7 design: I'm trying to figure out how well OpenROAD scales, so I scaled up a design to ca. 1 mm^2 in ASAP7. I'll try to run it in the debugger to get more info.

oharboe commented 1 year ago

Was able to reproduce in the debugger. If I continue, I step into the signal handler for the segfault, but it's not obvious what's going on and there are not a lot of local variables to look at:

maliberty commented 1 year ago

I'm wondering if you are hitting an out of memory condition at this point. There is nothing obviously wrong here.

oharboe commented 1 year ago

> I'm wondering if you are hitting an out of memory condition at this point. There is nothing obviously wrong here.

I don't think so, but I will do some more testing. I am hoping to create generic examples that I can contribute.

QuantamHD commented 1 year ago

Try building openroad with ASAN. It might reveal the problem more clearly.

oharboe commented 1 year ago

> Try building openroad with ASAN. It might reveal the problem more clearly.

Trying, don't know how to tell if I set up cmake correctly though.

Meanwhile.... 67*10^6 instances according to the below.

oharboe commented 1 year ago

Ah... looks like I could be running out of memory.... I only have 256gbyte of virtual memory.

Is "top" trying to tell me that openroad has reserved 20 terabytes of memory???

I suppose the OpenROAD GUI can't call addInstanceItem() 67*10^6 times with that sort of memory usage.

Some printing and observing top.

Ca. 3gbyte per million instances. 3 * 67 = 201gbyte. That's just barely doable on my machine.
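
The back-of-the-envelope estimate in this comment can be written down as a minimal sketch. The function names are hypothetical, and the linear growth of memory with instance count is an assumption based on the "top" observations here, not a documented property of the OpenROAD GUI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: projected memory use in gbyte, assuming memory grows
// linearly with the number of instances (an assumption taken from the
// observations in this thread, not an OpenROAD guarantee).
inline std::int64_t projectedGigabytes(std::int64_t millions_of_instances,
                                       std::int64_t gigabytes_per_million)
{
  assert(millions_of_instances >= 0 && gigabytes_per_million >= 0);
  return millions_of_instances * gigabytes_per_million;
}

// Returns true if the projected footprint fits in the available memory.
inline bool fitsInMemory(std::int64_t projected_gb, std::int64_t available_gb)
{
  return projected_gb <= available_gb;
}
```

Calling projectedGigabytes(67, 3) gives 201, which fitsInMemory(201, 256) accepts for a machine with 256 gbyte of virtual memory, though without much headroom.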

oharboe commented 1 year ago

Got some more info: 53.5gbyte RAM. Virtual (according to "top") is 20t. Whatever that means...

oharboe commented 1 year ago

Some more info further up the stack:

oharboe commented 1 year ago

Ah.

The second crash was with "i=44833871" too. This is indicative of something poisonous about this instance and not out of memory.

oharboe commented 1 year ago

Hmmm... don't know how to debug from here.

QuantamHD commented 1 year ago

That allocation size of 0xffff... is very suspicious. It's indicative of signed 64 bit -1 being cast as an unsigned 64bit int.
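
A minimal sketch of the effect described here, assuming the size really did start out as a signed -1 (which this thread does not confirm). The helper names are hypothetical and are not part of OpenROAD or Qt.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converting a signed 64-bit value to unsigned is well defined in C++: the
// result is the value modulo 2^64, so -1 becomes 0xFFFFFFFFFFFFFFFF.
inline std::uint64_t asAllocationSize(std::int64_t signed_size)
{
  return static_cast<std::uint64_t>(signed_size);
}

// Hypothetical guard: any size above the largest signed 64-bit value most
// likely started out as a negative number.
inline bool looksLikeNegativeSize(std::uint64_t size)
{
  return size
         > static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}
```

asAllocationSize(-1) yields 0xFFFFFFFFFFFFFFFF, the all-ones pattern that shows up as an absurdly large allocation request in a debugger.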

QuantamHD commented 1 year ago

You also shouldn't use int i in these loops. Rather size_t to avoid unintended wrap around

oharboe commented 1 year ago

> You also shouldn't use int i in these loops. Rather size_t to avoid unintended wrap around

Good point. It was just a quick local hack to diagnose, not in a pull request.

QuantamHD commented 1 year ago

Ah I was just wondering if that int might have been overflowed in your test case. Probably not, but worth ruling out.
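
Ruling this out takes only a comparison against the largest value an int can hold. The sketch below uses a hypothetical helper name. On a platform with a 32-bit int (typical for x86_64 Linux, but an assumption here) the limit is 2147483647, well above both the 67*10^6 instances and the index 44833871 reported in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Returns true if a loop over `count` items can safely use a plain `int`
// counter, i.e. the counter never needs to exceed the largest int value.
inline bool fitsInInt(std::size_t count)
{
  return count <= static_cast<std::size_t>(std::numeric_limits<int>::max());
}
```

Using std::size_t for the counter, as suggested above, avoids the question entirely because it matches the type returned by std::vector::size().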

oharboe commented 1 year ago

> Ah I was just wondering if that int might have been overflowed in your test case. Probably not, but worth ruling out.

I'm thinking the next step here is to create a test-case that can be contributed to open source that reproduces the problem. I need some help to debug this and it's too slow to do interactively in a call.

This is also a crash that is significantly beyond where the OpenROAD GUI scales nicely, so scaling OpenROAD GUI is probably a better start than tracking down this particular bug?

maliberty commented 1 year ago

I suspect this is mostly revealing asap7 to have only a single small filler cell in use. There are only two sizes in the LEF which means it will take tons of cells to fill the design (one isn't used for some reason, probably a mistake). Compare to sky130 which has 1,2,4,8 sized fillers. Most proprietary PDKs go up to 64x or 128x sized cells.
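
To illustrate why the available filler sizes matter so much, here is a minimal sketch that counts the fillers needed for one gap with a greedy largest-first choice. This is an illustration only, not how OpenROAD's filler placement is implemented. The function name is hypothetical, widths are expressed in placement sites, and modelling the single ASAP7 filler as one site wide is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Greedy count of filler cells needed to fill a gap of `gap_sites` sites,
// given the available filler widths sorted from largest to smallest.
// Assumes a width-1 filler is available so every gap can be filled exactly.
inline std::int64_t fillersNeeded(std::int64_t gap_sites,
                                  const std::vector<std::int64_t>& widths_desc)
{
  assert(gap_sites >= 0);
  std::int64_t count = 0;
  for (const std::int64_t width : widths_desc) {
    assert(width > 0);
    count += gap_sites / width;  // use as many of this width as fit
    gap_sites %= width;          // leave the remainder for smaller fillers
  }
  return count;
}
```

For a 100-site gap, a single width-1 filler needs 100 instances, the sky130-style set {8, 4, 2, 1} needs 13, and a set going up to 64x needs only 3.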

maliberty commented 1 year ago

You could never really P&R 67M instances without waiting weeks.

QuantamHD commented 1 year ago

You're just not thinking creatively enough @maliberty.

QuantamHD commented 1 year ago

@oharboe I would love some super sized test cases that we could use to improve performance. I have a whole team at Google who would probably be interested in getting their name on some kind of leader board.

maliberty commented 1 year ago

@QuantamHD you are thinking too creatively. I am trying to solve the problem at hand. I would guess that 99.9% of the instances are fill cells due to the lack of realism of asap7 in this regard. It isn't a real test of capacity - which I would welcome.

maliberty commented 1 year ago

Would you report how many instances exist in initialize_floorplan vs how many fill cells are reported in 4_2_cts_fillcell.log?

QuantamHD commented 1 year ago

@maliberty agreed. Do you think it would be worth implementing an optimization that merges adjacent fill cells into pseudo larger fill cells?

maliberty commented 1 year ago

@QuantamHD I think someone should just fix asap7 as that won't be a generally useful feature. I'll see if @louiic can take care of it.

oharboe commented 1 year ago

> @QuantamHD I think someone should just fix asap7 as that won't be a generally useful feature. I'll see if @louiic can take care of it.

Thanks!

oharboe commented 1 year ago

Some more information, without opening the GUI, I found the following in 6_report.log, meaning that the GUI is launched during the bash flow:

==========================================================================
finish report_design_area
--------------------------------------------------------------------------
Design area 100314 u^2 10% utilization.

[deleted]
Signal 11 received
Stack trace:
 0# 0x0000000000CCCFA7 in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 1# 0x00007F1E53D17400 in /lib64/libc.so.6
 2# 0x00007F1E5566A399 in /lib64/libQt5Gui.so.5
 3# 0x00007F1E5566A72E in /lib64/libQt5Gui.so.5
 4# 0x00007F1E556647F5 in /lib64/libQt5Gui.so.5
 5# gui::BrowserWidget::makeRowItems(QStandardItem*, std::string const&, gui::BrowserWidget::ModuleStats const&, QStandardItem*, bool) const in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 6# gui::BrowserWidget::addInstanceItem(odb::dbInst*, QStandardItem*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 7# gui::BrowserWidget::addInstanceItems(std::vector<odb::dbInst*, std::allocator<odb::dbInst*> > const&, std::string const&, QStandardItem*) in /OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad

This probably comes from this step in save_images.tcl.

# Save a final image if openroad is compiled with the gui
if {[expr [llength [info procs save_image]] > 0]} {
    gui::show "source $::env(SCRIPTS_DIR)/save_images.tcl" false
}

The good news is that it should be possible to write an automatic test to reproduce the problem.

In my case, detailed routing failed with 100 violations after 64 iterations. I think 64 iterations is max.

Hmm... it is a bit surprising that the flow completed, even if detailed routing had 100 violations.

maliberty commented 1 year ago

Yes we save images of the layout from the GUI even if it isn't displayed.

64 iterations is the max.

oharboe commented 1 year ago

This is mainly a problem with ASAP7 generating too many filler cells: https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/issues/732

A separate feature request might be filed to make OpenROAD GUI scale better with millions of instances, but I'm not sure that's really the only problem that exists at that scale and until we have somewhat realistic designs with tens of millions of instances, we won't be able to get a good handle on the real issues with designs at that scale.

rovinski commented 1 year ago

I think you meant to cross-link https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/issues/732

oharboe commented 1 year ago

> I think you meant to cross-link The-OpenROAD-Project/OpenROAD-flow-scripts#732

Fixed. Thanks!

ShvetankPrakash commented 1 year ago

Would you be able to provide some guidance on how you set up the debugger? I am trying to debug a segfault I have introduced when playing around with the tool but am having trouble figuring out the best way to debug. Any guidance would be much appreciated @oharboe @maliberty :)

maliberty commented 1 year ago

I keep this alias:

debug is aliased to `cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS_DEBUG="-g -O0" tools/OpenROAD -B tools/OpenROAD/debug -D CMAKE_INSTALL_PREFIX=$(pwd)/tools/install/OpenROAD -D ALLOW_WARNINGS=0'

ShvetankPrakash commented 1 year ago

Thanks for the reply @maliberty!

Apologies, I might have not provided enough context! I have built openroad locally and have a tcl script I want to pass in debug mode.

I tried to use lldb like so:

lldb openroad run.tcl

but this does not work; it says the process fails to execute.

Could you explain a bit further how I can use what you provided for this debugging purpose? I'm still learning about the tool, so thanks in advance!

oharboe commented 1 year ago

@ShvetankPrakash Here is something new you can try as of yesterday, if you check out the latest ORFS (OpenROAD-flow-scripts).

I haven't tried it, but it should work :-)

1. Run make bash. This will put you into a shell where all the environment variables are set up.
2. Now you should be able to run "openroad run.tcl" and it should work.
3. Next: start your favorite IDE with debugger from this bash shell, so that it inherits all the environment variables, and set up a debug session with "openroad run.tcl".

ShvetankPrakash commented 1 year ago

Thanks @oharboe for the reply! I think step 3 is what I am having trouble with (hooking up my VSCode IDE debugger to openroad run.tcl). It appears you were able to debug in VSCode. Can you share how you set this up? Would be much appreciated!

oharboe commented 1 year ago

I use a different approach, I launch the flow or the GUI as usual and attach to it from Visual Studio Code. That way I don't have to set up any environment variables. On the downside, I have to be quick and sometimes lucky to be able to attach fast enough.

Depending on your OS, you have some setup to do with lldb and permissions to attach to a running process...

This launch.json goes into tools/OpenROAD/.vscode/:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "attach",
            "name": "Attach",
            "program": "${workspaceFolder}/../tools/OpenROAD/bin/openroad"
        }
    ]
}

maliberty commented 1 year ago

% lldb openroad zerosoc_pads.tcl 
(lldb) target create "openroad"
Current executable set to 'openroad' (x86_64).
(lldb) settings set -- target.run-args  "zerosoc_pads.tcl"
(lldb) r
Process 254520 launched: '/workspaces/mliberty/w8/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad' (x86_64)

It works fine for me. I'm not sure what is different in your setup.

ShvetankPrakash commented 1 year ago

Did you have to update your GLIBC version to run with the LLDB VS Code extension @oharboe? You just run openroad run.tcl and then try to start the debugger in time before the crash, if I am understanding your process correctly?

ShvetankPrakash commented 1 year ago

This is what happens for me when I tried to follow your steps:

$ lldb openroad run.tcl 
Current executable set to 'openroad' (x86_64).
(lldb) target create "openroad"
Current executable set to 'openroad' (x86_64).
(lldb) settings set -- target.run-args "run.tcl"
(lldb) r
error: process launch failed: Child exec failed.

Any idea what might be going on @maliberty?

maliberty commented 1 year ago

No idea. I mostly use gdb fwiw.

oharboe commented 1 year ago

> Did you have to update your GLIBC version to run with the LLDB VS Code extension @oharboe?

No. I use Ubuntu 22.04 FYI.

> You just run openroad run.tcl and then try to start the debugger in time before the crash, if I am understanding your process correctly?

Yes. You can even attach a bit early. Works better than I would expect

ShvetankPrakash commented 1 year ago

I think I was able to get the debugger attachment working (using your launch config in ./tools/OpenROAD/.vscode) but I think I am still not understanding the debugging flow.

Can you provide a simple example of what you would do to debug a step in pdngen or floorplanning? I have ORFS cloned and want to debug the pdn step using your attachment mechanism. As a debugging proof of concept for myself, I tried to run openroad init_floorplan2.tcl from the command line and placed a breakpoint in initFloorplan.cc. I was then able to attach the debugger to the running openroad process, but I do not see the breakpoint being reached, so I think I am forgetting something silly.

This would be a huge help @oharboe if you could shed some light here and much appreciated! Thank you again!

oharboe commented 1 year ago

@ShvetankPrakash At this point, it looks like more of a C++ debugging setup issue... I would recommend reaching out to someone local who can sit down with you and get this set up. Some sort of snag in your exact local setup, it would seem.

ShvetankPrakash commented 1 year ago

@oharboe thanks for all the replies! Yeah, based on both your and Matt’s replies it seems like it’s just an issue with my local debug setup, as I’m doing the same thing as you both 👍

But just wanted to check that what I’m doing at a high level is how others are debugging OpenROAD for a sanity check, so both your and Matt’s replies in this thread have been useful in confirming that it’s just that— so thank you! I’ll dig into this part now further to figure out what the exact issue is 🙂

ShvetankPrakash commented 1 year ago

Forgot to mention: I was able to resolve the issue with my local setup myself, and the debugger is now working, by the way 🙂 Thanks again for all the comments!
