The-OpenROAD-Project / OpenSTA

OpenSTA engine
GNU General Public License v3.0
386 stars 167 forks source link

Thread-safe `Network` #219

Closed kbieganski closed 3 months ago

kbieganski commented 5 months ago

OpenSTA currently is not thread-safe. If you run the OpenROAD flow in multi-threaded mode (pass OR_ARGS='-threads $(nproc)' to make) you can easily get a data race, often resulting in an infinite loop or segfault, e.g.:

Signal 11 received
Stack trace:
 0# 0x00005574A338B47F in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 1# 0x00007F80C145C710 in /usr/lib/libc.so.6
 2# std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) in /usr/lib/libstdc++.so.6
 3# sta::Network::drivers(sta::Net const*) in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 4# sta::GraphDelayCalc::findMultiDrvrNet(sta::Vertex*) in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 5# sta::GraphDelayCalc::findDriverDelays(sta::Vertex*, sta::ArcDelayCalc*) in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 6# sta::GraphDelayCalc::findVertexDelay(sta::Vertex*, sta::ArcDelayCalc*, bool) in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 7# 0x00005574A36F3B46 in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 8# sta::DispatchQueue::dispatch_thread_handler(unsigned long) in /home/krzysztof/OpenROAD-flow-scripts/tools/install/OpenROAD/bin/openroad
 9# 0x00007F80C16E1943 in /usr/lib/libstdc++.so.6
10# 0x00007F80C14AA9EB in /usr/lib/libc.so.6
11# 0x00007F80C152E7CC in /usr/lib/libc.so.6

This is from the 5_1_grt target, not sure if it happens in other stages.

This stack trace starts in the dispatcher, but the real story is this:

  1. GraphDelayCalc::findDelays(...) calls BfsIterator::visitParallel:
void
GraphDelayCalc::findDelays(Level level)
{
...
    FindVertexDelays visitor(this);
    dcalc_count += iter_->visitParallel(level, &visitor);
...

As the name implies, BfsIterator::visitParallel is multi-threaded. It calls (in this case) FindVertexDelays::visit(...) from multiple threads.

  1. FindVertexDelays::visit(...) calls GraphDelayCalc::findVertexDelay(...):
void
FindVertexDelays::visit(Vertex *vertex)
{
  graph_delay_calc1_->findVertexDelay(vertex, arc_delay_calc_, true);
}
  1. GraphDelayCalc::findVertexDelay(...) calls GraphDelayCalc::findDriverDelays(...):
void
GraphDelayCalc::findVertexDelay(Vertex *vertex,
                                ArcDelayCalc *arc_delay_calc,
                                bool propagate)
{
...
    bool delay_changed = findDriverDelays(vertex, arc_delay_calc);
...
  1. GraphDelayCalc::findDriverDelays(...) calls GraphDelayCalc::findMultiDrvrNet(...):
bool
GraphDelayCalc::findDriverDelays(Vertex *drvr_vertex,
                                 ArcDelayCalc *arc_delay_calc)
{
  bool delay_changed = false;
  MultiDrvrNet *multi_drvr = findMultiDrvrNet(drvr_vertex);
...
  1. GraphDelayCalc::findMultiDrvrNet(...) calls Network::drivers(const Pin*):
MultiDrvrNet *
GraphDelayCalc::findMultiDrvrNet(Vertex *drvr_vertex)
...
    const PinSet *drvrs = network_->drivers(drvr_vertex->pin());
...
  1. Network::drivers(const Pin*) calls Network::drivers(const Net*):
PinSet *
Network::drivers(const Pin *pin)
{
...
    return drivers(net);
...
  1. Network::drivers(const Net*) inserts into what's basically an std::map.
PinSet *
Network::drivers(const Net *net)
{
...
    net_drvr_pin_map_[net] = drvrs;
...

This insertion is not thread-safe. This PR gets rid of this map in Network, as it seems to only be used there for caching. In ConcreteNetwork, it adds synchronization to code that accesses this map in a multi-threaded context.

Below is a performance regression check on 5_1_grt.

Ibex

Command Mean [s] Min [s] Max [s] Relative
ibex / 5_1_grt / master 125.214 ± 0.776 123.797 126.203 1.02 ± 0.01
ibex / 5_1_grt / thread-safe-network 122.379 ± 0.670 121.476 123.630 1.00
Command Mean [s] Min [s] Max [s] Relative
ibex / 5_1_grt / master 124.140 ± 3.096 118.442 127.865 1.01 ± 0.03
ibex / 5_1_grt / thread-safe-network 122.610 ± 0.516 121.760 123.394 1.00

BlackParrot

Command Mean [s] Min [s] Max [s] Relative
black_parrot / 5_1_grt / master 834.350 ± 7.627 817.349 842.018 1.01 ± 0.01
black_parrot / 5_1_grt / thread-safe-network 828.409 ± 6.578 816.466 837.040 1.00
tspyrou commented 5 months ago

This is no longer needed. Cherry pushed a fix which is directed at the root cause. This fix will show up when we next pull from the upstream.

maliberty commented 3 months ago

Fixed upstream