The-OpenROAD-Project / OpenROAD

OpenROAD's unified application implementing an RTL-to-GDS Flow. Documentation at https://openroad.readthedocs.io/en/latest/
https://theopenroadproject.org/
BSD 3-Clause "New" or "Revised" License

CTS step is taking an unusually long time - 2550 seconds #4292

Open sandeep-amd opened 9 months ago

sandeep-amd commented 9 months ago

Describe the bug

I have two similar systems. On one system CTS takes around 625 seconds, while on the other it takes 2550 seconds. These results are consistent across runs.

On both systems the cpufreq governor is set to performance.

System 1 log: (625 seconds)

Repair setup and hold violations...
TNS end percent 100
Skipping gate cloning during optimization
[INFO RSZ-0094] Found 3457 endpoints with setup violations.
[INFO RSZ-0045] Inserted 200 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 195 instances.
[INFO RSZ-0043] Swapped pins on 80 instances.
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0033] No hold violations found.
Placement Analysis
total displacement 985.8 u
average displacement 0.0 u
max displacement 11.0 u
original HPWL 5532893.1 u
legalized HPWL 5533517.5 u
delta HPWL 0 %

System 2 (2550 seconds):

Repair setup and hold violations...
TNS end percent 100
Skipping gate cloning during optimization
[INFO RSZ-0094] Found 3459 endpoints with setup violations.
[INFO RSZ-0045] Inserted 539 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 2955 instances.
[INFO RSZ-0043] Swapped pins on 118 instances.
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0033] No hold violations found.
Placement Analysis
total displacement 6384.6 u
average displacement 0.0 u
max displacement 19.6 u
original HPWL 5567471.6 u
legalized HPWL 5572355.7 u
delta HPWL 0 %

The main difference I see is in total displacement (985.8 u vs. 6384.6 u).
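To put numbers on how far the two runs diverge, here is a minimal Python sketch that extracts the counts from the RSZ messages and the total displacement quoted above, then prints the System 2 / System 1 ratio for each. The function name `extract_metrics`, the metric names, and the regular expressions are illustrative assumptions derived only from the log text in this issue; they are not part of OpenROAD.

```python
import re

# Assumed patterns, written against the log lines quoted in this issue.
PATTERNS = {
    "violating_endpoints": r"Found (\d+) endpoints with setup violations",
    "buffers_inserted": r"Inserted (\d+) buffers",
    "instances_resized": r"Resized (\d+) instances",
    "pins_swapped": r"Swapped pins on (\d+) instances",
    "total_displacement_u": r"total displacement\s+([\d.]+) u",
}

SYSTEM_1_LOG = """
[INFO RSZ-0094] Found 3457 endpoints with setup violations.
[INFO RSZ-0045] Inserted 200 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 195 instances.
[INFO RSZ-0043] Swapped pins on 80 instances.
total displacement 985.8 u
"""

SYSTEM_2_LOG = """
[INFO RSZ-0094] Found 3459 endpoints with setup violations.
[INFO RSZ-0045] Inserted 539 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 2955 instances.
[INFO RSZ-0043] Swapped pins on 118 instances.
total displacement 6384.6 u
"""


def extract_metrics(log_text):
    """Return {metric name: value} for every assumed pattern found in log_text."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, log_text)
        if match:
            metrics[name] = float(match.group(1))
    return metrics


system_1 = extract_metrics(SYSTEM_1_LOG)
system_2 = extract_metrics(SYSTEM_2_LOG)

# Print each metric side by side with the System 2 / System 1 ratio.
for name in PATTERNS:
    ratio = system_2[name] / system_1[name]
    print(f"{name}: {system_1[name]:g} -> {system_2[name]:g} ({ratio:.2f}x)")
```

The number of violating endpoints is nearly the same on both systems, while the resize count and displacement differ by large factors, which suggests the optimizer is doing different work rather than the same work more slowly.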

Please do let me know how to fix this.

Expected Behavior

CTS should take around 625 seconds on both systems, as that is closer to the golden value.

Environment

kernel: Linux 5.15.0-86-generic
os: Ubuntu 22.04.3 LTS (Jammy Jellyfish)
cmake version 3.24.2
-- The CXX compiler identification is Clang 16.0.3
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/amd/compiler/aocc4.1/aocc-compiler-rel-4.1.0-4451-270/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- OpenROAD version: v2.0-10770-g6719c92e6
-- System name: Linux
-- Compiler: Clang 16.0.3
-- Build type: RELEASE
-- Install prefix: /usr/local
-- C++ Standard: 17
-- C++ Standard Required: ON
-- C++ Extensions: OFF
-- The C compiler identification is Clang 16.0.3
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/amd/compiler/aocc4.1/aocc-compiler-rel-4.1.0-4451-270/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Python: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test C_COMPILER_SUPPORTS__-O3
-- Performing Test C_COMPILER_SUPPORTS__-O3 - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-O3
-- Performing Test CXX_COMPILER_SUPPORTS__-O3 - Success
-- Performing Test C_COMPILER_SUPPORTS__-march=native
-- Performing Test C_COMPILER_SUPPORTS__-march=native - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-march=native
-- Performing Test CXX_COMPILER_SUPPORTS__-march=native - Success
-- Performing Test C_COMPILER_SUPPORTS__-flto
-- Performing Test C_COMPILER_SUPPORTS__-flto - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-flto
-- Performing Test CXX_COMPILER_SUPPORTS__-flto - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wall
-- Performing Test C_COMPILER_SUPPORTS__-Wall - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wall
-- Performing Test CXX_COMPILER_SUPPORTS__-Wall - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-array-bounds
-- Performing Test C_COMPILER_SUPPORTS__-Wno-array-bounds - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-array-bounds
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-array-bounds - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-nonnull
-- Performing Test C_COMPILER_SUPPORTS__-Wno-nonnull - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-nonnull
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-nonnull - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-maybe-uninitialized
-- Performing Test C_COMPILER_SUPPORTS__-Wno-maybe-uninitialized - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-maybe-uninitialized
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-maybe-uninitialized - Failed
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format-overflow
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format-overflow - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format-overflow
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format-overflow - Failed
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-variable
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-variable - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-variable
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-variable - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-function
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-function - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-function
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-function - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-write-strings
-- Performing Test C_COMPILER_SUPPORTS__-Wno-write-strings - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-write-strings
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-write-strings - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-sign-compare
-- Performing Test C_COMPILER_SUPPORTS__-Wno-sign-compare - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-sign-compare
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-sign-compare - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-deprecated
-- Performing Test C_COMPILER_SUPPORTS__-Wno-deprecated - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-deprecated
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-deprecated - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-c++11-narrowing
-- Performing Test C_COMPILER_SUPPORTS__-Wno-c++11-narrowing - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-c++11-narrowing
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-c++11-narrowing - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-register
-- Performing Test C_COMPILER_SUPPORTS__-Wno-register - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-register
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-register - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format
-- Performing Test C_COMPILER_SUPPORTS__-Wno-format - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-format - Success
-- Performing Test C_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal
-- Performing Test C_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-reserved-user-defined-literal - Success
-- Performing Test C_COMPILER_SUPPORTS__-fpermissive
-- Performing Test C_COMPILER_SUPPORTS__-fpermissive - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-fpermissive
-- Performing Test CXX_COMPILER_SUPPORTS__-fpermissive - Success
-- Performing Test C_COMPILER_SUPPORTS__-x
-- Performing Test C_COMPILER_SUPPORTS__-x - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__-x
-- Performing Test CXX_COMPILER_SUPPORTS__-x - Failed
-- Performing Test C_COMPILER_SUPPORTS__c++
-- Performing Test C_COMPILER_SUPPORTS__c++ - Failed
-- Performing Test CXX_COMPILER_SUPPORTS__c++
-- Performing Test CXX_COMPILER_SUPPORTS__c++ - Failed
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-but-set-variable
-- Performing Test C_COMPILER_SUPPORTS__-Wno-unused-but-set-variable - Success
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-but-set-variable
-- Performing Test CXX_COMPILER_SUPPORTS__-Wno-unused-but-set-variable - Success
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- TCL readline library: /usr/lib/x86_64-linux-gnu/libtclreadline.so
-- TCL readline header: /usr/include/x86_64-linux-gnu
-- Found SWIG: /usr/bin/swig4.0 (found suitable version "4.0.2", minimum required is "3.0")
-- Found Boost: /usr/local/lib/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- boost: 1.83.0
-- Found Python3: /usr/include/python3.10 (found version "3.10.12") found components: Development Development.Module Development.Embed
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11")
-- spdlog: 1.9.2
-- Found BISON: /usr/bin/bison (found version "3.8.2")
-- Found Doxygen: /usr/bin/doxygen (found version "1.9.1") found components: doxygen dot
-- STA version: 2.4.0
-- STA git sha: 1fed0491bb9e84c9622b7e984821e0711502a14e
-- System name: Linux
-- Compiler: Clang 16.0.3
-- Build type: RELEASE
-- Build CXX_FLAGS: -O3 -DNDEBUG
-- Install prefix: /usr/local
-- Found FLEX: /usr/bin/flex (found version "2.6.4")
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- SSTA: 0
-- STA executable: /home/amd/sandeep/benchmarking/ORFS/AOCC/OpenROAD-flow-scripts/tools/OpenROAD/src/sta/app/sta
-- Found re2: /opt/or-tools/lib/cmake/re2/re2Config.cmake (found version "9.0.0")
-- Found Clp: /opt/or-tools/lib/cmake/Clp/ClpConfig.cmake (found version "1.17.7")
-- Found Cbc: /opt/or-tools/lib/cmake/Cbc/CbcConfig.cmake (found version "2.10.7")
-- Found SCIP: /opt/or-tools/lib/cmake/scip/scip-config.cmake (found version "8.0.1")
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- GPU is not enabled
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- Found Eigen3: /usr/local/share/eigen3/cmake/Eigen3Config.cmake (found version "3.4.0")
-- GUI is enabled
-- Charts widget is enabled
-- Found Boost: /usr/local/lib/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0") found components: serialization
-- Could NOT find VTune (missing: VTune_LIBRARIES VTune_INCLUDE_DIRS)
-- Found Boost: /usr/local/lib/cmake/Boost-1.83.0/BoostConfig.cmake (found suitable version "1.83.0", minimum required is "1.78")
-- TCL library: /usr/lib/x86_64-linux-gnu/libtcl.so
-- TCL header: /usr/include/tcl/tcl.h
-- Found Boost: /usr/local/lib/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0") found components: serialization system thread
-- TCL readline enabled
-- Tcl Extended disabled
-- Python3 enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/tmp.8IHetRMv7r

To Reproduce

This issue is specific to one of the servers I am working on.

QuantamHD commented 9 months ago

Are the binaries on both systems byte for byte identical? Also do you have a test case that you can share with us?

maliberty commented 9 months ago

What are the two systems? You are getting different answers so something is varying beyond runtime.

sandeep-amd commented 9 months ago

The binaries are not byte-for-byte identical, but both were built with the same compiler (AOCC). The binary sizes are nearly the same: the openroad binary is about 79 MB on both systems. Both are AMD servers with similar configurations.

System 1:~/sandeep/benchmarking/ORFS/AOCC/OpenROAD-flow-scripts/tools/install/OpenROAD$ ls -lrt bin/openroad
-rwxr-xr-x 1 amd amd 82015776 Oct 26 17:22 bin/openroad

System 2:~/dporwal/openroad/ORFS/new/OpenROAD-flow-scripts/tools/install/OpenROAD$ ls -lrt bin/openroad
-rwxr-xr-x 1 amd amd 82252592 Nov 15 11:08 bin/openroad

maliberty commented 9 months ago

Why not use the same binary to eliminate a source of difference? Also is the input data identical?

tspyrou commented 9 months ago

@sandeep-amd if you send an email to info@precisioninno.com, we can get connected and figure out how to solve this. -Tom

sandeep-amd commented 9 months ago

Yes, the input data is identical, as both runs use OpenROAD Flow Scripts.

Also, if you look at the logs, the major difference seems to be in this step.

System 1:
[INFO RSZ-0045] Inserted 200 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 195 instances.
[INFO RSZ-0043] Swapped pins on 80 instances.

System 2:
[INFO RSZ-0045] Inserted 539 buffers, 1 to split loads.
[INFO RSZ-0041] Resized 2955 instances.
[INFO RSZ-0043] Swapped pins on 118 instances.

Why has it inserted more buffers (200 vs. 539) and resized more instances (195 vs. 2955)? What do these steps do?

sandeep-amd commented 9 months ago

> @sandeep-amd if you send an email to info@precisioninno.com, we can get connected and figure out how to solve this. -Tom

I will send an email.

maliberty commented 9 months ago

It is performing timing optimization. You need to use the same build of OR to have comparable results. The simplest way to ensure that would be to use the same binaries on both machines.
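One way to confirm that both machines really run the same build is to compare a cryptographic digest of the installed binary on each. The following is a minimal Python sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `binaries_identical` are hypothetical, and the paths passed in would be wherever `bin/openroad` is installed on each system.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at path, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

When the two binaries live on different machines, running `sha256_of_file` on each and comparing the printed digests by hand gives the same answer. The file sizes reported earlier in this thread (82015776 vs. 82252592 bytes) already show that the two current builds differ, so this check is mainly useful after copying one build to both servers.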

maliberty commented 6 months ago

Were you able to connect with @tspyrou?

tspyrou commented 6 months ago

@sandeep-amd I don't think that we connected yet.