facebookarchive / BOLT

Binary Optimization and Layout Tool - A linux command-line utility used for optimizing performance of binaries

1.0.0: cmake fails and some other cmake warnings #123

Open kloczek opened 3 years ago

kloczek commented 3 years ago

Just started packaging BOLT and I've stumbled on some cmake issues

+ cd BOLT-1.0.0
+ CFLAGS='-O2 -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -flto=auto -flto-partition=none'
+ CXXFLAGS='-O2 -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -flto=auto -flto-partition=none'
+ FFLAGS='-O2 -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -flto=auto -flto-partition=none -I/usr/lib64/gfortran/modules'
+ FCFLAGS='-O2 -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fdata-sections -ffunction-sections -flto=auto -flto-partition=none -I/usr/lib64/gfortran/modules'
+ LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,--gc-sections -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -flto=auto -flto-partition=none -fuse-linker-plugin'
+ CC=/usr/bin/gcc
+ CXX=/usr/bin/g++
+ FC=/usr/bin/gfortran
+ AR=/usr/bin/gcc-ar
+ NM=/usr/bin/gcc-nm
+ RANLIB=/usr/bin/gcc-ranlib
+ export CFLAGS CXXFLAGS FFLAGS FCFLAGS LDFLAGS CC CXX FC AR NM RANLIB
+ /usr/bin/cmake -B x86_64-redhat-linux-gnu -D BUILD_SHARED_LIBS=ON -D CMAKE_AR=/usr/bin/gcc-ar -D CMAKE_BUILD_TYPE=RelWithDebInfo -D CMAKE_C_FLAGS_RELEASE=-DNDEBUG -D CMAKE_CXX_FLAGS_RELEASE=-DNDEBUG -D CMAKE_Fortran_FLAGS_RELEASE=-DNDEBUG -D CMAKE_INSTALL_PREFIX=/usr -D CMAKE_NM=/usr/bin/gcc-nm -D CMAKE_RANLIB=/usr/bin/gcc-ranlib -D CMAKE_VERBOSE_MAKEFILE=ON -D DBUILD_SHARED_LIBS=ON -D INCLUDE_INSTALL_DIR=/usr/include -D LIB_INSTALL_DIR=/usr/lib64 -D LIB_SUFFIX=64 -D SHARE_INSTALL_PREFIX=/usr/share -D SYSCONF_INSTALL_DIR=/etc -S .
CMake Warning (dev) in CMakeLists.txt:
  No project() command is present.  The top-level CMakeLists.txt file must
  contain a literal, direct call to the project() command.  Add a line of
  code such as

    project(ProjectName)

  near the top of the file, but after cmake_minimum_required().

  CMake is pretending there is a "project(Project)" command on the first
  line.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- The C compiler identification is GNU 11.0.0
-- The CXX compiler identification is GNU 11.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:24 (add_llvm_install_targets):
  Unknown CMake command "add_llvm_install_targets".

CMake Warning (dev) in CMakeLists.txt:
  No cmake_minimum_required command is present.  A line of code such as

    cmake_minimum_required(VERSION 3.19)

  should be added at the top of the file.  The version specified may be lower
  if you wish to support older CMake versions for this project.  For more
  information run "cmake --help-policy CMP0000".
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Configuring incomplete, errors occurred!
See also "/home/tkloczko/rpmbuild/BUILD/BOLT-1.0.0/x86_64-redhat-linux-gnu/CMakeFiles/CMakeOutput.log".
kloczek commented 3 years ago
[tkloczko@barrel x86_64-redhat-linux-gnu]$ grep -r add_llvm_install_targets /usr/lib64/cmake
/usr/lib64/cmake/llvm/AddLLVM.cmake:function(add_llvm_install_targets target)
/usr/lib64/cmake/llvm/AddLLVM.cmake:        add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/AddLLVM.cmake:        add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/AddLLVM.cmake:        add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/AddLLVM.cmake:    add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/AddLLVM.cmake:    add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/AddSphinxTarget.cmake:          add_llvm_install_targets("install-${SPHINX_TARGET_NAME}"
/usr/lib64/cmake/llvm/AddSphinxTarget.cmake:          add_llvm_install_targets("install-${SPHINX_TARGET_NAME}"
/usr/lib64/cmake/llvm/LLVMDistributionSupport.cmake:              " Its installation target creation should be changed to use add_llvm_install_targets,"
/usr/lib64/cmake/llvm/LLVMExternalProjectUtils.cmake:    add_llvm_install_targets(install-${name}
/usr/lib64/cmake/llvm/TableGen.cmake:      add_llvm_install_targets("install-${target}"

So it looks like I have that cmake function in my build env..

rafaelauler commented 3 years ago

Hi Tomasz, did you check out LLVM first and apply our patch on top of LLVM? I'm asking because of:

No project() command is present. The top-level CMakeLists.txt file must contain a literal, direct call to the project() command.

So it looks to me like you are running cmake directly on our project folder, when you should run it on the top-level llvm folder. See https://github.com/facebookincubator/BOLT#installation

kloczek commented 3 years ago

I think we can close this ticket for now. This week in the LLVM Weekly newsletter I read that the process of pushing BOLT into LLVM has started, so I will wait until that process is finished. Do you know whether those BOLT patches will be integrated into llvm 12.x?

rafaelauler commented 3 years ago

Hi Tomasz, we don't have an established timeline that would answer which version of llvm BOLT will be in. We will be refactoring parts of the project, but that is an ongoing discussion with other people interested in the project, and the timing largely depends on how this discussion goes. Meanwhile, we will live in https://github.com/facebookincubator/BOLT/tree/rebased (the rebased branch), which is a snapshot of our current status of integration into LLVM. We are in the process of deprecating our current main branch so we can start developing more heavily in the rebased branch towards integration into LLVM.

kloczek commented 3 years ago

I plan to give BOLT a chance on a large-scale set of typical distribution packages. At the moment I have almost 3k cleaned packages based on Fedora Rawhide. In that case, on top of preparing llvm/clang, it will be necessary to prepare some kind of integration of BOLT into typical rpm package build processes. My understanding is that the necessary optimisation process could be integrated into %__spec_install_post, before stripping debug info and the other steps, by injecting a new stage that runs BOLT post-processing on the ELF binaries just installed in %buildroot.
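Something along these lines is what I have in mind; this is just a rough sketch, where /usr/lib/rpm/bolt-postprocess is a hypothetical helper script that would run llvm-bolt over the ELF binaries found in the buildroot:

    # Sketch of an overridden %__spec_install_post (e.g. in a macros.bolt drop-in):
    # inject a BOLT post-processing step before the usual debug-info
    # extraction and stripping done by the standard install-post macros.
    %__spec_install_post \
        /usr/lib/rpm/bolt-postprocess %{buildroot} \
        %{?__debug_package:%{__debug_install_post}} \
        %{__arch_install_post} \
        %{__os_install_post} \
    %{nil}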

I still have not been able to use BOLT and I plan to look at it more closely in the next 2-3 weeks. If you have any suggestions about how to start plugging BOLT into typical package build processes, it would be helpful to start thinking about that before I start working on it full time. So far I've found that BOLT may consume a lot of memory while processing binaries. That part will not be a problem for me: I have two 2xCPU Xeon physical boxes (one faster with 128GB of memory, and a slower but almost-as-fast one with 384GB of RAM).

rafaelauler commented 3 years ago

Regarding memory usage, I don't think you will ever need more than 64GB, and that will be necessary only for huge binaries (not the ones typically seen in open-source distros).

For wide scale usage of BOLT, at the moment I see some challenges: first, collecting profile for these packages. Similar to FDO/PGO technology, you would need to come up with a workload for each package, run the program under this load and collect the profile, so BOLT knows the hot paths exercised in the workload and knows how to optimize its layout in a way that best uses scarce L1 icache resources, as well as iTLB and the branch predictor.

Yet, you may not observe performance differences for small binaries because their working set size is comparable to the size of the processor icache, and the branch predictor is pretty effective in most scenarios (some exceptions do exist). The iTLB, icache and branch predictor are the main drivers of performance wins for BOLT-optimized programs. It's therefore advisable to run BOLT on workloads that suffer from bad performance on these counters, such as the compiler itself (clang or gcc), mysqld, etc.; for programs that do not, there is little optimization room for BOLT.
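For reference, the sampling-based collection flow described above looks roughly like this (the binary name is a placeholder, and the optimization flags are the typical ones from our README; they may need adjusting per workload and BOLT version):

    # Collect an LBR sample while the binary runs a representative workload.
    perf record -e cycles:u -j any,u -o perf.data -- ./myapp <representative workload>

    # Convert the perf profile into BOLT's fdata format.
    perf2bolt -p perf.data -o perf.fdata ./myapp

    # Re-optimize the code layout using the collected profile.
    llvm-bolt ./myapp -o myapp.bolt -data=perf.fdata \
        -reorder-blocks=cache+ -reorder-functions=hfsort \
        -split-functions=2 -split-all-cold -split-eh -dyno-stats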

kloczek commented 3 years ago

For wide scale usage of BOLT, at the moment I see some challenges: first, collecting profile for these packages. Similar to FDO/PGO technology, you would need to come up with a workload for each package, run the program under this load and collect the profile, so BOLT knows the hot paths exercised in the workload and knows how to optimize its layout in a way that best uses scarce L1 icache resources, as well as iTLB and the branch predictor.

I'm assuming that this profiling data will be input data provided as part of the package source resources. In some cases it will probably be possible to adapt existing test suites to generate initial profiling data. Without larger-scale use of BOLT, IMO it is not possible to start thinking about how to collect those resources.

For now I see at least two classes of profiling data: 1) data that can be used to speed up the initialisation process of applications like firefox or gimp -- probably the same can be done for every binary or DSO, and this kind of profiling data can more or less be generated automatically; 2) profiling of the code used after initialisation. In the worst case, IMO it would be possible to start with only the first class of data; sooner or later the second part would be collectable as well. The question is: do you see any possibility of combining more than one source of profiling data to produce a final set of static profiling data, added as input to the package build process and used during the build?

On top of integrating ELF binary processing into the package build process, IMO it would still be possible to use a completely separate layer that takes rpm packages + profiling data as input and then does: unpack the rpm cpio archive -> apply BOLT optimisation -> repackage everything back into rpm format.

Yet, you may not observe performance differences for small binaries because their working set size is comparable to the size of the processor icache

Sometimes even small, short-lived processes may benefit from BOLT if they run at a high enough rate per second. As long as there is no run-time degradation after using BOLT, it may still make sense to use BOLT generally.

To observe the exact behaviour, IMO it would still be good to start using BOLT, because without that everything is only a theoretical conversation :)

rafaelauler commented 3 years ago

Oh, ok. I see your point. Fair enough.

One comment on that, regarding potential challenges: to speed up initialization, you may want to use instrumentation instead of sampling when collecting the profile. Sampling will easily miss constructors and code that is executed once. Using this strategy, I think you can speed up initialization by reducing the number of pages loaded at startup. However, we don't support instrumenting DSOs at the moment, so that is limited to the main binary itself.

It makes sense that small processes can be optimized as well. The big challenge here is collecting a representative profile, and paying attention to the compiler flags used to produce the binaries fed to BOLT. BOLT can't optimize stripped binaries, and it does a better job at binaries produced with relocations (-Wl,-q), otherwise it won't compact pages.

kloczek commented 3 years ago

When reading below, please keep in mind that I still have not managed to get a BOLT-ready environment, so I'm more or less only "thinking out loud" about how BOLT behaves in action. I promise I'll get to it, as I'm determined to do that soon :))

My understanding is that collecting data via instrumentation would require a special build to generate instrumented ELF binaries. Am I right? If yes, that could be hidden in the existing rpm macros like %cmake, %meson and %configure, if all builds are done out of the source tree. Currently my %meson and %cmake do out-of-source-tree builds, but a similar modification will still be necessary for %configure (which is doable). The question is: how much can the instrumentation-based initialisation data change from version to version? If not too much, that BOLT input data could be delivered to the build process as package input data on Source: lines.

OK, that is about initialisation. The second part is profiler data from the rest of the code. If the initialisation profile and the rest-of-the-code profile are delivered as separate chunks, the question is: how to merge those data, and/or can BOLT currently process multiple profiling data sets to produce the final ELF binary?

BOLT can't optimize stripped binaries, and it does a better job at binaries produced with relocations (-Wl,-q), otherwise it won't compact pages.

What about using the debug info from the $(DESTDIR)/usr/lib/debug/ directory? It is not strictly necessary to have such functionality, because the actual BOLT step could be plugged in before debug info is separated, but it would be good to know how it works now.

Yet another question/topic. I have a lot of DTrace experience and I'm working on having DTrace in my kernel as well as full DTrace USDT support. Has anyone you know been working on collecting profiling data using the DTrace profile provider? If it were possible to use the DTrace profile provider, it might open the gate to producing profiler data from final production (not instrumented) ELF binaries. This could potentially simplify the whole scaffolding of plugging BOLT into typical rpm build processes.

rafaelauler commented 3 years ago

For instrumentation, you wouldn't need a special build in the sense of "recompiling every source file". You would need to run BOLT on the binary with "-instrument", run it, and then run BOLT again to optimize it.
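Roughly like this (the binary name is a placeholder; by default the instrumented run writes its profile to /tmp/prof.fdata, which can be changed with -instrumentation-file):

    # 1. Insert instrumentation counters into the binary.
    llvm-bolt ./myapp -instrument -o myapp.instrumented

    # 2. Run the instrumented binary on a representative workload;
    #    the profile is dumped to /tmp/prof.fdata when it exits.
    ./myapp.instrumented

    # 3. Run BOLT again, this time using the collected profile to optimize.
    llvm-bolt ./myapp -o myapp.bolt -data=/tmp/prof.fdata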

But depending on the compiler you use, you may need to add extra flags to make the binary more friendly to BOLT, especially disabling function splitting done by the compiler (gcc -fno-reorder-blocks-and-partition), not to mention the linker flag -Wl,-q. But that is necessary anyway, even if you are not instrumenting.
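On top of the rpm flags you listed at the start of this issue, that would look roughly like this (a sketch for gcc, matching the flags mentioned above):

    # Keep each function in one piece so BOLT can reorder it as a whole (gcc).
    CFLAGS="$CFLAGS -fno-reorder-blocks-and-partition"
    CXXFLAGS="$CXXFLAGS -fno-reorder-blocks-and-partition"

    # Have the linker emit relocations so BOLT can fully rewrite the layout.
    LDFLAGS="$LDFLAGS -Wl,-q"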

Do you mean to collect the profile just once, and reuse the same profile for multiple versions of the package? That may be a problem because BOLT does not rely on debug info, it relies on function names plus offsets. So if the instructions in a function are different, BOLT will recognize that and turn off optimizations for that function. That's why it's important to recollect the profile for a new version of the program.

BOLT can definitely merge multiple profiles with merge-fdata.
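For example (the input file names are placeholders):

    # Merge a startup profile and a steady-state profile into a single fdata file.
    merge-fdata startup.fdata steady-state.fdata > combined.fdata
    llvm-bolt ./myapp -o myapp.bolt -data=combined.fdata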

On debug info: BOLT won't use it. It relies on the ELF symbol table and other ABI-specific info (FDEs/CIEs) to locate functions. When you strip the binary, you strip the symbol table and BOLT won't be able to start disassembling it.

I don't know about DTrace. My experience is mostly with Linux perf. I don't think it would be impossible to use DTrace, but that data collection pipeline needs to be developed.

maksfb commented 3 years ago

Keep in mind that if the compiler on your system produces PIEs (position-independent executables) by default, you won't be able to use instrumentation.

kloczek commented 3 years ago

Keep in mind that if the compiler on your system produces PIEs (position-independent executables) by default, you won't be able to use instrumentation.

That is actually very easy to force globally when building rpm packages, without touching a single line of the source code of the compiled packages.
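For example, assuming the PIE hardening on Fedora-like systems comes from redhat-rpm-config's %_hardened_build machinery, opting a package out should be a one-liner in the spec (a sketch, not yet tested by me):

    # In the spec file: disable the default PIE/hardened build for this package.
    %undefine _hardened_build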

kloczek commented 3 years ago

Looks like https://github.com/facebookincubator/BOLT/tree/rebased returns a 404. I'm still struggling to find step-by-step instructions on how to build BOLT. Can you point me to such instructions? Q: is it possible to use BOLT against the latest stable LLVM 12.0.1?

aaupov commented 3 years ago

@kloczek: we've recently renamed the rebased branch to main, with no other changes to the build system. You can use our Dockerfile build script as a step-by-step guide on how to build BOLT: https://github.com/facebookincubator/BOLT/blob/main/.github/workflows/Dockerfile#L17
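Roughly, that Dockerfile boils down to something like the following (a sketch; please check the linked file for the exact flags we currently use):

    git clone https://github.com/facebookincubator/BOLT llvm-bolt
    mkdir build && cd build
    cmake -G Ninja ../llvm-bolt/llvm \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_ASSERTIONS=ON \
        -DLLVM_TARGETS_TO_BUILD="X86;AArch64" \
        -DLLVM_ENABLE_PROJECTS="bolt"
    ninja llvm-bolt perf2bolt merge-fdata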

Q: is it possible to use BOLT against latest stable LLVM 12.0.1?

What do you mean exactly by that? It's possible to build BOLT and apply it to LLVM 12.0.1 binaries (such as clang or llvm-* tools). To build BOLT, you would need to check out this repository, which is based on upstream LLVM from roughly two weeks ago. It's generally not possible to use it with any other LLVM version, as we have dependencies on LLVM APIs that change from time to time. We're tracking the upstream LLVM repository since our goal is to include BOLT as a subproject.

kloczek commented 3 years ago

Q: is it possible to use BOLT against latest stable LLVM 12.0.1?

What do you mean exactly by that? It's possible to build BOLT and apply it to LLVM 12.0.1 binaries (such as clang or llvm-* tools). To build BOLT, you would need to check out this repository, which is based on upstream LLVM from roughly two weeks ago. It's generally not possible to use it with any other LLVM version, as we have dependencies on LLVM APIs that change from time to time. We're tracking the upstream LLVM repository since our goal is to include BOLT as a subproject.

My understanding is that BOLT is a set of modifications/patches which needs to be applied on top of the LLVM source. If that is true, I simply assumed that it is a set of patches that could be used against the latest stable LLVM tree. The BOLT and LLVM source trees look similar, but I don't see any LLVM git tags or other tags in the BOLT tree that could be used as reference points for extracting the necessary patches.

Simply, as I have already packaged the whole LLVM stack, I've been thinking about "enriching" that LLVM stack to be able to use the BOLT functionality.