bluewhalesystems / sold

The sold linker
MIT License

Slower than Xcode 14 default linker for iOS app #16

Open pm-dev opened 1 year ago

pm-dev commented 1 year ago

I tried out sold for linking an iOS app executable that's ~6.5k Swift files. Building on an M1 Mac (10 cores), linking averaged ~2 seconds with the default linker and ~12 seconds with sold. Is this plausible, or do you suspect something is wrong with my configuration? From the README it wasn't clear whether sold is always expected to beat Xcode's default linker, or whether there are certain characteristics of a project that may make sold comparatively faster or slower.

rui314 commented 1 year ago

Can you run the linker with -Wl,-perf again and paste the output here? I suspect that there's an unnoticed bottleneck in the sold linker. We need to optimize it.

pm-dev commented 1 year ago
     User   System     Real  Name
   19.294    1.077   11.063  all
   17.191    0.824   10.300    read_input_files
    0.246    0.006    0.069    resolve_symbols
    0.015    0.004    0.002    create_internal_file
    0.000    0.000    0.000    handle_exported_symbols_list
    0.000    0.000    0.000    handle_unexported_symbols_list
    0.027    0.001    0.014    claim_unresolved_symbols
    0.033    0.001    0.004    remove_unreferenced_subsections
    0.049    0.002    0.051    create_synthetic_chunks
    0.127    0.015    0.029    merge_mergeable_sections
    0.003    0.002    0.002      uniquify_literals __literal8
    0.022    0.002    0.005      uniquify_literals __objc_methname
    0.022    0.002    0.005      uniquify_literals __cstring
    0.002    0.001    0.000      uniquify_literals __objc_classname
    0.003    0.001    0.001      uniquify_literals __objc_methtype
    0.001    0.001    0.001      uniquify_literals __literal16
    0.006    0.000    0.006      uniquify_literal_pointers
    0.104    0.001    0.064    scan_relocations
    0.014    0.001    0.002      scan_unwind_info
    0.851    0.053    0.234    assign_offsets
    0.188    0.027    0.130      __TEXT
    0.000    0.000    0.001        __mach_header
    0.000    0.000    0.000        __stubs
    0.082    0.021    0.020        __text
    0.000    0.000    0.000        __gcc_except_tab
    0.000    0.000    0.000        __cstring
    0.102    0.005    0.108        __unwind_info
    0.000    0.000    0.000        __const
    0.000    0.000    0.000        __eh_frame
    0.000    0.000    0.000        __init_offsets
    0.000    0.000    0.000        __literal16
    0.000    0.000    0.000        __literal8
    0.000    0.000    0.000        __objc_classname
    0.000    0.000    0.000        __objc_methname
    0.000    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __objc_stubs
    0.000    0.000    0.000        __swift5_assocty
    0.000    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.000    0.000    0.000        __swift5_fieldmd
    0.000    0.000    0.000        __swift5_mpenum
    0.000    0.000    0.000        __swift5_proto
    0.000    0.000    0.000        __swift5_protos
    0.000    0.000    0.000        __swift5_reflstr
    0.000    0.000    0.000        __swift5_typeref
    0.000    0.000    0.000        __swift5_types
    0.000    0.000    0.000        __ustring
    0.001    0.000    0.001      __DATA_CONST
    0.000    0.000    0.000        __got
    0.000    0.000    0.000        __const
    0.000    0.000    0.000        __cfstring
    0.000    0.000    0.000        __objc_catlist
    0.000    0.000    0.000        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.000    0.000    0.000        __objc_protolist
    0.001    0.000    0.001      __DATA
    0.000    0.000    0.000        __thread_ptrs
    0.000    0.000    0.000        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __objc_classrefs
    0.000    0.000    0.000        __objc_const
    0.000    0.000    0.000        __objc_data
    0.000    0.000    0.000        __objc_ivar
    0.000    0.000    0.000        __objc_protorefs
    0.000    0.000    0.000        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.000    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000        __swift51_hooks
    0.000    0.000    0.000        __swift_hooks
    0.000    0.000    0.000      __LLVM
    0.000    0.000    0.000        __bitcode
    0.000    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_modhash
    0.662    0.026    0.102      __LINKEDIT
    0.662    0.026    0.102        __chainfixups
    0.017    0.001    0.003        __data_in_code
    0.645    0.026    0.084        __export
    0.000    0.000    0.000        __ind_sym_tab
    0.288    0.011    0.036        __func_starts
    0.254    0.009    0.028        __symbol_table
    0.000    0.000    0.000        __string_table
    0.003    0.042    0.051    open_file
    0.483    0.112    0.127    copy_sections_to_output_file
    0.483    0.111    0.127      __TEXT
    0.006    0.001    0.001        __mach_header
    0.002    0.001    0.000        __objc_methtype
    0.001    0.001    0.000        __swift5_mpenum
    0.013    0.002    0.002        __gcc_except_tab
    0.103    0.013    0.014        __const
    0.026    0.002    0.003        __swift5_proto
    0.007    0.001    0.001        __objc_stubs
    0.005    0.000    0.001        __stubs
    0.000    0.000    0.000        __literal16
    0.002    0.000    0.000        __literal8
    0.454    0.109    0.103        __text
    0.002    0.000    0.000        __objc_classname
    0.009    0.001    0.001        __swift5_assocty
    0.028    0.002    0.004        __objc_methname
    0.033    0.003    0.004        __cstring
    0.000    0.000    0.000        __swift5_builtin
    0.011    0.001    0.001        __swift5_capture
    0.005    0.001    0.001        __swift5_protos
    0.000    0.000    0.000        __swift5_entry
    0.042    0.005    0.006        __swift5_fieldmd
    0.021    0.002    0.003        __swift5_reflstr
    0.437    0.106    0.120        __unwind_info
    0.038    0.005    0.005        __swift5_typeref
    0.019    0.002    0.003        __swift5_types
    0.000    0.000    0.000        __ustring
    0.121    0.018    0.020        __eh_frame
    0.001    0.000    0.000        __init_offsets
    0.082    0.010    0.011      __DATA_CONST
    0.007    0.001    0.001        __got
    0.017    0.002    0.002        __objc_classlist
    0.000    0.001    0.000        __objc_nlclslist
    0.002    0.000    0.000        __objc_protolist
    0.000    0.000    0.000        __objc_nlcatlist
    0.075    0.009    0.010        __const
    0.009    0.001    0.001        __cfstring
    0.002    0.000    0.000        __objc_catlist
    0.379    0.081    0.075      __DATA
    0.000    0.000    0.000        __thread_ptrs
    0.050    0.005    0.007        __data
    0.001    0.000    0.000        __objc_protorefs
    0.023    0.002    0.003        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.003    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000        __swift51_hooks
    0.000    0.000    0.000        __swift_hooks
    0.014    0.002    0.002        __objc_classrefs
    0.051    0.007    0.007        __objc_data
    0.000    0.000    0.000        __objc_imageinfo
    0.075    0.010    0.010        __objc_const
    0.003    0.000    0.000        __objc_ivar
    0.392    0.085    0.079      __LLVM
    0.391    0.085    0.079        __bitcode
    0.003    0.001    0.001        __cmdline
    0.011    0.001    0.002        __swift_modhash
    0.328    0.075    0.068      __LINKEDIT
    0.004    0.001    0.001        __chainfixups
    0.115    0.016    0.017        __export
    0.000    0.000    0.000        __data_in_code
    0.296    0.070    0.064        __symbol_table
    0.003    0.000    0.000        __ind_sym_tab
    0.000    0.000    0.000        __string_table
    0.001    0.000    0.000        __func_starts
    0.009    0.000    0.009    write_fixup_chains
    0.075    0.000    0.010    copy_sections_to_output_file
    0.000    0.014    0.056    close_file
rui314 commented 1 year ago

Thanks. So the read_input_files pass dominates the entire execution time. That's not expected: on my M2 Mac, sold can read ~2500 files in ~0.2 seconds, so even if you have 6.5k input files, it shouldn't take that long.
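
To put a number on that, the `all` and `read_input_files` rows from the table above can be reduced to two ratios: the share of wall-clock time spent in the pass, and the average number of busy cores, (User + System) / Real. The following is only a small sketch of that arithmetic; `PerfRow`, `share_of`, and `busy_cores` are made-up names for illustration and are not part of sold.

```cpp
#include <cassert>

// One row of the -perf table: CPU seconds in user and kernel mode,
// and elapsed wall-clock seconds. (Hypothetical helper type.)
struct PerfRow {
  double user;
  double system;
  double real;
};

// Fraction of the parent's wall-clock time that was spent in a child pass.
double share_of(const PerfRow &child, const PerfRow &parent) {
  assert(parent.real > 0);
  return child.real / parent.real;
}

// Average number of busy cores: CPU seconds consumed per wall-clock second.
double busy_cores(const PerfRow &row) {
  assert(row.real > 0);
  return (row.user + row.system) / row.real;
}
```

Plugging in the two rows (19.294 / 1.077 / 11.063 for `all`, 17.191 / 0.824 / 10.300 for `read_input_files`) gives a share of about 0.93 and roughly 1.75 busy cores, so the pass accounts for most of the run while using only a small part of a 10-core machine.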

I want to make sure that you have built sold with the default cmake options. Specifically, please confirm that you haven't built it with -DCMAKE_BUILD_TYPE=Debug.

I wonder if your project is open-source. If so, please let me know the location of the repository so that I can build it myself to reproduce the issue.

If your project is not open-source, can you rebuild sold with the following patch and run it again with -Wl,-perf,-thread_count,1 to see if there's a file that's particularly slow to read?

diff --git a/macho/input-files.cc b/macho/input-files.cc
index 44e5da2a..79e58a89 100644
--- a/macho/input-files.cc
+++ b/macho/input-files.cc
@@ -35,6 +35,7 @@ template <typename E>
 ObjectFile<E> *
 ObjectFile<E>::create(Context<E> &ctx, MappedFile<Context<E>> *mf,
                       std::string archive_name) {
+  Timer t(ctx, "ObjectFile::create " + std::string(mf->name));
   ObjectFile<E> *obj = new ObjectFile<E>(mf);
   obj->archive_name = archive_name;
   obj->is_alive = archive_name.empty() || ctx.all_load;
@@ -1199,6 +1200,7 @@ DylibFile<E>::DylibFile(Context<E> &ctx, MappedFile<Context<E>> *mf)

 template <typename E>
 DylibFile<E> *DylibFile<E>::create(Context<E> &ctx, MappedFile<Context<E>> *mf) {
+  Timer t(ctx, "DylibFile::create " + std::string(mf->name));
   DylibFile<E> *file = new DylibFile<E>(ctx, mf);
   ctx.dylib_pool.emplace_back(file);
   return file;
diff --git a/macho/macho-main.cc b/macho/macho-main.cc
index e3678787..c7b9c910 100644
--- a/macho/macho-main.cc
+++ b/macho/macho-main.cc
@@ -883,6 +883,8 @@ strip_universal_header(Context<E> &ctx, MappedFile<Context<E>> *mf) {

 template <typename E>
 static void read_file(Context<E> &ctx, MappedFile<Context<E>> *mf) {
+  Timer t(ctx, std::string(mf->name));
+
   if (get_file_type(ctx, mf) == FileType::MACH_UNIVERSAL)
     mf = strip_universal_header(ctx, mf);
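
For readers unfamiliar with the pattern, the `Timer t(...)` lines added by the patch appear to be scoped timers: the object is created at the top of a function, and the elapsed time is recorded when it goes out of scope. Below is a minimal sketch of that idea, not sold's actual Timer class. `ScopedTimer` and its `Report` callback are hypothetical; the real -perf output also has User and System columns and nested rows, which this sketch leaves out.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the Timer used in the patch. It measures
// wall-clock time only and reports it through a caller-supplied callback.
class ScopedTimer {
public:
  using Report = std::function<void(const std::string &, double)>;

  ScopedTimer(std::string name, Report report)
      : name_(std::move(name)), report_(std::move(report)),
        start_(std::chrono::steady_clock::now()) {
    assert(report_ && "a report callback is required");
  }

  // Runs when the enclosing scope exits, including on early returns.
  ~ScopedTimer() {
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    report_(name_, elapsed.count());
  }

  // A timer measures one scope, so copying it makes no sense.
  ScopedTimer(const ScopedTimer &) = delete;
  ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
  std::string name_;
  Report report_;
  std::chrono::steady_clock::time_point start_;
};
```

Because the measurement is tied to object lifetime, one added line per function is enough to get a per-file row, which is why the patch is so small.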
pm-dev commented 1 year ago

The issue was that I had the address sanitizer turned on. I had seen this comment and included the MOLD_USE_ASAN flag.

Now the link time is consistently 1.4 seconds, which is the same as the default linker. Here's the sold breakdown with -Wl,-perf:

     User   System     Real  Name
    2.393    0.649    1.360  all
    1.739    0.510    1.123    read_input_files
    0.071    0.015    0.020    resolve_symbols
    0.007    0.004    0.001    create_internal_file
    0.000    0.000    0.000    handle_exported_symbols_list
    0.000    0.000    0.000    handle_unexported_symbols_list
    0.008    0.000    0.003    claim_unresolved_symbols
    0.016    0.000    0.002    remove_unreferenced_subsections
    0.006    0.001    0.007    create_synthetic_chunks
    0.065    0.019    0.013    merge_mergeable_sections
    0.001    0.001    0.001      uniquify_literals __literal8
    0.011    0.001    0.002      uniquify_literals __objc_methname
    0.010    0.002    0.002      uniquify_literals __cstring
    0.001    0.001    0.000      uniquify_literals __objc_classname
    0.001    0.001    0.000      uniquify_literals __objc_methtype
    0.000    0.000    0.000      uniquify_literals __literal16
    0.002    0.000    0.002      uniquify_literal_pointers
    0.040    0.001    0.025    scan_relocations
    0.005    0.000    0.001      scan_unwind_info
    0.192    0.019    0.048    assign_offsets
    0.045    0.007    0.025      __TEXT
    0.000    0.000    0.000        __mach_header
    0.000    0.000    0.000        __stubs
    0.030    0.004    0.007        __text
    0.000    0.000    0.000        __gcc_except_tab
    0.000    0.000    0.000        __cstring
    0.014    0.003    0.016        __unwind_info
    0.000    0.000    0.000        __const
    0.000    0.000    0.000        __eh_frame
    0.000    0.000    0.000        __init_offsets
    0.000    0.000    0.000        __literal16
    0.000    0.000    0.000        __literal8
    0.000    0.000    0.000        __objc_classname
    0.000    0.000    0.000        __objc_methname
    0.000    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __objc_stubs
    0.000    0.000    0.000        __swift5_assocty
    0.000    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.000    0.000    0.000        __swift5_fieldmd
    0.000    0.000    0.000        __swift5_mpenum
    0.000    0.000    0.000        __swift5_proto
    0.000    0.000    0.000        __swift5_protos
    0.000    0.000    0.000        __swift5_reflstr
    0.000    0.000    0.000        __swift5_typeref
    0.000    0.000    0.000        __swift5_types
    0.000    0.000    0.000        __ustring
    0.000    0.000    0.000      __DATA_CONST
    0.000    0.000    0.000        __got
    0.000    0.000    0.000        __const
    0.000    0.000    0.000        __cfstring
    0.000    0.000    0.000        __objc_catlist
    0.000    0.000    0.000        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.000    0.000    0.000        __objc_protolist
    0.000    0.000    0.000      __DATA
    0.000    0.000    0.000        __thread_ptrs
    0.000    0.000    0.000        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __objc_classrefs
    0.000    0.000    0.000        __objc_const
    0.000    0.000    0.000        __objc_data
    0.000    0.000    0.000        __objc_ivar
    0.000    0.000    0.000        __objc_protorefs
    0.000    0.000    0.000        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.000    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000        __swift51_hooks
    0.000    0.000    0.000        __swift_hooks
    0.000    0.000    0.000      __LLVM
    0.000    0.000    0.000        __bitcode
    0.000    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_modhash
    0.146    0.011    0.022      __LINKEDIT
    0.134    0.010    0.018        __chainfixups
    0.036    0.002    0.004        __data_in_code
    0.000    0.000    0.000        __ind_sym_tab
    0.146    0.011    0.022        __export
    0.095    0.006    0.012        __symbol_table
    0.079    0.005    0.010        __func_starts
    0.000    0.000    0.000        __string_table
    0.003    0.014    0.016    open_file
    0.132    0.045    0.037    copy_sections_to_output_file
    0.132    0.045    0.037      __TEXT
    0.000    0.000    0.000        __mach_header
    0.035    0.004    0.005        __const
    0.001    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __swift5_mpenum
    0.002    0.001    0.000        __gcc_except_tab
    0.001    0.000    0.000        __literal16
    0.001    0.000    0.000        __stubs
    0.010    0.001    0.001        __swift5_proto
    0.000    0.000    0.000        __objc_stubs
    0.131    0.044    0.036        __text
    0.005    0.001    0.001        __swift5_assocty
    0.001    0.000    0.000        __literal8
    0.013    0.001    0.002        __cstring
    0.001    0.000    0.000        __objc_classname
    0.008    0.001    0.001        __objc_methname
    0.000    0.000    0.000        __swift5_builtin
    0.006    0.000    0.001        __swift5_capture
    0.002    0.000    0.000        __swift5_protos
    0.009    0.001    0.001        __swift5_reflstr
    0.000    0.000    0.000        __swift5_entry
    0.018    0.002    0.002        __swift5_fieldmd
    0.098    0.024    0.020        __unwind_info
    0.057    0.010    0.009        __eh_frame
    0.016    0.002    0.002        __swift5_typeref
    0.008    0.001    0.001        __swift5_types
    0.000    0.000    0.000        __ustring
    0.000    0.000    0.000        __init_offsets
    0.063    0.008    0.009      __DATA
    0.000    0.000    0.000        __thread_ptrs
    0.024    0.002    0.003        __data
    0.000    0.000    0.000        __objc_protorefs
    0.007    0.001    0.001        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.001    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000        __swift51_hooks
    0.000    0.000    0.000        __swift_hooks
    0.000    0.000    0.000        __objc_imageinfo
    0.007    0.001    0.001        __objc_classrefs
    0.029    0.004    0.004        __objc_const
    0.018    0.002    0.003        __objc_data
    0.000    0.000    0.000        __objc_ivar
    0.116    0.029    0.024      __LLVM
    0.116    0.029    0.024        __bitcode
    0.001    0.000    0.000        __cmdline
    0.003    0.000    0.000        __swift_modhash
    0.044    0.006    0.006      __DATA_CONST
    0.002    0.000    0.000        __got
    0.031    0.003    0.004        __const
    0.004    0.001    0.001        __cfstring
    0.001    0.000    0.000        __objc_catlist
    0.005    0.001    0.001        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.001    0.000    0.000        __objc_protolist
    0.119    0.031    0.026      __LINKEDIT
    0.001    0.000    0.000        __chainfixups
    0.049    0.006    0.007        __export
    0.000    0.000    0.000        __data_in_code
    0.086    0.025    0.020        __symbol_table
    0.001    0.000    0.000        __func_starts
    0.001    0.000    0.000        __ind_sym_tab
    0.000    0.000    0.000        __string_table
    0.001    0.000    0.001    write_fixup_chains
    0.073    0.000    0.009    copy_sections_to_output_file
    0.000    0.018    0.038    close_file

Also, unrelated: I get hundreds of warnings such as "(arm64) failed to insert symbol abc in the debug map" and "(arm) could not find object file symbol for symbol xyz".

rui314 commented 1 year ago

What is your cmake config? Please run cmake -N -L . and paste the output. Here is mine, FYI:

$ cmake -N -L .
-- Cache values
BUILD_TESTING:BOOL=ON
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_INSTALL_PREFIX:PATH=/usr/local
CMAKE_OSX_ARCHITECTURES:STRING=
CMAKE_OSX_DEPLOYMENT_TARGET:STRING=13.0
CMAKE_OSX_SYSROOT:PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk
MOLD_LTO:BOOL=0
MOLD_MOSTLY_STATIC:BOOL=OFF
MOLD_USE_ASAN:BOOL=OFF
MOLD_USE_MOLD:BOOL=0
MOLD_USE_SYSTEM_TBB:BOOL=OFF
MOLD_USE_TSAN:BOOL=OFF
TBB4PY_BUILD:BOOL=OFF
TBBMALLOC_BUILD:BOOL=ON
TBBMALLOC_PROXY_BUILD:BOOL=ON
TBB_BUILD:BOOL=ON
TBB_CPF:BOOL=OFF
TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH:BOOL=OFF
TBB_ENABLE_IPO:BOOL=ON
TBB_EXAMPLES:BOOL=OFF
TBB_FIND_PACKAGE:BOOL=OFF
TBB_INSTALL_VARS:BOOL=OFF
TBB_NO_APPCONTAINER:BOOL=OFF
TBB_SANITIZE:STRING=
TBB_TEST_SPEC:BOOL=OFF
TBB_VALGRIND_MEMCHECK:BOOL=OFF
TBB_WINDOWS_DRIVER:BOOL=OFF
ZSTD_BUILD_CONTRIB:BOOL=OFF
ZSTD_BUILD_PROGRAMS:BOOL=ON
ZSTD_BUILD_SHARED:BOOL=ON
ZSTD_BUILD_STATIC:BOOL=ON
ZSTD_BUILD_TESTS:BOOL=OFF
ZSTD_LEGACY_SUPPORT:BOOL=OFF
ZSTD_LZ4_SUPPORT:BOOL=OFF
ZSTD_LZMA_SUPPORT:BOOL=OFF
ZSTD_MULTITHREAD_SUPPORT:BOOL=ON
ZSTD_PROGRAMS_LINK_SHARED:BOOL=OFF
ZSTD_ZLIB_SUPPORT:BOOL=OFF
pm-dev commented 1 year ago

Looks to be the same

➜  build git:(main) cmake -N -L .
-- Cache values
BUILD_TESTING:BOOL=ON
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_INSTALL_PREFIX:PATH=/usr/local
CMAKE_OSX_ARCHITECTURES:STRING=
CMAKE_OSX_DEPLOYMENT_TARGET:STRING=
CMAKE_OSX_SYSROOT:PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk
MOLD_LTO:BOOL=OFF
MOLD_MOSTLY_STATIC:BOOL=OFF
MOLD_USE_ASAN:BOOL=OFF
MOLD_USE_MOLD:BOOL=OFF
MOLD_USE_SYSTEM_TBB:BOOL=OFF
MOLD_USE_TSAN:BOOL=OFF
TBB4PY_BUILD:BOOL=OFF
TBBMALLOC_BUILD:BOOL=ON
TBBMALLOC_PROXY_BUILD:BOOL=ON
TBB_BUILD:BOOL=ON
TBB_CPF:BOOL=OFF
TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH:BOOL=OFF
TBB_ENABLE_IPO:BOOL=ON
TBB_EXAMPLES:BOOL=OFF
TBB_FIND_PACKAGE:BOOL=OFF
TBB_INSTALL_VARS:BOOL=OFF
TBB_NO_APPCONTAINER:BOOL=OFF
TBB_SANITIZE:STRING=
TBB_TEST_SPEC:BOOL=OFF
TBB_VALGRIND_MEMCHECK:BOOL=OFF
TBB_WINDOWS_DRIVER:BOOL=OFF
ZSTD_BUILD_CONTRIB:BOOL=OFF
ZSTD_BUILD_PROGRAMS:BOOL=ON
ZSTD_BUILD_SHARED:BOOL=ON
ZSTD_BUILD_STATIC:BOOL=ON
ZSTD_BUILD_TESTS:BOOL=OFF
ZSTD_LEGACY_SUPPORT:BOOL=OFF
ZSTD_LZ4_SUPPORT:BOOL=OFF
ZSTD_LZMA_SUPPORT:BOOL=OFF
ZSTD_MULTITHREAD_SUPPORT:BOOL=ON
ZSTD_PROGRAMS_LINK_SHARED:BOOL=OFF
ZSTD_ZLIB_SUPPORT:BOOL=OFF
rui314 commented 1 year ago

If you apply the patch (https://github.com/bluewhalesystems/sold/issues/16#issuecomment-1429459090) and run with -Wl,-perf,-thread_count,1, is there any particularly slow file?

pm-dev commented 1 year ago

All files show 0.000 for Real except for 7 frameworks which add up to 0.067. The entire read_input_files phase adds up to 2.812. Not sure how to explain that. Happy to send the full output to your email.

BalestraPatrick commented 1 year ago

I seem to have the same issue on an M1 Max. I applied -Wl,-perf,-thread_count,1 and don't see any particular file that is slow to read (everything is 0.001 or 0.000).

     User   System     Real  Name
    7.478    2.279    9.187  all
    3.412    1.656    7.174    read_input_files
rui314 commented 1 year ago

I believe there's a bottleneck that stands out only under specific conditions, so we need to figure out what it is. Can you run the linker under a profiler and share the output file? On macOS, you can do that with the following command:

xctrace record --template 'Time Profiler' --launch -- /path/to/sold/ld64 <command line arguments>

A few notes about the profiler:

  1. You should build sold with -DCMAKE_BUILD_TYPE=RelWithDebInfo so that the profiler is able to figure out function names, etc.
  2. You can see the arguments passed to the linker by appending -### to the compiler driver linker invocation.
  3. Run the same command a few times to warm up caches and share the last run's output.
BalestraPatrick commented 1 year ago

Thanks for the great instructions. For me it looks like this. After warming up the caches by running the command three times the linking seems to take about 3-4s. I sent the trace over email as well if you want to take a look at that.

Screenshot 2023-02-22 at 12 34 22 AM
rui314 commented 1 year ago

Thank you for sharing the profiling data. But it looks like mold took only 11.00 ms to finish in your profile. Is this a correct sample that you wanted to take?

BalestraPatrick commented 1 year ago

Oops, yes. I missed the @ to load the params from a file (xctrace record --template 'Time Profiler' --launch -- /path/to/sold/ld64 @file.params). Here we go.

Screenshot 2023-02-22 at 8 57 57 AM
rui314 commented 1 year ago

Thanks for sharing the profiling data! I fixed some obvious bottlenecks in the above commits. There are other bottlenecks in which sold uses only a single core, so we can improve it even more. But for now, I believe this is good enough. Could you rebuild sold and try again?

BalestraPatrick commented 1 year ago

Thanks for the fixes! I don't see the tbd slowdown anymore. I seem to still see the following sections taking quite a bit of time:

     User   System     Real  Name
    6.963    2.308   10.913  all
    3.032    1.656    8.623    read_input_files
    0.793    0.004    0.107    scan_relocations
    1.305    0.100    0.331    assign_offsets
    0.368    0.263    1.355    write_signature

Happy to apply more patches to see where the bottlenecks are because I don't see any tbd parsing now taking a significant amount of time.

Also, one question out of personal interest: I understand that the cached scenario makes sense to test, but I was wondering how that affects real-world performance. Every incremental build takes about 10s or more to link, while rerunning the same command without changing anything in the codebase takes 2-3x less than that.

rui314 commented 1 year ago

The above change should increase parallelism in the read_input_files stage. What are the numbers with that change?

Do you know which sub-stages in assign_offsets take a long time?

The perf number of write_signature is very mysterious. write_signature computes SHA256 hashes for all pages of the output file that the linker has just created, and it is embarrassingly parallel. Since the file has just been written by the linker itself, it is always in the buffer cache. So it shouldn't take that long. Are you testing this with -thread_count 1?
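
To illustrate why per-page hashing is embarrassingly parallel, here is a minimal sketch under some loud assumptions: it is not sold's write_signature code, it uses 64-bit FNV-1a as a dependency-free stand-in for SHA256, and `hash_pages`, `page_size`, and `num_threads` are hypothetical names. The point is only that each page is hashed independently and written to its own output slot, so threads never need to synchronize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Stand-in hash (64-bit FNV-1a). A real code signature uses SHA256;
// FNV-1a is used here only to avoid an external crypto dependency.
uint64_t fnv1a(const uint8_t *data, size_t len) {
  uint64_t h = 0xcbf29ce484222325ull;
  for (size_t i = 0; i < len; i++) {
    h ^= data[i];
    h *= 0x100000001b3ull;
  }
  return h;
}

// Hashes each page of buf independently. Pages share no state, so every
// worker thread writes only to its own disjoint slots of the result.
std::vector<uint64_t> hash_pages(const std::vector<uint8_t> &buf,
                                 size_t page_size, unsigned num_threads) {
  assert(page_size > 0 && num_threads > 0);
  size_t num_pages = (buf.size() + page_size - 1) / page_size;
  std::vector<uint64_t> hashes(num_pages);

  auto worker = [&](size_t first) {
    // Strided assignment: thread k handles pages k, k + n, k + 2n, ...
    for (size_t i = first; i < num_pages; i += num_threads) {
      size_t off = i * page_size;
      size_t len = std::min(page_size, buf.size() - off);  // last page may be short
      hashes[i] = fnv1a(buf.data() + off, len);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned k = 0; k < num_threads; k++)
    threads.emplace_back(worker, static_cast<size_t>(k));
  for (std::thread &t : threads)
    t.join();
  return hashes;
}
```

The result does not depend on the thread count, so running with one thread versus many changes only the elapsed time, which is why a single-threaded run would be one plausible explanation for a slow write_signature.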

BalestraPatrick commented 1 year ago
  1. It really depends. A cached build can make the read_input_files stage take anywhere from 3s to 0.6s. This is an example after executing the same command about 5 times.
     User   System     Real  Name
    6.852    1.112    1.892  all
    2.830    0.639    0.661    read_input_files
    0.179    0.018    0.074    resolve_symbols
    0.043    0.030    0.011    create_internal_file
    0.000    0.000    0.000    handle_exported_symbols_list
    0.000    0.000    0.000    handle_unexported_symbols_list
    0.069    0.004    0.026    claim_unresolved_symbols
    0.085    0.001    0.011    remove_unreferenced_subsections
    0.046    0.008    0.053    create_synthetic_chunks
    0.427    0.021    0.080    merge_mergeable_sections
    0.074    0.004    0.014      uniquify_literals __objc_methname
    0.166    0.005    0.029      uniquify_literals __cstring
    0.003    0.001    0.001      uniquify_literals __literal8
    0.006    0.001    0.001      uniquify_literals __objc_classname
    0.011    0.001    0.003      uniquify_literals __objc_methtype
    0.000    0.000    0.000      uniquify_literals __oslogstring
    0.001    0.000    0.000      uniquify_literals __literal16
    0.000    0.000    0.000      uniquify_literals __literal4
    0.009    0.000    0.010      uniquify_literal_pointers
    0.760    0.006    0.099    scan_relocations
    1.333    0.121    0.296    assign_offsets
    0.300    0.069    0.129      __TEXT
    0.000    0.000    0.000        __mach_header
    0.000    0.000    0.000        __stubs
    0.148    0.064    0.045        __text
    0.000    0.000    0.000        __stub_helper
    0.000    0.000    0.000        __gcc_except_tab
    0.001    0.000    0.001        __cstring
    0.147    0.005    0.079        __unwind_info
    0.001    0.000    0.001        __const
    0.000    0.000    0.000        __eh_frame
    0.000    0.000    0.000        __entitlements
    0.000    0.000    0.000        __literal16
    0.000    0.000    0.000        __literal4
    0.000    0.000    0.000        __literal8
    0.000    0.000    0.000        __objc_classname
    0.001    0.000    0.001        __objc_methname
    0.000    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __objc_stubs
    0.000    0.000    0.000        __oslogstring
    0.000    0.000    0.000        __swift5_assocty
    0.000    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.000    0.000    0.000        __swift5_fieldmd
    0.000    0.000    0.000        __swift5_mpenum
    0.000    0.000    0.000        __swift5_proto
    0.000    0.000    0.000        __swift5_protos
    0.000    0.000    0.000        __swift5_reflstr
    0.000    0.000    0.000        __swift5_typeref
    0.000    0.000    0.000        __swift5_types
    0.000    0.000    0.000        __ustring
    0.004    0.000    0.004      __DATA_CONST
    0.000    0.000    0.000        __got
    0.002    0.000    0.002        __const
    0.000    0.000    0.000        __mod_init_func
    0.002    0.000    0.002        __cfstring
    0.000    0.000    0.000        __objc_catlist
    0.000    0.000    0.000        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.000    0.000    0.000        __objc_protolist
    0.003    0.000    0.003      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.001    0.000    0.001        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.000    0.000    0.000        __thread_data
    0.000    0.000    0.000        __objc_arraydata
    0.000    0.000    0.000        __objc_arrayobj
    0.000    0.000    0.000        __objc_classrefs
    0.001    0.000    0.001        __objc_const
    0.000    0.000    0.000        __objc_data
    0.000    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.000    0.000    0.000        __objc_intobj
    0.000    0.000    0.000        __objc_ivar
    0.000    0.000    0.000        __objc_protorefs
    0.000    0.000    0.000        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.000    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000      __LLVM
    0.000    0.000    0.000        __bitcode
    0.000    0.000    0.000        __bundle
    0.000    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_cmdline
    0.000    0.000    0.000        __swift_modhash
    1.026    0.052    0.159      __LINKEDIT
    0.700    0.035    0.088        __rebase
    0.157    0.007    0.019        __data_in_code
    0.006    0.001    0.001        __lazy_binding
    1.026    0.052    0.159        __binding
    0.000    0.000    0.000        __ind_sym_tab
    0.606    0.031    0.073        __symbol_table
    1.006    0.050    0.138        __export
    0.623    0.031    0.075        __func_starts
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.010    0.072    0.083    open_file
    0.618    0.080    0.079    copy_sections_to_output_file
    0.618    0.080    0.079      __TEXT
    0.000    0.000    0.000        __mach_header
    0.010    0.002    0.001        __objc_methtype
    0.099    0.021    0.013        __swift5_fieldmd
    0.000    0.000    0.000        __stubs
    0.617    0.079    0.079        __text
    0.020    0.004    0.003        __swift5_assocty
    0.010    0.002    0.001        __objc_stubs
    0.028    0.005    0.004        __swift5_capture
    0.003    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __oslogstring
    0.039    0.010    0.005        __swift5_reflstr
    0.000    0.000    0.000        __swift5_entry
    0.041    0.011    0.006        __swift5_proto
    0.081    0.009    0.010        __swift5_typeref
    0.057    0.004    0.007        __swift5_types
    0.006    0.000    0.001        __swift5_protos
    0.002    0.000    0.000        __swift5_mpenum
    0.231    0.023    0.029        __const
    0.000    0.000    0.000        __literal4
    0.002    0.000    0.000        __literal8
    0.006    0.000    0.001        __ustring
    0.012    0.001    0.002        __objc_classname
    0.000    0.000    0.000        __entitlements
    0.001    0.000    0.000        __literal16
    0.061    0.007    0.007        __objc_methname
    0.000    0.000    0.000        __stub_helper
    0.018    0.002    0.002        __gcc_except_tab
    0.070    0.007    0.009        __cstring
    0.152    0.017    0.019        __unwind_info
    0.282    0.035    0.036        __eh_frame
    0.444    0.056    0.056      __DATA_CONST
    0.008    0.002    0.001        __got
    0.018    0.004    0.002        __objc_catlist
    0.436    0.054    0.055        __const
    0.119    0.020    0.015        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.008    0.001    0.001        __objc_protolist
    0.003    0.000    0.000        __mod_init_func
    0.154    0.014    0.019        __cfstring
    0.546    0.069    0.069      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.111    0.022    0.015        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.000    0.000    0.000        __thread_data
    0.001    0.000    0.000        __objc_arraydata
    0.001    0.000    0.000        __objc_arrayobj
    0.042    0.003    0.005        __objc_classrefs
    0.001    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.001    0.000    0.000        __objc_intobj
    0.029    0.003    0.004        __objc_ivar
    0.323    0.035    0.040        __objc_const
    0.005    0.001    0.001        __objc_protorefs
    0.104    0.010    0.013        __objc_selrefs
    0.000    0.000    0.000        __thread_bss
    0.000    0.000    0.000        __objc_stublist
    0.010    0.001    0.001        __objc_superrefs
    0.068    0.009    0.009        __objc_data
    0.082    0.019    0.011      __LLVM
    0.082    0.019    0.011        __bitcode
    0.002    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_cmdline
    0.014    0.003    0.002        __swift_modhash
    0.036    0.010    0.005        __bundle
    0.600    0.077    0.076      __LINKEDIT
    0.006    0.001    0.001        __rebase
    0.045    0.010    0.006        __binding
    0.002    0.001    0.000        __lazy_binding
    0.374    0.042    0.047        __export
    0.000    0.000    0.000        __data_in_code
    0.529    0.060    0.067        __symbol_table
    0.003    0.000    0.000        __func_starts
    0.004    0.000    0.000        __ind_sym_tab
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.284    0.107    0.329    write_signature
    0.000    0.027    0.027      msync
    0.000    0.003    0.003    close_file

But again, when I run a build with Bazel (without sandbox of course), I get much worse numbers:

     User   System     Real  Name
    6.979    2.182    9.100  all
    2.854    1.596    7.238    read_input_files
    0.185    0.049    0.076    resolve_symbols
    0.041    0.039    0.010    create_internal_file
    0.000    0.000    0.000    handle_exported_symbols_list
    0.000    0.000    0.000    handle_unexported_symbols_list
    0.072    0.004    0.027    claim_unresolved_symbols
    0.090    0.001    0.012    remove_unreferenced_subsections
    0.043    0.006    0.051    create_synthetic_chunks
    0.329    0.033    0.058    merge_mergeable_sections
    0.060    0.005    0.010      uniquify_literals __objc_methname
    0.077    0.005    0.014      uniquify_literals __cstring
    0.003    0.001    0.002      uniquify_literals __literal8
    0.006    0.001    0.001      uniquify_literals __objc_classname
    0.011    0.001    0.002      uniquify_literals __objc_methtype
    0.000    0.000    0.000      uniquify_literals __oslogstring
    0.001    0.000    0.000      uniquify_literals __literal16
    0.000    0.000    0.000      uniquify_literals __literal4
    0.009    0.001    0.010      uniquify_literal_pointers
    0.922    0.004    0.106    scan_relocations
    1.326    0.108    0.290    assign_offsets
    0.301    0.058    0.130      __TEXT
    0.000    0.000    0.000        __mach_header
    0.000    0.000    0.000        __stubs
    0.143    0.054    0.046        __text
    0.000    0.000    0.000        __stub_helper
    0.000    0.000    0.000        __gcc_except_tab
    0.001    0.000    0.001        __cstring
    0.153    0.004    0.079        __unwind_info
    0.001    0.000    0.001        __const
    0.000    0.000    0.000        __eh_frame
    0.000    0.000    0.000        __entitlements
    0.000    0.000    0.000        __literal16
    0.000    0.000    0.000        __literal4
    0.000    0.000    0.000        __literal8
    0.000    0.000    0.000        __objc_classname
    0.001    0.000    0.001        __objc_methname
    0.000    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __objc_stubs
    0.000    0.000    0.000        __oslogstring
    0.000    0.000    0.000        __swift5_assocty
    0.000    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.000    0.000    0.000        __swift5_fieldmd
    0.000    0.000    0.000        __swift5_mpenum
    0.000    0.000    0.000        __swift5_proto
    0.000    0.000    0.000        __swift5_protos
    0.000    0.000    0.000        __swift5_reflstr
    0.000    0.000    0.000        __swift5_typeref
    0.000    0.000    0.000        __swift5_types
    0.000    0.000    0.000        __ustring
    0.004    0.000    0.004      __DATA_CONST
    0.000    0.000    0.000        __got
    0.002    0.000    0.002        __const
    0.000    0.000    0.000        __mod_init_func
    0.002    0.000    0.002        __cfstring
    0.000    0.000    0.000        __objc_catlist
    0.000    0.000    0.000        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.000    0.000    0.000        __objc_protolist
    0.003    0.000    0.003      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.001    0.000    0.001        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.000    0.000    0.000        __thread_data
    0.000    0.000    0.000        __objc_arraydata
    0.000    0.000    0.000        __objc_arrayobj
    0.000    0.000    0.000        __objc_classrefs
    0.001    0.000    0.001        __objc_const
    0.000    0.000    0.000        __objc_data
    0.000    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.000    0.000    0.000        __objc_intobj
    0.000    0.000    0.000        __objc_ivar
    0.000    0.000    0.000        __objc_protorefs
    0.000    0.000    0.000        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.000    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000      __LLVM
    0.000    0.000    0.000        __bitcode
    0.000    0.000    0.000        __bundle
    0.000    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_cmdline
    0.000    0.000    0.000        __swift_modhash
    1.018    0.050    0.152      __LINKEDIT
    0.858    0.038    0.103        __rebase
    0.165    0.008    0.020        __data_in_code
    0.007    0.001    0.001        __lazy_binding
    1.018    0.050    0.152        __binding
    1.002    0.048    0.135        __export
    0.631    0.032    0.077        __func_starts
    0.000    0.000    0.000        __ind_sym_tab
    0.580    0.029    0.069        __symbol_table
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.009    0.051    0.060    open_file
    0.581    0.109    0.094    copy_sections_to_output_file
    0.581    0.109    0.094      __TEXT
    0.000    0.000    0.000        __mach_header
    0.000    0.000    0.000        __stubs
    0.009    0.002    0.001        __objc_methtype
    0.581    0.109    0.094        __text
    0.005    0.001    0.001        __objc_stubs
    0.000    0.000    0.000        __oslogstring
    0.026    0.004    0.003        __swift5_assocty
    0.003    0.000    0.000        __swift5_builtin
    0.027    0.004    0.004        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.090    0.013    0.013        __swift5_fieldmd
    0.186    0.027    0.026        __const
    0.000    0.000    0.000        __stub_helper
    0.017    0.003    0.002        __gcc_except_tab
    0.060    0.008    0.008        __cstring
    0.002    0.000    0.000        __swift5_mpenum
    0.039    0.005    0.005        __swift5_proto
    0.132    0.020    0.019        __unwind_info
    0.004    0.001    0.001        __swift5_protos
    0.038    0.005    0.005        __swift5_reflstr
    0.075    0.011    0.010        __swift5_typeref
    0.229    0.048    0.039        __eh_frame
    0.047    0.008    0.006        __swift5_types
    0.000    0.000    0.000        __ustring
    0.000    0.000    0.000        __literal4
    0.001    0.000    0.000        __literal8
    0.009    0.001    0.002        __objc_classname
    0.039    0.011    0.009        __objc_methname
    0.001    0.000    0.000        __entitlements
    0.000    0.000    0.000        __literal16
    0.185    0.027    0.026      __DATA_CONST
    0.007    0.002    0.001        __got
    0.004    0.001    0.001        __mod_init_func
    0.006    0.001    0.001        __objc_catlist
    0.127    0.020    0.018        __cfstring
    0.025    0.005    0.003        __objc_classlist
    0.178    0.026    0.025        __const
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.008    0.001    0.001        __objc_protolist
    0.476    0.079    0.072      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.116    0.018    0.016        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.000    0.000    0.000        __thread_data
    0.001    0.000    0.000        __objc_arraydata
    0.000    0.000    0.000        __objc_arrayobj
    0.041    0.005    0.006        __objc_classrefs
    0.254    0.039    0.036        __objc_const
    0.000    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.001    0.000    0.000        __objc_intobj
    0.008    0.001    0.001        __objc_ivar
    0.004    0.001    0.001        __objc_protorefs
    0.064    0.017    0.014        __objc_data
    0.021    0.003    0.004        __objc_selrefs
    0.000    0.000    0.000        __thread_bss
    0.000    0.000    0.000        __objc_stublist
    0.046    0.013    0.010        __objc_superrefs
    0.482    0.081    0.073      __LLVM
    0.482    0.081    0.073        __bitcode
    0.011    0.003    0.002        __cmdline
    0.001    0.000    0.000        __swift_cmdline
    0.029    0.004    0.004        __swift_modhash
    0.242    0.034    0.034        __bundle
    0.442    0.069    0.064      __LINKEDIT
    0.001    0.001    0.000        __rebase
    0.004    0.001    0.001        __binding
    0.000    0.000    0.000        __data_in_code
    0.437    0.067    0.063        __symbol_table
    0.001    0.000    0.000        __lazy_binding
    0.265    0.038    0.037        __export
    0.004    0.001    0.001        __ind_sym_tab
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.001    0.000    0.000        __func_starts
    0.359    0.173    0.991    write_signature
    0.000    0.028    0.028      msync
    0.000    0.003    0.003    close_file

So for assign_offsets the bottlenecks are __text and __unwind_info. No, I'm not setting -thread_count 1 anywhere. Initially I thought that maybe under Bazel there's some kind of limit on how many cores an action can use. But when I run the same action for the first time to trace it, I can see that it's definitely slower (5s in total, for example, though it varies) than a third run, which looks like the first snippet above.
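
As a rough way to read the User/System/Real columns above: dividing CPU time (User + System) by Real gives the average number of cores kept busy in a phase. The sketch below does only that arithmetic; `PerfRow`, `cpu_utilization` and `report` are made-up names for illustration and are not part of sold.

```cpp
#include <cstdio>

// One row of the -perf table (all times in seconds).
// Hypothetical helper type, not part of sold.
struct PerfRow {
  double user;
  double sys;
  double real;
};

// Average number of cores kept busy: CPU time divided by wall-clock time.
// About 1.0 or less suggests the phase is serial or waiting on I/O;
// values close to the core count suggest it runs in parallel.
double cpu_utilization(const PerfRow &row) {
  if (row.real <= 0.0)
    return 0.0;
  return (row.user + row.sys) / row.real;
}

// Prints the ratio for one named row.
void report(const char *name, const PerfRow &row) {
  std::printf("%-20s %.2f cores busy on average\n", name, cpu_utilization(row));
}
```

For the Bazel run above, `report("all", {6.979, 2.182, 9.100})` gives about 1.01 and `report("read_input_files", {2.854, 1.596, 7.238})` gives about 0.61, so by this measure the process spent much of read_input_files waiting rather than computing.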

rui314 commented 1 year ago

I wonder if there's anything special about the output directory in your environment. Especially the perf number for write_signature is very odd. Can you make sure that the output directory is a regular locally-mounted directory?

One thing you can try is to specify /dev/null as an output file. Since /dev/null cannot be mmap'ed, sold writes the output to a memory buffer and then calls write(2) to write the buffer to /dev/null. In this situation, we should be able to observe the real performance of write_signature. If write_signature is slow even with a memory buffer, there's something wrong with it. Otherwise, it is likely that there's something wrong with the file or the filesystem.
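
To make the /dev/null behaviour concrete, here is a minimal POSIX-only sketch of the decision described above: use an in-memory buffer plus write(2) when the output path is "-" or an existing non-regular file, otherwise mmap. The function names `needs_memory_buffer` and `write_all` are hypothetical and this is not sold's actual code.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if the output should go through a memory buffer and write(2)
// instead of mmap(2): "-" or an existing path that is not a regular file,
// such as /dev/null. A path that does not exist yet is treated as regular.
bool needs_memory_buffer(const std::string &path) {
  if (path == "-")
    return true;
  struct stat st;
  return stat(path.c_str(), &st) == 0 && (st.st_mode & S_IFMT) != S_IFREG;
}

// Writes the whole buffer with write(2), retrying on short writes and EINTR.
// Returns false on any other error.
bool write_all(int fd, const char *buf, size_t len) {
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return false;
    }
    if (n == 0)
      return false; // no progress; give up rather than loop forever
    buf += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}
```

With this check, `needs_memory_buffer("/dev/null")` is true, so nothing the linker does before the final write touches a mapped file.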

Also one question for my personal interest: I understand that the cached scenario makes sense to test, but I was wondering how does that affect real-world performance. Every incremental build takes about 10s or more to link, while of course rerunning the same command by not changing anything in the codebase takes 2-3x times less than that.

mold/sold aims to increase developer productivity especially in rapid debug-edit-rebuild cycles, so I think assuming that most of its input files are cached is reasonable. It is odd if every incremental build takes about 10s or more to link because it implies that all input files are fresh and new on each incremental build. If you build a project, edit a single source file, compile and re-link it, the second link is faster than the first link, no?

BalestraPatrick commented 1 year ago

By setting -o /dev/null I can see this:

    0.287    0.006    0.038    write_signature

rui314 commented 1 year ago

Okay, so it's slow only when it's reading from an mmap'ed file. Random thoughts:

rui314 commented 1 year ago

I'd also try the following patch. msync doesn't take too long on my machine, but it might not be the case on your machine.

diff --git a/macho/output-chunks.cc b/macho/output-chunks.cc
index 46c50e1e..95aa44d1 100644
--- a/macho/output-chunks.cc
+++ b/macho/output-chunks.cc
@@ -1430,20 +1430,14 @@ void CodeSignatureSection<E>::write_signature(Context<E> &ctx) {
     u8 *start = ctx.buf + i * E::page_size;
     u8 *end = ctx.buf + std::min<i64>((i + 1) * E::page_size, this->hdr.offset);
     sha256_hash(start, end - start, buf + i * SHA256_SIZE);
   };

   for (i64 i = 0; i < num_blocks; i += 1024) {
     i64 j = std::min(num_blocks, i + 1024);
-
-#if __APPLE__
-    // Calling msync() with MS_ASYNC speeds up the following msync()
-    // with MS_INVALIDATE.
-    msync(ctx.buf + i * E::page_size, 1024 * E::page_size, MS_ASYNC);
-#endif
   }

   // A LC_UUID load command may also contain a crypto hash of the
   // entire file. We compute its value as a tree hash.
   if (ctx.arg.uuid == UUID_HASH) {
     u8 uuid[SHA256_SIZE];
     sha256_hash(ctx.buf + this->hdr.offset, this->hdr.size, uuid);
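
For context on what the loop in this hunk iterates over: the output is hashed one page-sized block at a time, and the last block is cut off at the start of the code signature. A minimal sketch of that block layout, assuming the block count is a ceiling division (the hunk does not show how `num_blocks` is computed) and using a hypothetical helper name `block_ranges`:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Splits [0, total_size) into page_size-sized blocks and returns
// (offset, length) pairs. The last block may be shorter than page_size.
// Hypothetical helper; the ceiling division is an assumption.
std::vector<std::pair<int64_t, int64_t>>
block_ranges(int64_t total_size, int64_t page_size) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  if (total_size <= 0 || page_size <= 0)
    return ranges;

  int64_t num_blocks = (total_size + page_size - 1) / page_size;
  for (int64_t i = 0; i < num_blocks; i++) {
    int64_t start = i * page_size;
    int64_t end = std::min<int64_t>((i + 1) * page_size, total_size);
    ranges.emplace_back(start, end - start);
  }
  return ranges;
}
```

For example, `block_ranges(10000, 4096)` yields three blocks, with the last one starting at offset 8192 and 1808 bytes long. Each block is hashed independently, which is why the work can be spread across threads.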

brentleyjones commented 1 year ago

an antivirus program makes file IO very slow as it scans newly created files.

A lot of these large companies have endpoint security software that does exactly that: https://brentley.dev/corporate-crapware/

rui314 commented 1 year ago

an antivirus program makes file IO very slow as it scans newly created files.

A lot of these large companies have endpoint security software that does exactly that: https://brentley.dev/corporate-crapware/

If that's the case, does applying the following patch change the performance characteristics? This patch makes sold use write(2) instead of mmap(2) to write to an output file.

diff --git a/common/output-file-unix.h b/common/output-file-unix.h
index 92af144f..5de64fd1 100644
--- a/common/output-file-unix.h
+++ b/common/output-file-unix.h
@@ -133,20 +133,7 @@ OutputFile<Context>::open(Context &ctx, std::string path, i64 filesize, i64 perm
   if (path.starts_with('/') && !ctx.arg.chroot.empty())
     path = ctx.arg.chroot + "/" + path_clean(path);

-  bool is_special = false;
-  if (path == "-") {
-    is_special = true;
-  } else {
-    struct stat st;
-    if (stat(path.c_str(), &st) == 0 && (st.st_mode & S_IFMT) != S_IFREG)
-      is_special = true;
-  }
-
-  OutputFile<Context> *file;
-  if (is_special)
-    file = new MallocOutputFile(ctx, path, filesize, perm);
-  else
-    file = new MemoryMappedOutputFile(ctx, path, filesize, perm);
+  OutputFile<Context> *file = new MallocOutputFile(ctx, path, filesize, perm);

 #ifdef MADV_HUGEPAGE
   // Enable transparent huge page for an output memory-mapped file.

BalestraPatrick commented 1 year ago

I disabled the system extensions that could interfere with performance and tested both with and without your patch. Performance is still about the same in both cases:

     User   System     Real  Name
    6.933    2.919    7.310  all
    2.835    2.000    4.189    read_input_files
    0.192    0.063    0.084    resolve_symbols
    0.042    0.042    0.011    create_internal_file
    0.000    0.000    0.000    handle_exported_symbols_list
    0.000    0.000    0.000    handle_unexported_symbols_list
    0.069    0.003    0.025    claim_unresolved_symbols
    0.085    0.001    0.011    remove_unreferenced_subsections
    0.044    0.009    0.065    create_synthetic_chunks
    0.339    0.038    0.079    merge_mergeable_sections
    0.063    0.004    0.010      uniquify_literals __objc_methname
    0.092    0.008    0.027      uniquify_literals __cstring
    0.003    0.001    0.001      uniquify_literals __literal8
    0.006    0.001    0.001      uniquify_literals __objc_classname
    0.012    0.001    0.002      uniquify_literals __objc_methtype
    0.000    0.000    0.000      uniquify_literals __oslogstring
    0.001    0.000    0.000      uniquify_literals __literal16
    0.000    0.000    0.000      uniquify_literals __literal4
    0.009    0.001    0.010      uniquify_literal_pointers
    0.978    0.008    0.111    scan_relocations
    1.361    0.127    0.331    assign_offsets
    0.312    0.071    0.172      __TEXT
    0.000    0.000    0.029        __mach_header
    0.000    0.000    0.000        __stubs
    0.156    0.065    0.040        __text
    0.000    0.000    0.000        __stub_helper
    0.000    0.000    0.000        __gcc_except_tab
    0.001    0.000    0.001        __cstring
    0.151    0.005    0.098        __unwind_info
    0.001    0.000    0.001        __const
    0.000    0.000    0.000        __eh_frame
    0.000    0.000    0.000        __entitlements
    0.000    0.000    0.000        __literal16
    0.000    0.000    0.000        __literal4
    0.000    0.000    0.000        __literal8
    0.000    0.000    0.000        __objc_classname
    0.001    0.000    0.001        __objc_methname
    0.000    0.000    0.000        __objc_methtype
    0.000    0.000    0.000        __objc_stubs
    0.000    0.000    0.000        __oslogstring
    0.000    0.000    0.000        __swift5_assocty
    0.000    0.000    0.000        __swift5_builtin
    0.000    0.000    0.000        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.000    0.000    0.000        __swift5_fieldmd
    0.000    0.000    0.000        __swift5_mpenum
    0.000    0.000    0.000        __swift5_proto
    0.000    0.000    0.000        __swift5_protos
    0.000    0.000    0.000        __swift5_reflstr
    0.000    0.000    0.000        __swift5_typeref
    0.000    0.000    0.000        __swift5_types
    0.000    0.000    0.000        __ustring
    0.004    0.000    0.004      __DATA_CONST
    0.000    0.000    0.000        __got
    0.002    0.000    0.002        __const
    0.000    0.000    0.000        __mod_init_func
    0.002    0.000    0.002        __cfstring
    0.000    0.000    0.000        __objc_catlist
    0.000    0.000    0.000        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.000    0.000    0.000        __objc_protolist
    0.003    0.000    0.003      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.001    0.000    0.001        __data
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.000    0.000    0.000        __thread_data
    0.000    0.000    0.000        __objc_arraydata
    0.000    0.000    0.000        __objc_arrayobj
    0.000    0.000    0.000        __objc_classrefs
    0.001    0.000    0.001        __objc_const
    0.000    0.000    0.000        __objc_data
    0.000    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.000    0.000    0.000        __objc_intobj
    0.000    0.000    0.000        __objc_ivar
    0.000    0.000    0.000        __objc_protorefs
    0.000    0.000    0.000        __objc_selrefs
    0.000    0.000    0.000        __objc_stublist
    0.000    0.000    0.000        __objc_superrefs
    0.000    0.000    0.000      __LLVM
    0.000    0.000    0.000        __bitcode
    0.000    0.000    0.000        __bundle
    0.000    0.000    0.000        __cmdline
    0.000    0.000    0.000        __swift_cmdline
    0.000    0.000    0.000        __swift_modhash
    1.042    0.057    0.152      __LINKEDIT
    0.848    0.041    0.104        __rebase
    0.006    0.001    0.001        __lazy_binding
    0.132    0.007    0.017        __data_in_code
    0.000    0.000    0.000        __ind_sym_tab
    1.042    0.056    0.151        __binding
    0.560    0.034    0.070        __func_starts
    1.034    0.055    0.142        __export
    0.603    0.035    0.076        __symbol_table
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.010    0.102    0.133    open_file
    0.519    0.145    0.164    copy_sections_to_output_file
    0.519    0.145    0.164      __TEXT
    0.000    0.000    0.000        __mach_header
    0.000    0.000    0.000        __stubs
    0.518    0.145    0.164        __text
    0.007    0.001    0.001        __objc_methtype
    0.067    0.005    0.013        __swift5_fieldmd
    0.004    0.001    0.001        __objc_stubs
    0.000    0.000    0.000        __oslogstring
    0.014    0.001    0.003        __swift5_assocty
    0.002    0.000    0.000        __swift5_builtin
    0.019    0.001    0.003        __swift5_capture
    0.000    0.000    0.000        __swift5_entry
    0.001    0.000    0.000        __swift5_mpenum
    0.024    0.001    0.005        __swift5_proto
    0.004    0.000    0.001        __swift5_protos
    0.027    0.002    0.005        __swift5_reflstr
    0.036    0.002    0.009        __swift5_typeref
    0.085    0.013    0.023        __const
    0.021    0.002    0.006        __swift5_types
    0.000    0.000    0.000        __ustring
    0.109    0.037    0.034        __eh_frame
    0.000    0.000    0.000        __stub_helper
    0.014    0.007    0.005        __gcc_except_tab
    0.023    0.004    0.008        __cstring
    0.054    0.018    0.017        __unwind_info
    0.000    0.000    0.000        __literal4
    0.001    0.000    0.000        __literal8
    0.006    0.002    0.002        __objc_classname
    0.018    0.009    0.007        __objc_methname
    0.000    0.000    0.000        __entitlements
    0.000    0.000    0.000        __literal16
    0.242    0.031    0.055      __DATA
    0.000    0.000    0.000        __la_symbol_ptr
    0.000    0.000    0.000        __thread_ptrs
    0.020    0.004    0.004        __objc_selrefs
    0.242    0.031    0.055        __data
    0.000    0.000    0.000        __objc_dictobj
    0.000    0.000    0.000        __thread_data
    0.001    0.000    0.000        __objc_arraydata
    0.000    0.000    0.000        __thread_bss
    0.000    0.000    0.000        __objc_doubleobj
    0.000    0.000    0.000        __objc_floatobj
    0.001    0.000    0.000        __objc_intobj
    0.000    0.000    0.000        __objc_arrayobj
    0.007    0.001    0.002        __objc_ivar
    0.024    0.004    0.005        __objc_classrefs
    0.000    0.000    0.000        __objc_imageinfo
    0.000    0.000    0.000        __thread_vars
    0.004    0.001    0.001        __objc_protorefs
    0.000    0.000    0.000        __objc_stublist
    0.005    0.000    0.001        __objc_superrefs
    0.111    0.006    0.022        __objc_const
    0.067    0.003    0.013        __objc_data
    0.459    0.122    0.134      __LLVM
    0.459    0.122    0.134        __bitcode
    0.190    0.013    0.043        __cmdline
    0.395    0.113    0.121        __bundle
    0.001    0.000    0.000        __swift_cmdline
    0.009    0.003    0.002        __swift_modhash
    0.491    0.131    0.141      __LINKEDIT
    0.003    0.000    0.001        __rebase
    0.003    0.001    0.001        __binding
    0.000    0.000    0.000        __lazy_binding
    0.159    0.010    0.033        __export
    0.001    0.000    0.000        __func_starts
    0.000    0.000    0.000        __data_in_code
    0.325    0.120    0.107        __symbol_table
    0.002    0.001    0.000        __ind_sym_tab
    0.000    0.000    0.000        __string_table
    0.000    0.000    0.000        __code_signature
    0.240    0.029    0.054      __DATA_CONST
    0.004    0.001    0.001        __got
    0.153    0.010    0.031        __const
    0.001    0.000    0.000        __mod_init_func
    0.061    0.011    0.017        __cfstring
    0.003    0.001    0.001        __objc_catlist
    0.013    0.005    0.003        __objc_classlist
    0.000    0.000    0.000        __objc_nlcatlist
    0.000    0.000    0.000        __objc_nlclslist
    0.004    0.002    0.001        __objc_protolist
    0.287    0.371    2.023    write_signature
    0.000    0.029    0.029      msync
    0.000    0.003    0.003    close_file

BalestraPatrick commented 1 year ago

@rui314 What's the easiest way for me to count the number of files that are part of the read_input_files step?

rui314 commented 1 year ago

Adding -Wl,-stats is the easiest way to know the number of input files.

BalestraPatrick commented 1 year ago

Thanks! That gives me this.

            num_rels=14977191
            num_syms=7206589
     num_subsections=3312576
  num_merged_strings=1141406
num_merged_literal_pointers=83573
            num_objs=24927
          num_dylibs=134

This specific link was:

     User   System     Real  Name
    6.591    2.415    5.413  all
    2.687    1.673    3.353    read_input_files
    0.251    0.302    1.190    write_signature
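
To put the read_input_files number in proportion to the input count, here is a small sketch that divides the wall-clock time by the number of files. It assumes that num_objs plus num_dylibs is the number of files read and that the time is spread evenly over them; both are simplifications, and `micros_per_file` is a made-up helper name.

```cpp
// Average wall-clock time per input file, in microseconds.
// Assumes every object file and dylib counts as one file read.
double micros_per_file(double real_seconds, long num_objs, long num_dylibs) {
  long files = num_objs + num_dylibs;
  if (files <= 0)
    return 0.0;
  return real_seconds * 1e6 / static_cast<double>(files);
}
```

With the numbers above, `micros_per_file(3.353, 24927, 134)` comes out at roughly 134 microseconds per file.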