Open pm-dev opened 1 year ago
Can you run the linker with -Wl,-perf again and paste the output here? I suspect that there's an unnoticed bottleneck in the sold linker. We need to optimize it.
User System Real Name
19.294 1.077 11.063 all
17.191 0.824 10.300 read_input_files
0.246 0.006 0.069 resolve_symbols
0.015 0.004 0.002 create_internal_file
0.000 0.000 0.000 handle_exported_symbols_list
0.000 0.000 0.000 handle_unexported_symbols_list
0.027 0.001 0.014 claim_unresolved_symbols
0.033 0.001 0.004 remove_unreferenced_subsections
0.049 0.002 0.051 create_synthetic_chunks
0.127 0.015 0.029 merge_mergeable_sections
0.003 0.002 0.002 uniquify_literals __literal8
0.022 0.002 0.005 uniquify_literals __objc_methname
0.022 0.002 0.005 uniquify_literals __cstring
0.002 0.001 0.000 uniquify_literals __objc_classname
0.003 0.001 0.001 uniquify_literals __objc_methtype
0.001 0.001 0.001 uniquify_literals __literal16
0.006 0.000 0.006 uniquify_literal_pointers
0.104 0.001 0.064 scan_relocations
0.014 0.001 0.002 scan_unwind_info
0.851 0.053 0.234 assign_offsets
0.188 0.027 0.130 __TEXT
0.000 0.000 0.001 __mach_header
0.000 0.000 0.000 __stubs
0.082 0.021 0.020 __text
0.000 0.000 0.000 __gcc_except_tab
0.000 0.000 0.000 __cstring
0.102 0.005 0.108 __unwind_info
0.000 0.000 0.000 __const
0.000 0.000 0.000 __eh_frame
0.000 0.000 0.000 __init_offsets
0.000 0.000 0.000 __literal16
0.000 0.000 0.000 __literal8
0.000 0.000 0.000 __objc_classname
0.000 0.000 0.000 __objc_methname
0.000 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __objc_stubs
0.000 0.000 0.000 __swift5_assocty
0.000 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.000 0.000 0.000 __swift5_fieldmd
0.000 0.000 0.000 __swift5_mpenum
0.000 0.000 0.000 __swift5_proto
0.000 0.000 0.000 __swift5_protos
0.000 0.000 0.000 __swift5_reflstr
0.000 0.000 0.000 __swift5_typeref
0.000 0.000 0.000 __swift5_types
0.000 0.000 0.000 __ustring
0.001 0.000 0.001 __DATA_CONST
0.000 0.000 0.000 __got
0.000 0.000 0.000 __const
0.000 0.000 0.000 __cfstring
0.000 0.000 0.000 __objc_catlist
0.000 0.000 0.000 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.000 0.000 0.000 __objc_protolist
0.001 0.000 0.001 __DATA
0.000 0.000 0.000 __thread_ptrs
0.000 0.000 0.000 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __objc_classrefs
0.000 0.000 0.000 __objc_const
0.000 0.000 0.000 __objc_data
0.000 0.000 0.000 __objc_ivar
0.000 0.000 0.000 __objc_protorefs
0.000 0.000 0.000 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.000 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __swift51_hooks
0.000 0.000 0.000 __swift_hooks
0.000 0.000 0.000 __LLVM
0.000 0.000 0.000 __bitcode
0.000 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_modhash
0.662 0.026 0.102 __LINKEDIT
0.662 0.026 0.102 __chainfixups
0.017 0.001 0.003 __data_in_code
0.645 0.026 0.084 __export
0.000 0.000 0.000 __ind_sym_tab
0.288 0.011 0.036 __func_starts
0.254 0.009 0.028 __symbol_table
0.000 0.000 0.000 __string_table
0.003 0.042 0.051 open_file
0.483 0.112 0.127 copy_sections_to_output_file
0.483 0.111 0.127 __TEXT
0.006 0.001 0.001 __mach_header
0.002 0.001 0.000 __objc_methtype
0.001 0.001 0.000 __swift5_mpenum
0.013 0.002 0.002 __gcc_except_tab
0.103 0.013 0.014 __const
0.026 0.002 0.003 __swift5_proto
0.007 0.001 0.001 __objc_stubs
0.005 0.000 0.001 __stubs
0.000 0.000 0.000 __literal16
0.002 0.000 0.000 __literal8
0.454 0.109 0.103 __text
0.002 0.000 0.000 __objc_classname
0.009 0.001 0.001 __swift5_assocty
0.028 0.002 0.004 __objc_methname
0.033 0.003 0.004 __cstring
0.000 0.000 0.000 __swift5_builtin
0.011 0.001 0.001 __swift5_capture
0.005 0.001 0.001 __swift5_protos
0.000 0.000 0.000 __swift5_entry
0.042 0.005 0.006 __swift5_fieldmd
0.021 0.002 0.003 __swift5_reflstr
0.437 0.106 0.120 __unwind_info
0.038 0.005 0.005 __swift5_typeref
0.019 0.002 0.003 __swift5_types
0.000 0.000 0.000 __ustring
0.121 0.018 0.020 __eh_frame
0.001 0.000 0.000 __init_offsets
0.082 0.010 0.011 __DATA_CONST
0.007 0.001 0.001 __got
0.017 0.002 0.002 __objc_classlist
0.000 0.001 0.000 __objc_nlclslist
0.002 0.000 0.000 __objc_protolist
0.000 0.000 0.000 __objc_nlcatlist
0.075 0.009 0.010 __const
0.009 0.001 0.001 __cfstring
0.002 0.000 0.000 __objc_catlist
0.379 0.081 0.075 __DATA
0.000 0.000 0.000 __thread_ptrs
0.050 0.005 0.007 __data
0.001 0.000 0.000 __objc_protorefs
0.023 0.002 0.003 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.003 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __swift51_hooks
0.000 0.000 0.000 __swift_hooks
0.014 0.002 0.002 __objc_classrefs
0.051 0.007 0.007 __objc_data
0.000 0.000 0.000 __objc_imageinfo
0.075 0.010 0.010 __objc_const
0.003 0.000 0.000 __objc_ivar
0.392 0.085 0.079 __LLVM
0.391 0.085 0.079 __bitcode
0.003 0.001 0.001 __cmdline
0.011 0.001 0.002 __swift_modhash
0.328 0.075 0.068 __LINKEDIT
0.004 0.001 0.001 __chainfixups
0.115 0.016 0.017 __export
0.000 0.000 0.000 __data_in_code
0.296 0.070 0.064 __symbol_table
0.003 0.000 0.000 __ind_sym_tab
0.000 0.000 0.000 __string_table
0.001 0.000 0.000 __func_starts
0.009 0.000 0.009 write_fixup_chains
0.075 0.000 0.010 copy_sections_to_output_file
0.000 0.014 0.056 close_file
Thanks. So the read_input_files pass dominates the entire execution time. That's not expected -- on my M2 Mac, sold can read ~2500 files in ~0.2 seconds, so even if you have 6.5k input files, it shouldn't take that long.
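To make the "dominates" observation mechanical, here is a minimal sketch that parses rows of the -perf table and picks the phase with the largest Real time. The names PerfRow, parse_perf_row and slowest_phase are hypothetical helpers, not part of sold. The sketch treats every row as a peer, so with the flattened output above it is best fed only the top-level phase rows.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One row of the "User System Real Name" table printed by -perf.
// (Hypothetical helper type, not part of sold.)
struct PerfRow {
  double user = 0, system = 0, real = 0;
  std::string name;
};

// Parses a row such as "17.191 0.824 10.300 read_input_files".
// Returns std::nullopt for the header line or anything malformed.
std::optional<PerfRow> parse_perf_row(const std::string &line) {
  std::istringstream in(line);
  PerfRow row;
  if (!(in >> row.user >> row.system >> row.real))
    return std::nullopt;
  // The name may contain spaces, e.g. "uniquify_literals __cstring".
  std::getline(in >> std::ws, row.name);
  if (row.name.empty())
    return std::nullopt;
  return row;
}

// Returns the row with the largest wall-clock time, skipping the
// "all" row that covers the entire run.
std::optional<PerfRow> slowest_phase(const std::vector<std::string> &lines) {
  std::optional<PerfRow> best;
  for (const std::string &line : lines) {
    std::optional<PerfRow> row = parse_perf_row(line);
    if (!row || row->name == "all")
      continue;
    if (!best || row->real > best->real)
      best = row;
  }
  return best;
}
```

Fed the top-level rows of the first table, slowest_phase returns read_input_files with a Real time of 10.300 out of 11.063 seconds.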
I want to make sure that you have built sold with the default cmake options. Specifically, that you haven't built it with -DCMAKE_BUILD_TYPE=Debug.
I wonder if your project is open-source. If so, please let me know the location of the repository so that I can build it myself to reproduce the issue.
If your project is not open-source, can you rebuild sold with the following patch and run it again with -Wl,-perf,-thread_count,1 to see if there's a file that's particularly slow to read?
diff --git a/macho/input-files.cc b/macho/input-files.cc
index 44e5da2a..79e58a89 100644
--- a/macho/input-files.cc
+++ b/macho/input-files.cc
@@ -35,6 +35,7 @@ template <typename E>
ObjectFile<E> *
ObjectFile<E>::create(Context<E> &ctx, MappedFile<Context<E>> *mf,
std::string archive_name) {
+ Timer t(ctx, "ObjectFile::create " + std::string(mf->name));
ObjectFile<E> *obj = new ObjectFile<E>(mf);
obj->archive_name = archive_name;
obj->is_alive = archive_name.empty() || ctx.all_load;
@@ -1199,6 +1200,7 @@ DylibFile<E>::DylibFile(Context<E> &ctx, MappedFile<Context<E>> *mf)
template <typename E>
DylibFile<E> *DylibFile<E>::create(Context<E> &ctx, MappedFile<Context<E>> *mf) {
+ Timer t(ctx, "DylibFile::create " + std::string(mf->name));
DylibFile<E> *file = new DylibFile<E>(ctx, mf);
ctx.dylib_pool.emplace_back(file);
return file;
diff --git a/macho/macho-main.cc b/macho/macho-main.cc
index e3678787..c7b9c910 100644
--- a/macho/macho-main.cc
+++ b/macho/macho-main.cc
@@ -883,6 +883,8 @@ strip_universal_header(Context<E> &ctx, MappedFile<Context<E>> *mf) {
template <typename E>
static void read_file(Context<E> &ctx, MappedFile<Context<E>> *mf) {
+ Timer t(ctx, std::string(mf->name));
+
if (get_file_type(ctx, mf) == FileType::MACH_UNIVERSAL)
mf = strip_universal_header(ctx, mf);
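The patch above relies on a scoped timer: an object that starts timing when it is constructed and reports when it goes out of scope. The following is a minimal sketch of that RAII pattern, not sold's actual Timer class. ScopedTimer, TimingRecord and read_one_file are hypothetical names, and the sketch is not thread-safe, so it only suits a single-threaded run.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the real Timer in sold tracks more than this.
struct TimingRecord {
  std::string name;
  double seconds;
};

// Starts timing on construction and appends a record on destruction.
class ScopedTimer {
public:
  ScopedTimer(std::vector<TimingRecord> &sink, std::string name)
      : sink_(sink), name_(std::move(name)),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    sink_.push_back({name_, elapsed.count()});
  }

  ScopedTimer(const ScopedTimer &) = delete;
  ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
  std::vector<TimingRecord> &sink_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};

// Stand-in for a per-file function such as read_file in the patch.
void read_one_file(std::vector<TimingRecord> &sink, const std::string &path) {
  ScopedTimer t(sink, path);
  // ... work on the file would go here ...
}
```

Because the record is written in the destructor, every exit path of the function is timed without extra bookkeeping, which is why a one-line addition per function is enough in the patch.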
The issue is that I had the address sanitizer turned on. I had seen this comment and included the MOLD_USE_ASAN flag.
Now the link time is consistently 1.4 seconds, which is the same as the default linker. Here's the sold breakdown with -Wl,-perf:
User System Real Name
2.393 0.649 1.360 all
1.739 0.510 1.123 read_input_files
0.071 0.015 0.020 resolve_symbols
0.007 0.004 0.001 create_internal_file
0.000 0.000 0.000 handle_exported_symbols_list
0.000 0.000 0.000 handle_unexported_symbols_list
0.008 0.000 0.003 claim_unresolved_symbols
0.016 0.000 0.002 remove_unreferenced_subsections
0.006 0.001 0.007 create_synthetic_chunks
0.065 0.019 0.013 merge_mergeable_sections
0.001 0.001 0.001 uniquify_literals __literal8
0.011 0.001 0.002 uniquify_literals __objc_methname
0.010 0.002 0.002 uniquify_literals __cstring
0.001 0.001 0.000 uniquify_literals __objc_classname
0.001 0.001 0.000 uniquify_literals __objc_methtype
0.000 0.000 0.000 uniquify_literals __literal16
0.002 0.000 0.002 uniquify_literal_pointers
0.040 0.001 0.025 scan_relocations
0.005 0.000 0.001 scan_unwind_info
0.192 0.019 0.048 assign_offsets
0.045 0.007 0.025 __TEXT
0.000 0.000 0.000 __mach_header
0.000 0.000 0.000 __stubs
0.030 0.004 0.007 __text
0.000 0.000 0.000 __gcc_except_tab
0.000 0.000 0.000 __cstring
0.014 0.003 0.016 __unwind_info
0.000 0.000 0.000 __const
0.000 0.000 0.000 __eh_frame
0.000 0.000 0.000 __init_offsets
0.000 0.000 0.000 __literal16
0.000 0.000 0.000 __literal8
0.000 0.000 0.000 __objc_classname
0.000 0.000 0.000 __objc_methname
0.000 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __objc_stubs
0.000 0.000 0.000 __swift5_assocty
0.000 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.000 0.000 0.000 __swift5_fieldmd
0.000 0.000 0.000 __swift5_mpenum
0.000 0.000 0.000 __swift5_proto
0.000 0.000 0.000 __swift5_protos
0.000 0.000 0.000 __swift5_reflstr
0.000 0.000 0.000 __swift5_typeref
0.000 0.000 0.000 __swift5_types
0.000 0.000 0.000 __ustring
0.000 0.000 0.000 __DATA_CONST
0.000 0.000 0.000 __got
0.000 0.000 0.000 __const
0.000 0.000 0.000 __cfstring
0.000 0.000 0.000 __objc_catlist
0.000 0.000 0.000 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.000 0.000 0.000 __objc_protolist
0.000 0.000 0.000 __DATA
0.000 0.000 0.000 __thread_ptrs
0.000 0.000 0.000 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __objc_classrefs
0.000 0.000 0.000 __objc_const
0.000 0.000 0.000 __objc_data
0.000 0.000 0.000 __objc_ivar
0.000 0.000 0.000 __objc_protorefs
0.000 0.000 0.000 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.000 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __swift51_hooks
0.000 0.000 0.000 __swift_hooks
0.000 0.000 0.000 __LLVM
0.000 0.000 0.000 __bitcode
0.000 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_modhash
0.146 0.011 0.022 __LINKEDIT
0.134 0.010 0.018 __chainfixups
0.036 0.002 0.004 __data_in_code
0.000 0.000 0.000 __ind_sym_tab
0.146 0.011 0.022 __export
0.095 0.006 0.012 __symbol_table
0.079 0.005 0.010 __func_starts
0.000 0.000 0.000 __string_table
0.003 0.014 0.016 open_file
0.132 0.045 0.037 copy_sections_to_output_file
0.132 0.045 0.037 __TEXT
0.000 0.000 0.000 __mach_header
0.035 0.004 0.005 __const
0.001 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __swift5_mpenum
0.002 0.001 0.000 __gcc_except_tab
0.001 0.000 0.000 __literal16
0.001 0.000 0.000 __stubs
0.010 0.001 0.001 __swift5_proto
0.000 0.000 0.000 __objc_stubs
0.131 0.044 0.036 __text
0.005 0.001 0.001 __swift5_assocty
0.001 0.000 0.000 __literal8
0.013 0.001 0.002 __cstring
0.001 0.000 0.000 __objc_classname
0.008 0.001 0.001 __objc_methname
0.000 0.000 0.000 __swift5_builtin
0.006 0.000 0.001 __swift5_capture
0.002 0.000 0.000 __swift5_protos
0.009 0.001 0.001 __swift5_reflstr
0.000 0.000 0.000 __swift5_entry
0.018 0.002 0.002 __swift5_fieldmd
0.098 0.024 0.020 __unwind_info
0.057 0.010 0.009 __eh_frame
0.016 0.002 0.002 __swift5_typeref
0.008 0.001 0.001 __swift5_types
0.000 0.000 0.000 __ustring
0.000 0.000 0.000 __init_offsets
0.063 0.008 0.009 __DATA
0.000 0.000 0.000 __thread_ptrs
0.024 0.002 0.003 __data
0.000 0.000 0.000 __objc_protorefs
0.007 0.001 0.001 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.001 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __swift51_hooks
0.000 0.000 0.000 __swift_hooks
0.000 0.000 0.000 __objc_imageinfo
0.007 0.001 0.001 __objc_classrefs
0.029 0.004 0.004 __objc_const
0.018 0.002 0.003 __objc_data
0.000 0.000 0.000 __objc_ivar
0.116 0.029 0.024 __LLVM
0.116 0.029 0.024 __bitcode
0.001 0.000 0.000 __cmdline
0.003 0.000 0.000 __swift_modhash
0.044 0.006 0.006 __DATA_CONST
0.002 0.000 0.000 __got
0.031 0.003 0.004 __const
0.004 0.001 0.001 __cfstring
0.001 0.000 0.000 __objc_catlist
0.005 0.001 0.001 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.001 0.000 0.000 __objc_protolist
0.119 0.031 0.026 __LINKEDIT
0.001 0.000 0.000 __chainfixups
0.049 0.006 0.007 __export
0.000 0.000 0.000 __data_in_code
0.086 0.025 0.020 __symbol_table
0.001 0.000 0.000 __func_starts
0.001 0.000 0.000 __ind_sym_tab
0.000 0.000 0.000 __string_table
0.001 0.000 0.001 write_fixup_chains
0.073 0.000 0.009 copy_sections_to_output_file
0.000 0.018 0.038 close_file
Also, unrelated, but I also get hundreds of warnings: "(arm64) failed to insert symbol abc in the debug map" and "(arm) could not find object file symbol for symbol xyz"
What is your cmake config? Please run cmake -N -L . and paste the output. Here is mine, FYI:
$ cmake -N -L .
-- Cache values
BUILD_TESTING:BOOL=ON
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_INSTALL_PREFIX:PATH=/usr/local
CMAKE_OSX_ARCHITECTURES:STRING=
CMAKE_OSX_DEPLOYMENT_TARGET:STRING=13.0
CMAKE_OSX_SYSROOT:PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk
MOLD_LTO:BOOL=0
MOLD_MOSTLY_STATIC:BOOL=OFF
MOLD_USE_ASAN:BOOL=OFF
MOLD_USE_MOLD:BOOL=0
MOLD_USE_SYSTEM_TBB:BOOL=OFF
MOLD_USE_TSAN:BOOL=OFF
TBB4PY_BUILD:BOOL=OFF
TBBMALLOC_BUILD:BOOL=ON
TBBMALLOC_PROXY_BUILD:BOOL=ON
TBB_BUILD:BOOL=ON
TBB_CPF:BOOL=OFF
TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH:BOOL=OFF
TBB_ENABLE_IPO:BOOL=ON
TBB_EXAMPLES:BOOL=OFF
TBB_FIND_PACKAGE:BOOL=OFF
TBB_INSTALL_VARS:BOOL=OFF
TBB_NO_APPCONTAINER:BOOL=OFF
TBB_SANITIZE:STRING=
TBB_TEST_SPEC:BOOL=OFF
TBB_VALGRIND_MEMCHECK:BOOL=OFF
TBB_WINDOWS_DRIVER:BOOL=OFF
ZSTD_BUILD_CONTRIB:BOOL=OFF
ZSTD_BUILD_PROGRAMS:BOOL=ON
ZSTD_BUILD_SHARED:BOOL=ON
ZSTD_BUILD_STATIC:BOOL=ON
ZSTD_BUILD_TESTS:BOOL=OFF
ZSTD_LEGACY_SUPPORT:BOOL=OFF
ZSTD_LZ4_SUPPORT:BOOL=OFF
ZSTD_LZMA_SUPPORT:BOOL=OFF
ZSTD_MULTITHREAD_SUPPORT:BOOL=ON
ZSTD_PROGRAMS_LINK_SHARED:BOOL=OFF
ZSTD_ZLIB_SUPPORT:BOOL=OFF
Looks to be the same
➜ build git:(main) cmake -N -L .
-- Cache values
BUILD_TESTING:BOOL=ON
CMAKE_BUILD_TYPE:STRING=Release
CMAKE_INSTALL_PREFIX:PATH=/usr/local
CMAKE_OSX_ARCHITECTURES:STRING=
CMAKE_OSX_DEPLOYMENT_TARGET:STRING=
CMAKE_OSX_SYSROOT:PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk
MOLD_LTO:BOOL=OFF
MOLD_MOSTLY_STATIC:BOOL=OFF
MOLD_USE_ASAN:BOOL=OFF
MOLD_USE_MOLD:BOOL=OFF
MOLD_USE_SYSTEM_TBB:BOOL=OFF
MOLD_USE_TSAN:BOOL=OFF
TBB4PY_BUILD:BOOL=OFF
TBBMALLOC_BUILD:BOOL=ON
TBBMALLOC_PROXY_BUILD:BOOL=ON
TBB_BUILD:BOOL=ON
TBB_CPF:BOOL=OFF
TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH:BOOL=OFF
TBB_ENABLE_IPO:BOOL=ON
TBB_EXAMPLES:BOOL=OFF
TBB_FIND_PACKAGE:BOOL=OFF
TBB_INSTALL_VARS:BOOL=OFF
TBB_NO_APPCONTAINER:BOOL=OFF
TBB_SANITIZE:STRING=
TBB_TEST_SPEC:BOOL=OFF
TBB_VALGRIND_MEMCHECK:BOOL=OFF
TBB_WINDOWS_DRIVER:BOOL=OFF
ZSTD_BUILD_CONTRIB:BOOL=OFF
ZSTD_BUILD_PROGRAMS:BOOL=ON
ZSTD_BUILD_SHARED:BOOL=ON
ZSTD_BUILD_STATIC:BOOL=ON
ZSTD_BUILD_TESTS:BOOL=OFF
ZSTD_LEGACY_SUPPORT:BOOL=OFF
ZSTD_LZ4_SUPPORT:BOOL=OFF
ZSTD_LZMA_SUPPORT:BOOL=OFF
ZSTD_MULTITHREAD_SUPPORT:BOOL=ON
ZSTD_PROGRAMS_LINK_SHARED:BOOL=OFF
ZSTD_ZLIB_SUPPORT:BOOL=OFF
If you apply the patch (https://github.com/bluewhalesystems/sold/issues/16#issuecomment-1429459090) and run with -Wl,-perf,-thread_count,1, is there any particularly slow file?
All files show 0.000 for Real except for 7 frameworks, which add up to 0.067. The entire read_input_files phase adds up to 2.812. Not sure how to explain that. Happy to send the full output to your email.
I seem to have the same issue on an M1 Max. I applied -Wl,-perf,-thread_count,1 and don't see any particular file that is slow to read (everything is 0.001 or 0.000).
User System Real Name
7.478 2.279 9.187 all
3.412 1.656 7.174 read_input_files
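One way to read rows like these is to compare CPU time with wall-clock time. This sketch assumes that the User and System columns are CPU seconds summed over all threads, as with time(1). Under that assumption, a ratio below 1 means the phase spent part of its wall-clock time with no thread on a CPU, which is typical of waiting on I/O or on the kernel. The function names are hypothetical.

```cpp
#include <algorithm>

// Average number of cores kept busy during a phase: CPU time / wall time.
double busy_cores(double user, double system, double real) {
  return real > 0 ? (user + system) / real : 0.0;
}

// Lower bound on the wall-clock seconds during which no thread of the
// process was running on a CPU.
double min_idle_seconds(double user, double system, double real) {
  return std::max(0.0, real - (user + system));
}
```

For the read_input_files row above, busy_cores(3.412, 1.656, 7.174) is roughly 0.71 and min_idle_seconds is roughly 2.1, so on this reading the phase was not limited by CPU work alone.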
I believe there's a bottleneck that stands out under specific conditions, so we need to figure it out. Can you run the linker under the profiler and share the output file? On macOS, you can run the linker under a profiler with the following command:
xctrace record --template 'Time Profiler' --launch -- /path/to/sold/ld64 <command line arguments>
A few notes about the profiler:
Build sold with -DCMAKE_BUILD_TYPE=RelWithDebInfo so that the profiler is able to figure out function names, etc.
To get the linker command line, pass -### to the compiler driver linker invocation.
Thanks for the great instructions. For me it looks like this. After warming up the caches by running the command three times, the linking seems to take about 3-4s. I sent the trace over email as well if you want to take a look at that.
Thank you for sharing the profiling data. But it looks like mold took only 11.00 ms to finish in your profile. Is this a correct sample that you wanted to take?
Oops, yes. I missed the @ to load the params from a file (xctrace record --template 'Time Profiler' --launch -- /path/to/sold/ld64 @file.params). Here we go.
Thanks for sharing the profiling data! I fixed some obvious bottlenecks in the above commits. There are other bottlenecks in which sold uses only a single core, so we can improve it even more. But for now, I believe this is good enough. Could you rebuild sold and try again?
Thanks for the fixes! I don't see the tbd slowdown anymore. I seem to still see the following sections taking quite a bit of time:
User System Real Name
6.963 2.308 10.913 all
3.032 1.656 8.623 read_input_files
0.793 0.004 0.107 scan_relocations
1.305 0.100 0.331 assign_offsets
0.368 0.263 1.355 write_signature
Happy to apply more patches to see where the bottlenecks are because I don't see any tbd parsing now taking a significant amount of time.
Also one question for my personal interest: I understand that the cached scenario makes sense to test, but I was wondering how that affects real-world performance. Every incremental build takes about 10s or more to link, while of course rerunning the same command without changing anything in the codebase takes 2-3 times less than that.
The above change should increase parallelism in the read_input_files stage. What are the numbers with that change?
Do you know which sub-stages in assign_offsets take a long time?
The perf number of write_signature is very mysterious. write_signature computes SHA256 hashes for all pages of the output file that the linker has just created, and it is embarrassingly parallel. Since the file has just been written by the linker itself, it is always in the buffer cache. So it shouldn't take that long. Are you testing this with -thread_count 1?
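To illustrate why per-page hashing is embarrassingly parallel, here is a minimal sketch of the page loop. It substitutes 64-bit FNV-1a for SHA256 so that it needs no crypto library, and it runs sequentially. fnv1a and hash_pages are hypothetical names rather than sold's code, and page_size is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for SHA-256 (FNV-1a, 64-bit) so that the sketch needs no
// crypto library. It is NOT a cryptographic hash.
std::uint64_t fnv1a(const std::uint8_t *data, std::size_t len) {
  std::uint64_t h = 14695981039346656037ULL;
  for (std::size_t i = 0; i < len; i++) {
    h ^= data[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Hashes each page of `buf` independently. The last page may be short.
// Assumes page_size > 0.
std::vector<std::uint64_t> hash_pages(const std::vector<std::uint8_t> &buf,
                                      std::size_t page_size) {
  std::size_t num_pages = (buf.size() + page_size - 1) / page_size;
  std::vector<std::uint64_t> hashes(num_pages);
  // Iteration i reads only page i and writes only hashes[i], so the
  // iterations could be handed to separate threads in any order.
  for (std::size_t i = 0; i < num_pages; i++) {
    std::size_t start = i * page_size;
    std::size_t end = std::min(buf.size(), (i + 1) * page_size);
    hashes[i] = fnv1a(buf.data() + start, end - start);
  }
  return hashes;
}
```

Since no iteration depends on another, the only shared resource is the memory the pages live in. That is why slow access to the output buffer, rather than the hashing itself, is a plausible suspect when this stage is slow.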
The read_input_files stage takes anywhere from 3s to 0.6s. This is an example after executing the same command about 5 times:
User System Real Name
6.852 1.112 1.892 all
2.830 0.639 0.661 read_input_files
0.179 0.018 0.074 resolve_symbols
0.043 0.030 0.011 create_internal_file
0.000 0.000 0.000 handle_exported_symbols_list
0.000 0.000 0.000 handle_unexported_symbols_list
0.069 0.004 0.026 claim_unresolved_symbols
0.085 0.001 0.011 remove_unreferenced_subsections
0.046 0.008 0.053 create_synthetic_chunks
0.427 0.021 0.080 merge_mergeable_sections
0.074 0.004 0.014 uniquify_literals __objc_methname
0.166 0.005 0.029 uniquify_literals __cstring
0.003 0.001 0.001 uniquify_literals __literal8
0.006 0.001 0.001 uniquify_literals __objc_classname
0.011 0.001 0.003 uniquify_literals __objc_methtype
0.000 0.000 0.000 uniquify_literals __oslogstring
0.001 0.000 0.000 uniquify_literals __literal16
0.000 0.000 0.000 uniquify_literals __literal4
0.009 0.000 0.010 uniquify_literal_pointers
0.760 0.006 0.099 scan_relocations
1.333 0.121 0.296 assign_offsets
0.300 0.069 0.129 __TEXT
0.000 0.000 0.000 __mach_header
0.000 0.000 0.000 __stubs
0.148 0.064 0.045 __text
0.000 0.000 0.000 __stub_helper
0.000 0.000 0.000 __gcc_except_tab
0.001 0.000 0.001 __cstring
0.147 0.005 0.079 __unwind_info
0.001 0.000 0.001 __const
0.000 0.000 0.000 __eh_frame
0.000 0.000 0.000 __entitlements
0.000 0.000 0.000 __literal16
0.000 0.000 0.000 __literal4
0.000 0.000 0.000 __literal8
0.000 0.000 0.000 __objc_classname
0.001 0.000 0.001 __objc_methname
0.000 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __objc_stubs
0.000 0.000 0.000 __oslogstring
0.000 0.000 0.000 __swift5_assocty
0.000 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.000 0.000 0.000 __swift5_fieldmd
0.000 0.000 0.000 __swift5_mpenum
0.000 0.000 0.000 __swift5_proto
0.000 0.000 0.000 __swift5_protos
0.000 0.000 0.000 __swift5_reflstr
0.000 0.000 0.000 __swift5_typeref
0.000 0.000 0.000 __swift5_types
0.000 0.000 0.000 __ustring
0.004 0.000 0.004 __DATA_CONST
0.000 0.000 0.000 __got
0.002 0.000 0.002 __const
0.000 0.000 0.000 __mod_init_func
0.002 0.000 0.002 __cfstring
0.000 0.000 0.000 __objc_catlist
0.000 0.000 0.000 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.000 0.000 0.000 __objc_protolist
0.003 0.000 0.003 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.001 0.000 0.001 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.000 0.000 0.000 __thread_data
0.000 0.000 0.000 __objc_arraydata
0.000 0.000 0.000 __objc_arrayobj
0.000 0.000 0.000 __objc_classrefs
0.001 0.000 0.001 __objc_const
0.000 0.000 0.000 __objc_data
0.000 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.000 0.000 0.000 __objc_intobj
0.000 0.000 0.000 __objc_ivar
0.000 0.000 0.000 __objc_protorefs
0.000 0.000 0.000 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.000 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __LLVM
0.000 0.000 0.000 __bitcode
0.000 0.000 0.000 __bundle
0.000 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_cmdline
0.000 0.000 0.000 __swift_modhash
1.026 0.052 0.159 __LINKEDIT
0.700 0.035 0.088 __rebase
0.157 0.007 0.019 __data_in_code
0.006 0.001 0.001 __lazy_binding
1.026 0.052 0.159 __binding
0.000 0.000 0.000 __ind_sym_tab
0.606 0.031 0.073 __symbol_table
1.006 0.050 0.138 __export
0.623 0.031 0.075 __func_starts
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.010 0.072 0.083 open_file
0.618 0.080 0.079 copy_sections_to_output_file
0.618 0.080 0.079 __TEXT
0.000 0.000 0.000 __mach_header
0.010 0.002 0.001 __objc_methtype
0.099 0.021 0.013 __swift5_fieldmd
0.000 0.000 0.000 __stubs
0.617 0.079 0.079 __text
0.020 0.004 0.003 __swift5_assocty
0.010 0.002 0.001 __objc_stubs
0.028 0.005 0.004 __swift5_capture
0.003 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __oslogstring
0.039 0.010 0.005 __swift5_reflstr
0.000 0.000 0.000 __swift5_entry
0.041 0.011 0.006 __swift5_proto
0.081 0.009 0.010 __swift5_typeref
0.057 0.004 0.007 __swift5_types
0.006 0.000 0.001 __swift5_protos
0.002 0.000 0.000 __swift5_mpenum
0.231 0.023 0.029 __const
0.000 0.000 0.000 __literal4
0.002 0.000 0.000 __literal8
0.006 0.000 0.001 __ustring
0.012 0.001 0.002 __objc_classname
0.000 0.000 0.000 __entitlements
0.001 0.000 0.000 __literal16
0.061 0.007 0.007 __objc_methname
0.000 0.000 0.000 __stub_helper
0.018 0.002 0.002 __gcc_except_tab
0.070 0.007 0.009 __cstring
0.152 0.017 0.019 __unwind_info
0.282 0.035 0.036 __eh_frame
0.444 0.056 0.056 __DATA_CONST
0.008 0.002 0.001 __got
0.018 0.004 0.002 __objc_catlist
0.436 0.054 0.055 __const
0.119 0.020 0.015 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.008 0.001 0.001 __objc_protolist
0.003 0.000 0.000 __mod_init_func
0.154 0.014 0.019 __cfstring
0.546 0.069 0.069 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.111 0.022 0.015 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.000 0.000 0.000 __thread_data
0.001 0.000 0.000 __objc_arraydata
0.001 0.000 0.000 __objc_arrayobj
0.042 0.003 0.005 __objc_classrefs
0.001 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.001 0.000 0.000 __objc_intobj
0.029 0.003 0.004 __objc_ivar
0.323 0.035 0.040 __objc_const
0.005 0.001 0.001 __objc_protorefs
0.104 0.010 0.013 __objc_selrefs
0.000 0.000 0.000 __thread_bss
0.000 0.000 0.000 __objc_stublist
0.010 0.001 0.001 __objc_superrefs
0.068 0.009 0.009 __objc_data
0.082 0.019 0.011 __LLVM
0.082 0.019 0.011 __bitcode
0.002 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_cmdline
0.014 0.003 0.002 __swift_modhash
0.036 0.010 0.005 __bundle
0.600 0.077 0.076 __LINKEDIT
0.006 0.001 0.001 __rebase
0.045 0.010 0.006 __binding
0.002 0.001 0.000 __lazy_binding
0.374 0.042 0.047 __export
0.000 0.000 0.000 __data_in_code
0.529 0.060 0.067 __symbol_table
0.003 0.000 0.000 __func_starts
0.004 0.000 0.000 __ind_sym_tab
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.284 0.107 0.329 write_signature
0.000 0.027 0.027 msync
0.000 0.003 0.003 close_file
But again, when I run a build with Bazel (without sandbox of course), I get much worse numbers:
User System Real Name
6.979 2.182 9.100 all
2.854 1.596 7.238 read_input_files
0.185 0.049 0.076 resolve_symbols
0.041 0.039 0.010 create_internal_file
0.000 0.000 0.000 handle_exported_symbols_list
0.000 0.000 0.000 handle_unexported_symbols_list
0.072 0.004 0.027 claim_unresolved_symbols
0.090 0.001 0.012 remove_unreferenced_subsections
0.043 0.006 0.051 create_synthetic_chunks
0.329 0.033 0.058 merge_mergeable_sections
0.060 0.005 0.010 uniquify_literals __objc_methname
0.077 0.005 0.014 uniquify_literals __cstring
0.003 0.001 0.002 uniquify_literals __literal8
0.006 0.001 0.001 uniquify_literals __objc_classname
0.011 0.001 0.002 uniquify_literals __objc_methtype
0.000 0.000 0.000 uniquify_literals __oslogstring
0.001 0.000 0.000 uniquify_literals __literal16
0.000 0.000 0.000 uniquify_literals __literal4
0.009 0.001 0.010 uniquify_literal_pointers
0.922 0.004 0.106 scan_relocations
1.326 0.108 0.290 assign_offsets
0.301 0.058 0.130 __TEXT
0.000 0.000 0.000 __mach_header
0.000 0.000 0.000 __stubs
0.143 0.054 0.046 __text
0.000 0.000 0.000 __stub_helper
0.000 0.000 0.000 __gcc_except_tab
0.001 0.000 0.001 __cstring
0.153 0.004 0.079 __unwind_info
0.001 0.000 0.001 __const
0.000 0.000 0.000 __eh_frame
0.000 0.000 0.000 __entitlements
0.000 0.000 0.000 __literal16
0.000 0.000 0.000 __literal4
0.000 0.000 0.000 __literal8
0.000 0.000 0.000 __objc_classname
0.001 0.000 0.001 __objc_methname
0.000 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __objc_stubs
0.000 0.000 0.000 __oslogstring
0.000 0.000 0.000 __swift5_assocty
0.000 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.000 0.000 0.000 __swift5_fieldmd
0.000 0.000 0.000 __swift5_mpenum
0.000 0.000 0.000 __swift5_proto
0.000 0.000 0.000 __swift5_protos
0.000 0.000 0.000 __swift5_reflstr
0.000 0.000 0.000 __swift5_typeref
0.000 0.000 0.000 __swift5_types
0.000 0.000 0.000 __ustring
0.004 0.000 0.004 __DATA_CONST
0.000 0.000 0.000 __got
0.002 0.000 0.002 __const
0.000 0.000 0.000 __mod_init_func
0.002 0.000 0.002 __cfstring
0.000 0.000 0.000 __objc_catlist
0.000 0.000 0.000 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.000 0.000 0.000 __objc_protolist
0.003 0.000 0.003 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.001 0.000 0.001 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.000 0.000 0.000 __thread_data
0.000 0.000 0.000 __objc_arraydata
0.000 0.000 0.000 __objc_arrayobj
0.000 0.000 0.000 __objc_classrefs
0.001 0.000 0.001 __objc_const
0.000 0.000 0.000 __objc_data
0.000 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.000 0.000 0.000 __objc_intobj
0.000 0.000 0.000 __objc_ivar
0.000 0.000 0.000 __objc_protorefs
0.000 0.000 0.000 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.000 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __LLVM
0.000 0.000 0.000 __bitcode
0.000 0.000 0.000 __bundle
0.000 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_cmdline
0.000 0.000 0.000 __swift_modhash
1.018 0.050 0.152 __LINKEDIT
0.858 0.038 0.103 __rebase
0.165 0.008 0.020 __data_in_code
0.007 0.001 0.001 __lazy_binding
1.018 0.050 0.152 __binding
1.002 0.048 0.135 __export
0.631 0.032 0.077 __func_starts
0.000 0.000 0.000 __ind_sym_tab
0.580 0.029 0.069 __symbol_table
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.009 0.051 0.060 open_file
0.581 0.109 0.094 copy_sections_to_output_file
0.581 0.109 0.094 __TEXT
0.000 0.000 0.000 __mach_header
0.000 0.000 0.000 __stubs
0.009 0.002 0.001 __objc_methtype
0.581 0.109 0.094 __text
0.005 0.001 0.001 __objc_stubs
0.000 0.000 0.000 __oslogstring
0.026 0.004 0.003 __swift5_assocty
0.003 0.000 0.000 __swift5_builtin
0.027 0.004 0.004 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.090 0.013 0.013 __swift5_fieldmd
0.186 0.027 0.026 __const
0.000 0.000 0.000 __stub_helper
0.017 0.003 0.002 __gcc_except_tab
0.060 0.008 0.008 __cstring
0.002 0.000 0.000 __swift5_mpenum
0.039 0.005 0.005 __swift5_proto
0.132 0.020 0.019 __unwind_info
0.004 0.001 0.001 __swift5_protos
0.038 0.005 0.005 __swift5_reflstr
0.075 0.011 0.010 __swift5_typeref
0.229 0.048 0.039 __eh_frame
0.047 0.008 0.006 __swift5_types
0.000 0.000 0.000 __ustring
0.000 0.000 0.000 __literal4
0.001 0.000 0.000 __literal8
0.009 0.001 0.002 __objc_classname
0.039 0.011 0.009 __objc_methname
0.001 0.000 0.000 __entitlements
0.000 0.000 0.000 __literal16
0.185 0.027 0.026 __DATA_CONST
0.007 0.002 0.001 __got
0.004 0.001 0.001 __mod_init_func
0.006 0.001 0.001 __objc_catlist
0.127 0.020 0.018 __cfstring
0.025 0.005 0.003 __objc_classlist
0.178 0.026 0.025 __const
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.008 0.001 0.001 __objc_protolist
0.476 0.079 0.072 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.116 0.018 0.016 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.000 0.000 0.000 __thread_data
0.001 0.000 0.000 __objc_arraydata
0.000 0.000 0.000 __objc_arrayobj
0.041 0.005 0.006 __objc_classrefs
0.254 0.039 0.036 __objc_const
0.000 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.001 0.000 0.000 __objc_intobj
0.008 0.001 0.001 __objc_ivar
0.004 0.001 0.001 __objc_protorefs
0.064 0.017 0.014 __objc_data
0.021 0.003 0.004 __objc_selrefs
0.000 0.000 0.000 __thread_bss
0.000 0.000 0.000 __objc_stublist
0.046 0.013 0.010 __objc_superrefs
0.482 0.081 0.073 __LLVM
0.482 0.081 0.073 __bitcode
0.011 0.003 0.002 __cmdline
0.001 0.000 0.000 __swift_cmdline
0.029 0.004 0.004 __swift_modhash
0.242 0.034 0.034 __bundle
0.442 0.069 0.064 __LINKEDIT
0.001 0.001 0.000 __rebase
0.004 0.001 0.001 __binding
0.000 0.000 0.000 __data_in_code
0.437 0.067 0.063 __symbol_table
0.001 0.000 0.000 __lazy_binding
0.265 0.038 0.037 __export
0.004 0.001 0.001 __ind_sym_tab
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.001 0.000 0.000 __func_starts
0.359 0.173 0.991 write_signature
0.000 0.028 0.028 msync
0.000 0.003 0.003 close_file
So for assign_offsets the bottlenecks are __text and __unwind_info. No, I'm not setting -thread_count 1 anywhere. Initially I thought that maybe under Bazel there's some kind of limit on how many cores an action can use. But when I run the same action for the first time to trace it, I can see that it's definitely slower (5s in total for example, but it varies) than a third run, which looks like the first snippet above.
I wonder if there's anything special about the output directory in your environment. Especially the perf number for write_signature is very odd. Can you make sure that the output directory is a regular locally-mounted directory?
One thing you can try is to specify /dev/null as an output file. Since /dev/null cannot be mmap'ed, sold writes an output file to a memory buffer and then calls write(2) to write the buffer to /dev/null. In this situation, we should be able to observe the real performance of write_signature. If write_signature is slow even with a memory buffer, there's something wrong with it. Otherwise, it is likely that there's something wrong with the file or the filesystem.
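The two output paths described above can be sketched as follows. This is a POSIX-only illustration under stated assumptions, not sold's implementation: the function write_output, the OutputMode enum, and the choice to detect non-mappable targets with fstat are all hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

enum class OutputMode { Failed, Mmap, Buffer };

// Writes `data` to `path`. Regular files are written through a shared
// mapping; anything else (for example /dev/null) gets a plain write(2)
// from the in-memory buffer. Hypothetical sketch, not sold's code.
OutputMode write_output(const char *path,
                        const std::vector<unsigned char> &data) {
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return OutputMode::Failed;

  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    return OutputMode::Failed;
  }

  if (S_ISREG(st.st_mode) && !data.empty() &&
      ftruncate(fd, static_cast<off_t>(data.size())) == 0) {
    void *p = mmap(nullptr, data.size(), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
      std::memcpy(p, data.data(), data.size());
      munmap(p, data.size());
      close(fd);
      return OutputMode::Mmap;
    }
  }

  // Fallback path: loop because write(2) may write fewer bytes than asked.
  std::size_t done = 0;
  while (done < data.size()) {
    ssize_t n = write(fd, data.data() + done, data.size() - done);
    if (n < 0) {
      close(fd);
      return OutputMode::Failed;
    }
    done += static_cast<std::size_t>(n);
  }
  close(fd);
  return OutputMode::Buffer;
}
```

In the buffer path, any hashing of the output happens on ordinary process memory, so timing that path separates the cost of the computation from the cost of touching a file-backed mapping.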
Also one question for my personal interest: I understand that the cached scenario makes sense to test, but I was wondering how that affects real-world performance. Every incremental build takes about 10s or more to link, while of course rerunning the same command without changing anything in the codebase takes 2-3 times less than that.
mold/sold aims to increase developer productivity especially in rapid debug-edit-rebuild cycles, so I think assuming that most of its input files are cached is reasonable. It is odd if every incremental build takes about 10s or more to link because it implies that all input files are fresh and new on each incremental build. If you build a project, edit a single source file, compile and re-link it, the second link is faster than the first link, no?
By setting -o /dev/null I can see this:
0.287 0.006 0.038 write_signature
Okay, so it's slow only when it's reading from an mmap'ed file. Random thoughts:
I'd also try the following patch. msync doesn't take too long on my machine, but it might not be the case on your machine.
diff --git a/macho/output-chunks.cc b/macho/output-chunks.cc
index 46c50e1e..95aa44d1 100644
--- a/macho/output-chunks.cc
+++ b/macho/output-chunks.cc
@@ -1430,20 +1430,14 @@ void CodeSignatureSection<E>::write_signature(Context<E> &ctx) {
u8 *start = ctx.buf + i * E::page_size;
u8 *end = ctx.buf + std::min<i64>((i + 1) * E::page_size, this->hdr.offset);
sha256_hash(start, end - start, buf + i * SHA256_SIZE);
};
for (i64 i = 0; i < num_blocks; i += 1024) {
i64 j = std::min(num_blocks, i + 1024);
-
-#if __APPLE__
- // Calling msync() with MS_ASYNC speeds up the following msync()
- // with MS_INVALIDATE.
- msync(ctx.buf + i * E::page_size, 1024 * E::page_size, MS_ASYNC);
-#endif
}
// A LC_UUID load command may also contain a crypto hash of the
// entire file. We compute its value as a tree hash.
if (ctx.arg.uuid == UUID_HASH) {
u8 uuid[SHA256_SIZE];
sha256_hash(ctx.buf + this->hdr.offset, this->hdr.size, uuid);
an antivirus program makes file IO very slow as it scans newly created files.
A lot of these large companies have endpoint security software that does exactly that: https://brentley.dev/corporate-crapware/
If that's the case, does applying the following patch change the performance characteristics? This patch makes sold use write(2) instead of mmap(2) to write to an output file.
diff --git a/common/output-file-unix.h b/common/output-file-unix.h
index 92af144f..5de64fd1 100644
--- a/common/output-file-unix.h
+++ b/common/output-file-unix.h
@@ -133,20 +133,7 @@ OutputFile<Context>::open(Context &ctx, std::string path, i64 filesize, i64 perm
if (path.starts_with('/') && !ctx.arg.chroot.empty())
path = ctx.arg.chroot + "/" + path_clean(path);
- bool is_special = false;
- if (path == "-") {
- is_special = true;
- } else {
- struct stat st;
- if (stat(path.c_str(), &st) == 0 && (st.st_mode & S_IFMT) != S_IFREG)
- is_special = true;
- }
-
- OutputFile<Context> *file;
- if (is_special)
- file = new MallocOutputFile(ctx, path, filesize, perm);
- else
- file = new MemoryMappedOutputFile(ctx, path, filesize, perm);
+ OutputFile<Context> *file = new MallocOutputFile(ctx, path, filesize, perm);
#ifdef MADV_HUGEPAGE
// Enable transparent huge page for an output memory-mapped file.
I disabled the system extensions that could interfere with performance and tested both with and without your patch. Performance is still about the same in both cases:
User System Real Name
6.933 2.919 7.310 all
2.835 2.000 4.189 read_input_files
0.192 0.063 0.084 resolve_symbols
0.042 0.042 0.011 create_internal_file
0.000 0.000 0.000 handle_exported_symbols_list
0.000 0.000 0.000 handle_unexported_symbols_list
0.069 0.003 0.025 claim_unresolved_symbols
0.085 0.001 0.011 remove_unreferenced_subsections
0.044 0.009 0.065 create_synthetic_chunks
0.339 0.038 0.079 merge_mergeable_sections
0.063 0.004 0.010 uniquify_literals __objc_methname
0.092 0.008 0.027 uniquify_literals __cstring
0.003 0.001 0.001 uniquify_literals __literal8
0.006 0.001 0.001 uniquify_literals __objc_classname
0.012 0.001 0.002 uniquify_literals __objc_methtype
0.000 0.000 0.000 uniquify_literals __oslogstring
0.001 0.000 0.000 uniquify_literals __literal16
0.000 0.000 0.000 uniquify_literals __literal4
0.009 0.001 0.010 uniquify_literal_pointers
0.978 0.008 0.111 scan_relocations
1.361 0.127 0.331 assign_offsets
0.312 0.071 0.172 __TEXT
0.000 0.000 0.029 __mach_header
0.000 0.000 0.000 __stubs
0.156 0.065 0.040 __text
0.000 0.000 0.000 __stub_helper
0.000 0.000 0.000 __gcc_except_tab
0.001 0.000 0.001 __cstring
0.151 0.005 0.098 __unwind_info
0.001 0.000 0.001 __const
0.000 0.000 0.000 __eh_frame
0.000 0.000 0.000 __entitlements
0.000 0.000 0.000 __literal16
0.000 0.000 0.000 __literal4
0.000 0.000 0.000 __literal8
0.000 0.000 0.000 __objc_classname
0.001 0.000 0.001 __objc_methname
0.000 0.000 0.000 __objc_methtype
0.000 0.000 0.000 __objc_stubs
0.000 0.000 0.000 __oslogstring
0.000 0.000 0.000 __swift5_assocty
0.000 0.000 0.000 __swift5_builtin
0.000 0.000 0.000 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.000 0.000 0.000 __swift5_fieldmd
0.000 0.000 0.000 __swift5_mpenum
0.000 0.000 0.000 __swift5_proto
0.000 0.000 0.000 __swift5_protos
0.000 0.000 0.000 __swift5_reflstr
0.000 0.000 0.000 __swift5_typeref
0.000 0.000 0.000 __swift5_types
0.000 0.000 0.000 __ustring
0.004 0.000 0.004 __DATA_CONST
0.000 0.000 0.000 __got
0.002 0.000 0.002 __const
0.000 0.000 0.000 __mod_init_func
0.002 0.000 0.002 __cfstring
0.000 0.000 0.000 __objc_catlist
0.000 0.000 0.000 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.000 0.000 0.000 __objc_protolist
0.003 0.000 0.003 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.001 0.000 0.001 __data
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.000 0.000 0.000 __thread_data
0.000 0.000 0.000 __objc_arraydata
0.000 0.000 0.000 __objc_arrayobj
0.000 0.000 0.000 __objc_classrefs
0.001 0.000 0.001 __objc_const
0.000 0.000 0.000 __objc_data
0.000 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.000 0.000 0.000 __objc_intobj
0.000 0.000 0.000 __objc_ivar
0.000 0.000 0.000 __objc_protorefs
0.000 0.000 0.000 __objc_selrefs
0.000 0.000 0.000 __objc_stublist
0.000 0.000 0.000 __objc_superrefs
0.000 0.000 0.000 __LLVM
0.000 0.000 0.000 __bitcode
0.000 0.000 0.000 __bundle
0.000 0.000 0.000 __cmdline
0.000 0.000 0.000 __swift_cmdline
0.000 0.000 0.000 __swift_modhash
1.042 0.057 0.152 __LINKEDIT
0.848 0.041 0.104 __rebase
0.006 0.001 0.001 __lazy_binding
0.132 0.007 0.017 __data_in_code
0.000 0.000 0.000 __ind_sym_tab
1.042 0.056 0.151 __binding
0.560 0.034 0.070 __func_starts
1.034 0.055 0.142 __export
0.603 0.035 0.076 __symbol_table
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.010 0.102 0.133 open_file
0.519 0.145 0.164 copy_sections_to_output_file
0.519 0.145 0.164 __TEXT
0.000 0.000 0.000 __mach_header
0.000 0.000 0.000 __stubs
0.518 0.145 0.164 __text
0.007 0.001 0.001 __objc_methtype
0.067 0.005 0.013 __swift5_fieldmd
0.004 0.001 0.001 __objc_stubs
0.000 0.000 0.000 __oslogstring
0.014 0.001 0.003 __swift5_assocty
0.002 0.000 0.000 __swift5_builtin
0.019 0.001 0.003 __swift5_capture
0.000 0.000 0.000 __swift5_entry
0.001 0.000 0.000 __swift5_mpenum
0.024 0.001 0.005 __swift5_proto
0.004 0.000 0.001 __swift5_protos
0.027 0.002 0.005 __swift5_reflstr
0.036 0.002 0.009 __swift5_typeref
0.085 0.013 0.023 __const
0.021 0.002 0.006 __swift5_types
0.000 0.000 0.000 __ustring
0.109 0.037 0.034 __eh_frame
0.000 0.000 0.000 __stub_helper
0.014 0.007 0.005 __gcc_except_tab
0.023 0.004 0.008 __cstring
0.054 0.018 0.017 __unwind_info
0.000 0.000 0.000 __literal4
0.001 0.000 0.000 __literal8
0.006 0.002 0.002 __objc_classname
0.018 0.009 0.007 __objc_methname
0.000 0.000 0.000 __entitlements
0.000 0.000 0.000 __literal16
0.242 0.031 0.055 __DATA
0.000 0.000 0.000 __la_symbol_ptr
0.000 0.000 0.000 __thread_ptrs
0.020 0.004 0.004 __objc_selrefs
0.242 0.031 0.055 __data
0.000 0.000 0.000 __objc_dictobj
0.000 0.000 0.000 __thread_data
0.001 0.000 0.000 __objc_arraydata
0.000 0.000 0.000 __thread_bss
0.000 0.000 0.000 __objc_doubleobj
0.000 0.000 0.000 __objc_floatobj
0.001 0.000 0.000 __objc_intobj
0.000 0.000 0.000 __objc_arrayobj
0.007 0.001 0.002 __objc_ivar
0.024 0.004 0.005 __objc_classrefs
0.000 0.000 0.000 __objc_imageinfo
0.000 0.000 0.000 __thread_vars
0.004 0.001 0.001 __objc_protorefs
0.000 0.000 0.000 __objc_stublist
0.005 0.000 0.001 __objc_superrefs
0.111 0.006 0.022 __objc_const
0.067 0.003 0.013 __objc_data
0.459 0.122 0.134 __LLVM
0.459 0.122 0.134 __bitcode
0.190 0.013 0.043 __cmdline
0.395 0.113 0.121 __bundle
0.001 0.000 0.000 __swift_cmdline
0.009 0.003 0.002 __swift_modhash
0.491 0.131 0.141 __LINKEDIT
0.003 0.000 0.001 __rebase
0.003 0.001 0.001 __binding
0.000 0.000 0.000 __lazy_binding
0.159 0.010 0.033 __export
0.001 0.000 0.000 __func_starts
0.000 0.000 0.000 __data_in_code
0.325 0.120 0.107 __symbol_table
0.002 0.001 0.000 __ind_sym_tab
0.000 0.000 0.000 __string_table
0.000 0.000 0.000 __code_signature
0.240 0.029 0.054 __DATA_CONST
0.004 0.001 0.001 __got
0.153 0.010 0.031 __const
0.001 0.000 0.000 __mod_init_func
0.061 0.011 0.017 __cfstring
0.003 0.001 0.001 __objc_catlist
0.013 0.005 0.003 __objc_classlist
0.000 0.000 0.000 __objc_nlcatlist
0.000 0.000 0.000 __objc_nlclslist
0.004 0.002 0.001 __objc_protolist
0.287 0.371 2.023 write_signature
0.000 0.029 0.029 msync
0.000 0.003 0.003 close_file
@rui314 What's the easiest way for me to count the number of files that are part of the read_input_files step?
Adding -Wl,-stats is the easiest way to know the number of input files.
Thanks! That gives me this.
num_rels=14977191
num_syms=7206589
num_subsections=3312576
num_merged_strings=1141406
num_merged_literal_pointers=83573
num_objs=24927
num_dylibs=134
This specific link was:
User System Real Name
6.591 2.415 5.413 all
2.687 1.673 3.353 read_input_files
0.251 0.302 1.190 write_signature
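For a rough sense of scale, the numbers above can be turned into a per-file cost and a share of the total. The helpers below are a hypothetical sketch, not part of sold. They assume that the wall-clock time of read_input_files is spread evenly over the object files and ignore the dylibs.

```cpp
#include <cassert>

// Average wall-clock time per input file, in microseconds.
inline double avg_micros_per_input(double real_seconds, long num_inputs) {
  assert(num_inputs > 0);
  return real_seconds * 1e6 / static_cast<double>(num_inputs);
}

// Share of the total wall-clock time spent in one phase, in percent.
inline double phase_share_percent(double phase_seconds, double total_seconds) {
  assert(total_seconds > 0);
  return phase_seconds * 100.0 / total_seconds;
}
```

With the figures from this link, avg_micros_per_input(3.353, 24927) comes out at roughly 135 microseconds per object file, and phase_share_percent(3.353, 5.413) at roughly 62% of the total real time. phase_share_percent(1.190, 5.413) puts write_signature at about 22%.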
I tried out sold for linking an iOS app executable that's ~6.5k Swift files. Building on an M1 Mac (10 cores) averaged ~2 seconds for the default linker and ~12 seconds using sold. Is this plausible, or do you suspect something is wrong with my configuration? From the README it wasn't clear whether sold is always expected to beat Xcode's default linker, or whether certain characteristics of a project may make sold comparatively faster or slower.