Closed al13n321 closed 1 year ago
Do you know which toolchain invocation results in the abbreviations being like this?
I don't know what compiler flags or tools are responsible, but it's open source and should be reproducible (with considerable effort).
The binary is ClickHouse (https://github.com/ClickHouse/ClickHouse), can be downloaded like this:
curl 'https://builds.clickhouse.com/master/amd64/clickhouse' -o clickhouse && chmod a+x clickhouse && ./clickhouse local -q "select * from system.build_options"
It's mostly C++, built with clang 16, statically linked with a kitchen sink of libraries (~100 of them). Some flags:
CXX_FLAGS --gcc-toolchain=./cmake/linux/../../contrib/sysroot/linux-x86_64 -fdiagnostics-color=always -Xclang -fuse-ctor-homing -Wno-enum-constexpr-conversion -fsized-deallocation -gdwarf-aranges -pipe -mssse3 -msse4.1 -msse4.2 -mpclmul -mpopcnt -fasynchronous-unwind-tables -ffile-prefix-map=.=. -ftime-trace -falign-functions=32 -mbranches-within-32B-boundaries -fdiagnostics-absolute-paths -fstrict-vtable-pointers -Wall -Wextra -Wframe-larger-than=65536 -Weverything -Wpedantic -Wno-zero-length-array -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-c++20-compat -Wno-sign-conversion -Wno-implicit-int-conversion -Wno-implicit-int-float-conversion -Wno-ctad-maybe-unsupported -Wno-disabled-macro-expansion -Wno-documentation-unknown-command -Wno-double-promotion -Wno-exit-time-destructors -Wno-float-equal -Wno-global-constructors -Wno-missing-prototypes -Wno-missing-variable-declarations -Wno-padded -Wno-switch-enum -Wno-undefined-func-template -Wno-unused-template -Wno-vla -Wno-weak-template-vtables -Wno-weak-vtables -Wno-thread-safety-negative -Wno-enum-constexpr-conversion -Wno-unsafe-buffer-usage -O2 -g -DNDEBUG -O3 -g -gdwarf-4 -flto=thin -fwhole-program-vtables -fno-pie
LINK_FLAGS --gcc-toolchain=./cmake/linux/../../contrib/sysroot/linux-x86_64 --ld-path=/usr/bin/ld.lld-16 -Wl,--no-export-dynamic -Wl,--gdb-index -Wl,--build-id=sha1 -no-pie -Wl,-no-pie -flto=thin -fwhole-program-vtables
(The flags that stood out to me: -gdwarf-4 -gdwarf-aranges -Wl,--gdb-index -flto=thin -fwhole-program-vtables
)
If you build it locally with default flags, the binary has 7708 units and 5999 abbreviations. The official release binary downloaded like above has 34629 units and 5572 abbreviations. Idk what causes the difference - I guess just different compiler/linker flags?
There's also this interesting postprocessing step, where sections of the binary are compressed, and the decompression code is compiled in: https://github.com/ClickHouse/ClickHouse/blob/master/utils/self-extracting-executable/CMakeLists.txt . Hard to imagine how it can 4x the number of units, but who knows.
That's all I could gather quickly. If you don't have good guesses based on that, and if it's important to you, I can spend some more time investigating, but would prefer to move on.
Thanks for the info. I'll think about improving AbbreviationsCache
to handle this later.
This change looks fine. The CI failure is unrelated, and I'll merge after I've fixed it.
I noticed that, on one binary, iterating over
.debug_info
units was slow because it re-parsed the same abbreviations multiple times. Adding a cache sped it up from 1.5 seconds to 0.37 seconds.This particular binary had 33109 compilation units with 5513 unique
debug_abbrev_offset
values, so eachAbbreviations
was parsed 6 times on average.Gimli's
AbbreviationsCache
was ineffective because:(In this case the abbreviations at offset 0 were used by only one unit, and all the reused abbreviations were at other offsets.)
This PR doesn't change
AbbreviationsCache
, but adds a trivial provision to allow the user to do their own caching of abbreviations. (A trivialHashMap
works fine in my case, but idk whether it would be good enough for all gimli users.)