c42f closed this 10 years ago
A major culprit here seems to be gcc's interprocedural constant propagation which is creating copies of functions:
$ ./bloat_test.sh -DUSE_TINYFORMAT -fno-ipa-cp-clone -O3
real 0m36.611s
user 0m34.102s
sys 0m1.992s
436K _bloat_testtmp.out
400K _bloat_testtmp.out
What seems to be happening is that each translation unit gets a constant-propagated version of various functions like FormatIterator::accept(), which prevents them from being emitted as weak symbols. In turn this prevents template functions from multiple translation units from being merged by the linker, leading to a rather stupid amount of additional bloat.
This is well and truly fixed as of 38173bb with gcc-4.8.2:
$ ./bloat_test.sh -DUSE_TINYFORMAT -O3
real 0m32.976s
user 0m30.934s
sys 0m2.024s
168K _bloat_test_tmp_.out
136K _bloat_test_tmp_stripped.out
gcc-4.6 still makes some questionable optimization choices, but things are also much improved there.
gcc-4.6.3 now produces quite a bloated build with optimizations on:
$ ./bloat_test.sh -O3 -DUSE_TINYFORMAT
real 0m36.515s
user 0m34.010s
sys 0m1.972s
1020K _bloat_testtmp.out
972K _bloat_testtmp.out
(I also checked with the version used for the original bloat tests, and it's even worse.) This used to be only 300K or so with version 4.4, so presumably gcc is now inlining a lot more aggressively :-(
Still not as bad as boost.format, but by no means great either.