Open jiblime opened 5 years ago
Yup, it is normal. It's strip-flags from flag-o-matic.eclass at work. You can override it by emerging sys-config/ltoize with USE=override-flagomatic. There's a manual-enable set in ltoworkarounds.conf for GCC to force its use though, as I've found it won't build using LTO at all right now. It used to, though!
GCC doesn't need the override.
The trick is to use an env file. Here's my gcc.conf:
EXTRA_ECONF='--with-build-config=bootstrap-lto'
BOOT_CFLAGS="-march=native ${OPT} ${FALIGN} ${GRAPHITE}"
The EXTRA_ECONF configures gcc to build with LTO, and BOOT_CFLAGS, which isn't stripped by the Gentoo ebuild, tells it to bootstrap with those flags (the definitions of OPT etc. are in my make.conf, but you get the idea). Put this file in /etc/portage/env, and add a file in /etc/portage/package.env containing "sys-devel/gcc gcc.conf" to use it.
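The setup described above could be scripted roughly as follows. This is a minimal sketch: CONFIG_ROOT is a made-up variable that defaults to a scratch directory so nothing under /etc/portage is touched, and the BOOT_CFLAGS value is a placeholder rather than the flags from the post.

```shell
# Sketch only: CONFIG_ROOT is a hypothetical variable for this example.
# It defaults to a scratch directory; set it to an empty string to target /etc/portage.
CONFIG_ROOT="${CONFIG_ROOT-$(mktemp -d)}"
env_dir="${CONFIG_ROOT}/etc/portage/env"
pkgenv_dir="${CONFIG_ROOT}/etc/portage/package.env"
mkdir -p "${env_dir}" "${pkgenv_dir}"

# The env file: ask GCC's configure for an LTO bootstrap and pass flags
# to the later bootstrap stages. The BOOT_CFLAGS value is a placeholder.
cat > "${env_dir}/gcc.conf" <<'EOF'
EXTRA_ECONF='--with-build-config=bootstrap-lto'
BOOT_CFLAGS="-march=native -O2"
EOF

# Tell Portage to apply gcc.conf whenever sys-devel/gcc is built.
printf '%s\n' 'sys-devel/gcc gcc.conf' > "${pkgenv_dir}/gcc"
```

The file name under package.env (here "gcc") is arbitrary; only the "sys-devel/gcc gcc.conf" line inside it matters.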
@nivedita76 you don't have to mention it in package.env, just put them in /etc/portage/env/sys-devel/gcc and emerge will pick it up
My real config has it only applied for >gcc-8 (so I have a "stable" compiler version to fall back on just in case). Would it pick it up if I did env/sys-devel/gcc-9, or does it have to be the full version number then?
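One way to sidestep the file-naming question is to check the version inside the file itself. The sketch below assumes the file is sourced as shell code with PV set by Portage; want_lto_bootstrap is a made-up helper, and whether EXTRA_ECONF set this way reaches the GCC build is something to verify rather than take from this example.

```shell
# Hypothetical contents of /etc/portage/env/sys-devel/gcc.
# PV is provided by Portage in a real build; the default below only lets
# the sketch run standalone.
PV="${PV:-9.2.0}"

# want_lto_bootstrap is a made-up helper: succeed for major versions above 8.
want_lto_bootstrap() {
    major="${1%%.*}"
    [ "${major}" -gt 8 ]
}

if want_lto_bootstrap "${PV}"; then
    # Append to any existing EXTRA_ECONF instead of overwriting it.
    EXTRA_ECONF="${EXTRA_ECONF:+${EXTRA_ECONF} }--with-build-config=bootstrap-lto"
fi
```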
@nivedita76 Thank you! I just tested it and it seems to be working! I'll look into adding a USE=lto to sys-devel/gcc. I also tried BOOT_CFLAGS and noticed the flags are indeed passing through, at least in the first stage of compilation.
PR created upstream: https://github.com/gentoo/gentoo/pull/11943
Thanks! They are used to build the compiler itself but not the startup libraries (libgcc etc) I think.
@nivedita76 thank you for the helpful tip. Do you happen to have any resources on the differences between BOOT_CFLAGS and CFLAGS?
BOOT_CFLAGS are what are used for building stage2/stage3 compilers, i.e. what eventually gets installed. I'm actually not 100% sure whether CFLAGS gets included in that by default or it only uses it for stage1 or something.
There's some info in the Building section here but it doesn't mention what happens to regular CFLAGS. It does suggest a way of passing custom CFLAGS to the libgcc etc as well (CFLAGS_FOR_TARGET)
@nivedita76 correct me if I'm wrong, but doesn't using EXTRA_ECONF='--with-build-config=bootstrap-lto' prevent a PGO build, which uses profiledbootstrap?
Nope. The make target is profiledbootstrap which is what makes it use pgo.
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=config/bootstrap-lto.mk;hb=HEAD
The configure argument basically adds these into the Makefile. Looking at the snippet, I'm actually surprised that the LTO build works without using this as well. The documentation for the -frandom-seed argument says:
> This option provides a seed that GCC uses in place of random numbers in generating certain symbol names that have to be different in every compiled file. It is also used to place unique stamps in coverage data files and the object files that produce them. You can use the -frandom-seed option to produce reproducibly identical object files.
I would have thought not having that would mess up the build's test: it checks that it can compile itself reproducibly to make sure you don't end up with a horribly broken compiler.
Hm I think it doesn't do the comparisons if it's doing a pgo build rather than a regular one.
@nivedita76 Thanks! I stand corrected
Upstream bug is here: https://bugs.gentoo.org/685634
Quote:
> Please mention in both bootstrap-lto-lean.mk and the documentation that the intended make target for this config is profiledbootstrap since for non-profiledbootstrap it ends up not using LTO at all. A "lean" mode for non-profiledbootstrap would need to set up things to use LTO only for stage3 which means not doing a bootstrap comparison which means we could "skip" stage2 as well here.
From: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg210066.html
It appears that this is the intended way to build GCC with LTO and PGO. You use the build config option at configure time and then make profiledbootstrap at build time. I plan on optionally feeding CFLAGS into BOOT_CFLAGS when using override-flagomatic, or perhaps introducing another USE like optimize-gcc, with sys-config/ltoize.
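Outside of Portage, the two steps named above would look roughly like this. It is a dry-run sketch that only prints the commands; SRC_DIR is a hypothetical path to an unpacked GCC source tree, and print_lto_pgo_steps is a made-up helper name.

```shell
# Dry-run sketch: prints the two steps instead of running them.
# SRC_DIR is a hypothetical location of the GCC sources.
SRC_DIR="${SRC_DIR:-../gcc-src}"

print_lto_pgo_steps() {
    # Configure time: select the LTO bootstrap build config.
    printf '%s\n' "${SRC_DIR}/configure --with-build-config=bootstrap-lto"
    # Build time: the profiledbootstrap target is what turns on PGO.
    printf '%s\n' "make profiledbootstrap"
}

print_lto_pgo_steps
```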
Patch was accepted upstream!
Now that we can use both LTO and PGO in conjunction, I'd like to also support users injecting their CFLAGS into BOOT_CFLAGS, minus -flto (since that is handled internally). Since we use bootstrap-lto as the configuration for GCC, comparisons are made between stage 2 and stage 3 binaries as a test to ensure the final GCC is sane. I'll be testing out all optimizations on my own rig for a few weeks and depending how that goes, it would be nice to have an opt-in for users to do this.
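The "CFLAGS minus -flto" idea can be sketched in plain shell as below. strip_lto is a made-up helper, not something ltoize or the GCC ebuild provides, and the CFLAGS value is only an example. It drops -flto and -flto=... but deliberately keeps other options such as -flto-partition=...; whether those should also go is a separate decision.

```shell
# strip_lto is a hypothetical helper: print the given flag string with
# every -flto / -flto=<value> word removed.
strip_lto() {
    out=""
    for flag in $1; do
        case "${flag}" in
            -flto|-flto=*) ;;                      # drop plain and valued -flto
            *) out="${out:+${out} }${flag}" ;;     # keep everything else
        esac
    done
    printf '%s\n' "${out}"
}

CFLAGS="-O3 -march=native -flto=4 -pipe"           # example values only
BOOT_CFLAGS="$(strip_lto "${CFLAGS}")"
echo "${BOOT_CFLAGS}"
```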
@InBetweenNames If you use pgo no comparison is done.
I added this to package.cflags/gcc
>=sys-devel/gcc-9 *FLAGS-=-flto* BOOT_CFLAGS='"${CFLAGS} ${OPTCFLAGS}"'
Are you sure no comparison is done? I checked the bootstrap-lto.mk config here:
https://github.com/gcc-mirror/gcc/blob/master/config/bootstrap-lto.mk
do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
extra-compare = gcc/lto1$(exeext)
It seems it compares two stages at least. Does PGO skip over do-compare and extra-compare?
Yes, it will compare if you do only LTO, but the PGO bootstrap has no compare targets. It already builds 4 stages; I guess they felt building a 5th for the comparison was just too much.
I patched it to add one, and with my options it does check out, fwiw.
Excellent! Do you think we should integrate your patch here? It might ease some users' minds about applying LTO + PGO + BOOT_CFLAGS optimizations to their GCC.
Attaching the current state. This will actually do a 6-stage bootstrap. It uses the profile from the stage built using profile-use (normally the last stage) to do another build, idea was to collect better profiling information about the passes that only get enabled with profile-use. It then does a compare of that final product, so 6 stages total. I've tested with bootstrap-lto though not with the -lean variant.
@nivedita76 one more question -- I notice you use OPTCFLAGS as well, do you have those defined somewhere?
@InBetweenNames I have that in my make.conf. The bashrc-mv overlay appends those to CFLAGS. So what I did was have CFLAGS be safe defaults and set all the extra flags in OPTCFLAGS. This is what the flags section of my make.conf looks like. (note some of the stuff is unused)
source make.conf.lto.defines
FALIGN="-falign-functions=32"
# RETPOLINE="-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register"
RETPOLINE=""
FTLS="-mtls-dialect=gnu2"
LOOPPAR="-floop-parallelize-all -ftree-loop-parallelize=4"
NOPLT="-fno-plt"
FVISIBILITY="-fvisibility-inlines-hidden"
OPT="-O3 -fira-loop-pressure -flive-range-shrinkage"
OPTCFLAGS="${OPT} ${FASTMATH} ${GRAPHITE} ${IPA} ${FLTO} ${SEMINTERPOS} ${FTLS}"
OPTCXXFLAGS="${OPTCFLAGS} -fdevirtualize-at-ltrans ${FVISIBILITY}"
# DEBUGFLAGS="-ggdb"
DEBUGFLAGS=""
SAFEFLAGS="-pipe -march=native -O2 ${FALIGN} ${NOPLT}"
CFLAGS="${SAFEFLAGS} ${DEBUGFLAGS} ${RETPOLINE}"
CFLAGS_x86="${CFLAGS_x86} -mfpmath=sse"
CXXFLAGS="${CFLAGS}"
RUSTFLAGS="-C target-cpu=native -C opt-level=2"
LDFLAGS="${LDFLAGS}"
I was able to edit the GCC ebuild and push in my own flags, which BOOT_CFLAGS inherited (if only I knew). The difference though is that my compile time was cut in half (?!). I'm willing to bet you can add EXTRA_ECONF='STAGE1_CFLAGS="-O2 -pipe"' to your package.env/ file instead of going through this trouble. But Gentoo is about choices!
sys-devel/gcc: 1:26:34 -- LTO/PGO
sys-devel/gcc: 34′04″ -- No LTO/PGO so I can test -flto=auto patch without waiting
sys-devel/gcc: 34′18″ -- No LTO/PGO for the same reason, different implicit multithread -flto patch
sys-devel/gcc: 46′17″ -- LTO/PGO tested with -flto auto and injected stage 1 flags, compile time reduced by >30min ^^
The -flto patch now automatically detects the number of CPUs I have so I no longer need to define a number. This was backported from GCC 10 and is right here. All that needs fixing is the Changelog.
The ebuild to use custom flags:
# Copyright 1999-2019 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2
EAPI="7"
PATCH_VER="3"
inherit toolchain
KEYWORDS="~alpha amd64 ~arm arm64 ~hppa ~ia64 ~m68k ~mips ~ppc ppc64 ~riscv s390 ~sh sparc x86"
IUSE+="custom-cflags"
RDEPEND=""
DEPEND="${RDEPEND}
elibc_glibc? ( >=sys-libs/glibc-2.13 )
>=${CATEGORY}/binutils-2.20"
if [[ ${CATEGORY} != cross-* ]] ; then
PDEPEND="${PDEPEND} elibc_glibc? ( >=sys-libs/glibc-2.13 )"
fi
# Since all the ebuild does is source its environment from the toolchain eclass (and its inherits and so on)
# all that needs to be done for custom CFLAGS is to redefine strip-flags and replace-flags
# sys-config/ltoize[override-flagomatic] does this but removes all the flag-o-matic functions,
# most of which are workarounds for older GCC versions but also the essential filters for
# funky flags and substitution for architecture definitions in GCC.
# Originally I thought I needed to copy entirely and redefine the gcc_do_filter_flags function
# but it doesn't matter since strip-flags and replace-flags aren't used anywhere else
check_em() {
	for eclass in eutils fixheadtails gnuconfig libtool multilib pax-utils toolchain-funcs prefix ; do
		grep 'strip-flags\|replace-flags' $(portageq eclass_path ${SYSROOT} gentoo ${eclass})
	done
	# Ideally this function would test for nonzero output and fail if there is any, since that
	# would mean *something* has changed and now requires either of these functions. For now, whatever.
}
pkg_setup() {
	if use custom-cflags ; then
		# Replace strip-flags with a stub that only warns
		strip-flags() {
			ewarn "Flags were not stripped for sanity. You might be interested in using quickpkg on GCC if this goes horribly wrong"
		}
		# Replace replace-flags with a stub that only logs
		replace-flags() {
			elog "Sometimes -O2 is prefixed to the compiler flags. Any -O level that follows will replace it. -flto* flags will be replaced as long as USE lto is active"
		}
		# -flto flags need to be filtered or else the stage 1 will need to be LTO'd too.
		# That would increase build time significantly for no performance boost. USE lto will enable LTO for the later stages
		# The pattern is quoted so the shell cannot expand it against file names
		filter-flags '-flto*'
		# optimize the stage 1 a little bit to make the total compile time shorter https://patchwork.ozlabs.org/patch/766906/
		STAGE1_CFLAGS="-O2 -march=native -pipe"
	fi
}
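The "fail if anything matches" check that the comment in check_em wishes for could look roughly like this outside an ebuild. It is a sketch under stated assumptions: uses_flag_helpers is a made-up name, the two demo files stand in for real eclasses, and an actual ebuild would call die rather than echo.

```shell
# uses_flag_helpers is a hypothetical helper: succeed if any of the given
# files mentions strip-flags or replace-flags.
uses_flag_helpers() {
    grep -q -e 'strip-flags' -e 'replace-flags' -- "$@"
}

# Demo files standing in for eclasses (not real eclass contents).
demo_dir="$(mktemp -d)"
printf 'inherit toolchain-funcs\n' > "${demo_dir}/clean.eclass"
printf 'strip-flags\n' > "${demo_dir}/dirty.eclass"

if uses_flag_helpers "${demo_dir}/clean.eclass"; then
    echo "an inherited eclass now calls strip-flags or replace-flags"
else
    echo "no callers found, overriding looks safe"
fi
```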
emerge --info gcc
sys-devel/gcc-9.2.0-r3::local was built with the following:
USE="custom-cflags (cxx) fortran graphite lto (multilib) nls nptl objc openmp pch pgo sanitize ssp vtv (-altivec) -d -debug -doc (-fixed-point) -go (-hardened) -jit (-libssp) -objc++ -objc-gc -pie -systemtap -test -vanilla" ABI_X86="(64)"
CFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
CXXFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
FEATURES="multilib-strict xattr usersync merge-sync parallel-fetch news strict assume-digests pid-sandbox usersandbox preserve-libs split-log unmerge-logs ipc-sandbox config-protect-if-modified candy unknown-features-warn split-elog binpkg-logs binpkg-docompress ccache protect-owned unmerge-orphans parallel-install sandbox userfetch binpkg-dostrip userpriv network-sandbox distlocks fixlafiles sfperms"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin"
I generally use -fuse-ld so I can know what I built a package with but I may opt to just default to gold only after the recent issues. I've also had the most runtime issues with -fipa-pta and so I no longer use those; -fno-plt -fno-semantic-interposition seems to consistently give the best performance out of all the flags with the least random runtime errors.
-flto=auto alone reduce the compilation time to such a degree?
-fipa-pta and the Gold linker? I have had none that I can think of.
-flto=auto is the same as -flto=jobserver, so it would be the same for GCC. I think I got really lucky with ccache with that.
About -fipa-pta, I was mistaken because I had it on my mind, sorry about that.
-flto=jobserver alone reduce the compilation time to such a degree? It is unclear to me.
-fno-plt issues. What were they?
I appear to have been unclear in my explanation. When -flto is called with =jobserver, that means linking will be parallelized according to the MAKEOPTS that you've specified. If you have MAKEOPTS="-j4" in your make.conf, -flto=jobserver should mean -flto=4. But this should only be true for plain make/gmake. Other build systems like ninja apparently do not recognize the jobserver argument. You would be better off specifying the number of threads that -flto will use based on the number of threads your processor has.
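To pick that number without hard-coding it, something like the following could be run once in a shell. It is only a sketch: make.conf does not execute commands, so the printed value would have to be pasted in by hand, and the variable names are made up for this example.

```shell
# Count the available hardware threads: prefer nproc, fall back to getconf,
# and use 1 if neither works.
if command -v nproc >/dev/null 2>&1; then
    threads="$(nproc)"
else
    threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
fi

# Build the explicit flag, e.g. -flto=8 on an 8-thread machine.
LTO_FLAG="-flto=${threads}"
echo "${LTO_FLAG}"
```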
-fno-plt problems are random, and that is the problem I have with it. I can't track when it is causing an issue and don't really care for it. This is equal to removing -flto from all flags just because I am too lazy to figure out workarounds for the problems that -flto causes. So I am just too lazy to figure it out.
Note:
The only benefit of the GCC 10 -flto auto-parallelization backport I've used is convenience. The only case I know of where I would benefit from it is if I had decided to configure python --with-lto. But that would be a bad idea because Python's configure.ac specifies LTOFLAGS="-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none", and none is not optimal afaik because you want partitioning.
Something I've noticed with compiling 9.1.0 vs <= 8.3.0 is that it is compiled with these flags:
This is in the initial log when emerging:
And these are my USE flags:
This is not true for other packages, and was not fixed with a recompile and exporting defined flags from make.conf. Is this expected behavior?