InBetweenNames / gentooLTO

A Gentoo Portage configuration for building with -O3, Graphite, and LTO optimizations
GNU General Public License v2.0

Is it normal for GCC 9.1 to be built with stripped flags? #297

Open jiblime opened 5 years ago

jiblime commented 5 years ago

Something I've noticed when compiling 9.1.0, compared with <= 8.3.0, is that it is built with only these flags:

-march=native -pipe -O2

This is in the initial log when emerging:

 * strip-flags: CFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: CXXFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: FFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: FCFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: LDFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32 -Wl,--hash-style=gnu' to '-march=native -pipe -Wl,--hash-style=gnu -O2'

And these are my USE flags:

USE="cxx fortran graphite (multilib) nls nptl objc openmp pch pgo (pie) sanitize ssp vtv (-altivec) -d -debug -doc (-fixed-point) -go (-hardened) (-jit) (-libssp) -objc++ -objc-gc -systemtap -test -vanilla"

This doesn't happen with other packages, and it wasn't fixed by recompiling or by exporting the flags defined in make.conf. Is this expected behavior?

InBetweenNames commented 5 years ago

Yup, it is normal. It's strip-flags from flag-o-matic.eclass at work. You can override it by emerging sys-config/ltoize with USE=override-flagomatic. There's a manual-enable set in ltoworkarounds.conf for GCC to force its use, though, because I've found GCC won't build with LTO at all right now. It used to, though!
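A minimal sketch of the override route, assuming /etc/portage/package.use is a directory rather than a single file; the file name ltoize inside it is an arbitrary choice. The function takes an optional config root (default /) so it can be tried against a scratch directory first.

```shell
#!/usr/bin/env bash
# Enable USE=override-flagomatic for sys-config/ltoize.
# Assumes package.use is a directory; the file name "ltoize" is arbitrary.
enable_override_flagomatic() {
    local root="${1:-/}"                        # config root, "/" for the real system
    local dir="${root%/}/etc/portage/package.use"
    mkdir -p "${dir}"
    printf '%s\n' "sys-config/ltoize override-flagomatic" > "${dir}/ltoize"
}
```

On the real system you would run `enable_override_flagomatic` as root and then rebuild ltoize, e.g. `emerge --ask --oneshot sys-config/ltoize`. The GCC manual-enable entry mentioned above still applies afterwards.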

nivedita76 commented 5 years ago

GCC doesn't need the override.

The trick is to use an env file. Here's my gcc.conf:

    EXTRA_ECONF='--with-build-config=bootstrap-lto'
    BOOT_CFLAGS="-march=native ${OPT} ${FALIGN} ${GRAPHITE}"

EXTRA_ECONF configures GCC to build with LTO, and BOOT_CFLAGS, which the Gentoo ebuild doesn't strip, tells it to bootstrap with those flags (OPT etc. are defined in my make.conf, but you get the idea). Put this file in /etc/portage/env, and add a file in /etc/portage/package.env containing "sys-devel/gcc gcc.conf" to use it.
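The two files described in this comment can be created like this. It's a sketch that assumes package.env is a directory (the mapping file name gcc is arbitrary) and that OPT, FALIGN and GRAPHITE are defined in make.conf, as in nivedita76's setup.

```shell
#!/usr/bin/env bash
# Write the env file plus the package.env line that maps sys-devel/gcc to it.
# The quoted heredoc ('EOF') keeps ${OPT}, ${FALIGN} and ${GRAPHITE} literal,
# so they get expanded later from make.conf, not by this script.
write_gcc_env() {
    local portage="${1:-/}"                     # config root, "/" for the real system
    portage="${portage%/}/etc/portage"
    mkdir -p "${portage}/env" "${portage}/package.env"
    cat > "${portage}/env/gcc.conf" <<'EOF'
EXTRA_ECONF='--with-build-config=bootstrap-lto'
BOOT_CFLAGS="-march=native ${OPT} ${FALIGN} ${GRAPHITE}"
EOF
    # package.env line format: <package atom> <env file name(s)>
    printf '%s\n' "sys-devel/gcc gcc.conf" > "${portage}/package.env/gcc"
}
```

Keeping the variables unexpanded in the file means changing OPT or GRAPHITE in make.conf is enough. You don't need to regenerate gcc.conf.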

barolo commented 5 years ago

@nivedita76 you don't have to mention it in package.env; just put the settings in /etc/portage/env/sys-devel/gcc and emerge will pick them up.

nivedita76 commented 5 years ago

My real config has it only applied for >gcc-8 (so I have a "stable" compiler version to fall back on just in case). Would it pick it up if I did env/sys-devel/gcc-9, or does it have to be the full version number then?
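This thread doesn't settle whether a file named env/sys-devel/gcc-9 would be matched. One hedged alternative builds on barolo's suggestion. It assumes the per-package file env/sys-devel/gcc is sourced as a bash script with the ebuild variables such as PV (e.g. 9.2.0) set. Under that assumption, you can keep a single file and gate on the version inside it. The function name apply_gcc_lto_env is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical /etc/portage/env/sys-devel/gcc: only apply the LTO/BOOT_CFLAGS
# settings to GCC 9 and newer, so gcc-8 stays a "stable" fallback.
# OPT, FALIGN and GRAPHITE are assumed to be defined in make.conf.
apply_gcc_lto_env() {
    local major="${PV:-0}"
    major="${major%%.*}"
    # Skip anything that isn't a plain numeric major version, or is older than 9.
    [[ "${major}" =~ ^[0-9]+$ ]] && (( 10#${major} >= 9 )) || return 0
    EXTRA_ECONF='--with-build-config=bootstrap-lto'
    BOOT_CFLAGS="-march=native ${OPT:-} ${FALIGN:-} ${GRAPHITE:-}"
}
apply_gcc_lto_env
```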

InBetweenNames commented 5 years ago

@nivedita76 Thank you! I just tested it and it seems to be working! I'll look into adding a USE=lto to sys-devel/gcc. I also tried BOOT_CFLAGS and noticed the flags are indeed passing through, at least in the first stage of compilation.

InBetweenNames commented 5 years ago

PR created upstream: https://github.com/gentoo/gentoo/pull/11943

nivedita76 commented 5 years ago

Thanks! I think BOOT_CFLAGS are used to build the compiler itself, but not the runtime libraries (libgcc etc.).

jiblime commented 5 years ago

@nivedita76 thank you for the helpful tip. Do you happen to have any resources on the differences between BOOT_CFLAGS and CFLAGS?

nivedita76 commented 5 years ago

BOOT_CFLAGS are the flags used for building the stage2/stage3 compilers, i.e. what eventually gets installed. I'm actually not 100% sure whether CFLAGS is included in that by default, or whether it's only used for stage1 or something.

https://gcc.gnu.org/install/

There's some info in the Building section there, but it doesn't mention what happens to regular CFLAGS. It does describe a way of passing custom flags to libgcc etc. as well (CFLAGS_FOR_TARGET).

barolo commented 5 years ago

@nivedita76 correct me if I'm wrong, but doesn't using EXTRA_ECONF='--with-build-config=bootstrap-lto' prevent the PGO build, which uses profiledbootstrap?

nivedita76 commented 5 years ago

Nope. The make target is profiledbootstrap which is what makes it use pgo.

nivedita76 commented 5 years ago

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=config/bootstrap-lto.mk;hb=HEAD

The configure argument basically adds these into the Makefile. Looking at the snippet, I'm actually surprised that the LTO build works without it as well. The documentation for the -frandom-seed argument says:

> This option provides a seed that GCC uses in place of random numbers in generating certain symbol names that have to be different in every compiled file. It is also used to place unique stamps in coverage data files and the object files that produce them. You can use the -frandom-seed option to produce reproducibly identical object files.

I would have thought that not having it would break the build's test: it checks that the compiler can compile itself reproducibly, to make sure you don't end up with a horribly broken compiler.

nivedita76 commented 5 years ago

Hm, I think it doesn't do the comparisons if it's doing a PGO build rather than a regular one.

barolo commented 5 years ago

@nivedita76 Thanks! I stand corrected

InBetweenNames commented 5 years ago

Upstream bug is here: https://bugs.gentoo.org/685634

Quote:

> Please mention in both bootstrap-lto-lean.mk and the documentation that the intended make target for this config is profiledbootstrap since for non-profiledbootstrap it ends up not using LTO at all. A "lean" mode for non-profiledbootstrap would need to set up things to use LTO only for stage3 which means not doing a bootstrap comparison which means we could "skip" stage2 as well here.

From: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg210066.html

It appears that this is the intended way to build GCC with LTO and PGO: you use the build config option at configure time and then make profiledbootstrap at build time. I plan on optionally feeding CFLAGS into BOOT_CFLAGS when using override-flagomatic, or perhaps introducing another USE flag such as optimize-gcc, in sys-config/ltoize.
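Outside Portage, the recipe in this comment (build config at configure time, profiledbootstrap at make time) could look roughly like the following. The function prints the commands rather than running them, so you can review them first. The source, build and prefix paths and the --enable-languages=c,c++ choice are placeholders for illustration, not something from this thread.

```shell
#!/usr/bin/env bash
# Print (not run) the commands for an out-of-tree GCC build with LTO + PGO:
# --with-build-config=bootstrap-lto at configure time, profiledbootstrap at make time.
# src/build/prefix and the language list are illustrative placeholders.
gcc_lto_pgo_commands() {
    local src="$1" build="$2" prefix="$3" boot_cflags="$4"
    local jobs
    jobs="$(nproc 2>/dev/null || echo 1)"
    printf 'set -e\n'
    printf 'mkdir -p %q\n' "${build}"
    printf 'cd %q\n' "${build}"
    printf '%q --prefix=%q --enable-languages=c,c++ --with-build-config=bootstrap-lto\n' \
        "${src}/configure" "${prefix}"
    # BOOT_CFLAGS applies to the later stages; keep -flto out of it, since
    # bootstrap-lto adds LTO itself.
    printf 'make -j%s BOOT_CFLAGS=%q profiledbootstrap\n' "${jobs}" "${boot_cflags}"
}
```

For example, `gcc_lto_pgo_commands ~/src/gcc ~/build/gcc /opt/gcc-9 '-O2 -march=native -pipe' > build-gcc.sh`, then review it and run `bash build-gcc.sh`. The build directory is kept separate from the source tree, as the GCC install docs recommend. If you want the -lean variant from the quoted review, swapping in --with-build-config=bootstrap-lto-lean should be the only change.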

InBetweenNames commented 5 years ago

Patch was accepted upstream!

InBetweenNames commented 5 years ago

Now that we can use both LTO and PGO in conjunction, I'd like to also support users injecting their CFLAGS into BOOT_CFLAGS, minus -flto (since that is handled internally). Since we use bootstrap-lto as the configuration for GCC, comparisons are made between the stage 2 and stage 3 binaries as a test to ensure the final GCC is sane. I'll be testing all the optimizations on my own rig for a few weeks, and depending on how that goes, it would be nice to have an opt-in for users to do this.

nivedita76 commented 5 years ago

@InBetweenNames If you use pgo no comparison is done.

nivedita76 commented 5 years ago

I added this to package.cflags/gcc:

    >=sys-devel/gcc-9 *FLAGS-=-flto* BOOT_CFLAGS='"${CFLAGS} ${OPTCFLAGS}"'
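The *FLAGS-= syntax above comes from the bashrc-mv overlay's package.cflags. Without that overlay, the same idea can be sketched in plain bash: take CFLAGS plus OPTCFLAGS, drop any -flto* flag, and feed the result into BOOT_CFLAGS. A bash-sourced file such as /etc/portage/env/sys-devel/gcc is one place to put it. filter_lto_flags is a made-up helper name, and the sketch assumes CFLAGS and OPTCFLAGS are already set when it runs.

```shell
#!/usr/bin/env bash
# Drop every -flto* token (e.g. -flto=8, -flto-partition=one) from a flag
# string and keep the remaining flags in their original order.
filter_lto_flags() {
    local -a flags kept=()
    local f
    read -ra flags <<< "$1"
    for f in "${flags[@]}"; do
        [[ "${f}" == -flto* ]] || kept+=("${f}")
    done
    printf '%s\n' "${kept[*]}"
}

# bootstrap-lto adds LTO to the later stages itself, so -flto is left out here.
BOOT_CFLAGS="$(filter_lto_flags "${CFLAGS:-} ${OPTCFLAGS:-}")"
```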

InBetweenNames commented 5 years ago

Are you sure no comparison is done? I checked the bootstrap-lto.mk config here:

https://github.com/gcc-mirror/gcc/blob/master/config/bootstrap-lto.mk

do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
extra-compare = gcc/lto1$(exeext)

It seems it compares two stages at least. Does PGO skip over do-compare and extra-compare?

nivedita76 commented 5 years ago

Yes, it will compare if you do only LTO, but the PGO bootstrap has no compare targets. It already builds 4 stages; I guess they felt building a 5th for the comparison was just too much.

nivedita76 commented 5 years ago

I patched it to add one, and with my options it does check out, fwiw.

InBetweenNames commented 5 years ago

Excellent! Do you think we should integrate your patch here? It might ease some users' minds about applying LTO + PGO + BOOT_CFLAGS optimizations to their GCC.

nivedita76 commented 5 years ago

gcc-full-pgo.txt

Attaching the current state. This will actually do a 6-stage bootstrap. It uses the profile from the stage built using profile-use (normally the last stage) to do another build; the idea was to collect better profiling information about the passes that only get enabled with profile-use. It then does a compare of that final product, hence 6 stages total. I've tested it with bootstrap-lto, though not with the -lean variant.

InBetweenNames commented 5 years ago

@nivedita76 one more question -- I notice you use OPTCFLAGS as well, do you have those defined somewhere?

nivedita76 commented 5 years ago

@InBetweenNames I have that in my make.conf. The bashrc-mv overlay appends those to CFLAGS, so what I did was keep CFLAGS as safe defaults and put all the extra flags in OPTCFLAGS. This is what the flags section of my make.conf looks like (note that some of it is unused):


source make.conf.lto.defines
FALIGN="-falign-functions=32"
# RETPOLINE="-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register"
RETPOLINE=""
FTLS="-mtls-dialect=gnu2"
LOOPPAR="-floop-parallelize-all -ftree-loop-parallelize=4"
NOPLT="-fno-plt"
FVISIBILITY="-fvisibility-inlines-hidden"
OPT="-O3 -fira-loop-pressure -flive-range-shrinkage"
OPTCFLAGS="${OPT} ${FASTMATH} ${GRAPHITE} ${IPA} ${FLTO} ${SEMINTERPOS} ${FTLS}"
OPTCXXFLAGS="${OPTCFLAGS} -fdevirtualize-at-ltrans ${FVISIBILITY}"
# DEBUGFLAGS="-ggdb"
DEBUGFLAGS=""
SAFEFLAGS="-pipe -march=native -O2 ${FALIGN} ${NOPLT}"
CFLAGS="${SAFEFLAGS} ${DEBUGFLAGS} ${RETPOLINE}"
CFLAGS_x86="${CFLAGS_x86} -mfpmath=sse"
CXXFLAGS="${CFLAGS}"
RUSTFLAGS="-C target-cpu=native -C opt-level=2"
LDFLAGS="${LDFLAGS}"

jiblime commented 4 years ago

I was able to edit the GCC ebuild and push in my own flags, which BOOT_CFLAGS inherited (if only I knew). The difference, though, is that my compile time was roughly cut in half (?!). I'm willing to bet you could add EXTRA_ECONF='STAGE1_CFLAGS="-O2 -pipe"' to your package.env/ file instead of going through this trouble. But Gentoo is about choices!

sys-devel/gcc: 1:26:34   -- LTO/PGO
sys-devel/gcc: 0:34:04   -- No LTO/PGO, so I could test the -flto=auto patch without waiting
sys-devel/gcc: 0:34:18   -- No LTO/PGO for the same reason, different implicit multithreaded -flto patch
sys-devel/gcc: 0:46:17   -- LTO/PGO, tested with -flto=auto and injected stage 1 flags; compile time reduced by >30 min ^^
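To put a number on the first and last LTO/PGO timings, here's a quick sketch that converts them to seconds. to_seconds is a made-up helper. It shows the build with injected stage 1 flags finishing a bit over 40 minutes sooner, which is consistent with the ">30 min" above.

```shell
#!/usr/bin/env bash
# Convert a duration written as h:mm:ss or mm:ss into seconds.
to_seconds() {
    local -a parts
    IFS=: read -ra parts <<< "$1"
    local h=0 m s
    if (( ${#parts[@]} == 3 )); then
        h="${parts[0]}" m="${parts[1]}" s="${parts[2]}"
    else
        m="${parts[0]}" s="${parts[1]}"
    fi
    # 10# forces base 10, so fields such as "08" or "09" are not read as octal.
    echo $(( 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

before=$(to_seconds 1:26:34)   # LTO/PGO
after=$(to_seconds 46:17)      # LTO/PGO with -flto=auto and injected stage 1 flags
saved=$(( before - after ))
printf 'saved %d min %d s\n' $(( saved / 60 )) $(( saved % 60 ))   # saved 40 min 17 s
```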

The -flto patch now automatically detects the number of CPUs I have so I no longer need to define a number. This was backported from GCC 10 and is right here. All that needs fixing is the Changelog.


The ebuild to use custom flags:

# Copyright 1999-2019 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2

EAPI="7"

PATCH_VER="3"

inherit toolchain

KEYWORDS="~alpha amd64 ~arm arm64 ~hppa ~ia64 ~m68k ~mips ~ppc ppc64 ~riscv s390 ~sh sparc x86"
IUSE+="custom-cflags"

RDEPEND=""
DEPEND="${RDEPEND}
    elibc_glibc? ( >=sys-libs/glibc-2.13 )
    >=${CATEGORY}/binutils-2.20"

if [[ ${CATEGORY} != cross-* ]] ; then
    PDEPEND="${PDEPEND} elibc_glibc? ( >=sys-libs/glibc-2.13 )"
fi

# Since all the ebuild does is source its environment from the toolchain eclass (and its inherits and so on)
# all that needs to be done for custom CFLAGS is to redefine strip-flags and replace-flags

# sys-config/ltoize[override-flagomatic] does this but removes all the flag-o-matic functions,
# most of which are workarounds for older GCC versions but also the essential filters for
# funky flags and substitution for architecture definitions in GCC.

# Originally I thought I needed to copy entirely and redefine the gcc_do_filter_flags function
# but it doesn't matter since strip-flags and replace-flags aren't used anywhere else

check_em() {
    local eclass
    for eclass in eutils fixheadtails gnuconfig libtool multilib pax-utils toolchain-funcs prefix ; do
        # Quote the arguments: SYSROOT can be empty (meaning /), and portageq
        # eclass_path expects a root as its first argument.
        grep -e 'strip-flags' -e 'replace-flags' "$(portageq eclass_path "${SYSROOT:-/}" gentoo "${eclass}")"
    done

    # Ideally, use this function to test for nonzero output and fail if there is any, since that would
    # mean *something* has changed and now requires either of these functions. For now, whatever.
}

pkg_setup() {
    if use custom-cflags ; then
        strip-flags() {
            ewarn "Flags were not stripped for sanity. You might be interested in using quickpkg on GCC if this goes horribly wrong"
        }

        replace-flags() {
            elog "Sometimes -O2 is prefixed to the compiler flags. Any -O level that follows will replace it. -flto* flags will be replaced as long as USE lto is active"
        }
        # -flto flags need to be filtered or else the stage 1 will need to be LTO'd too.
        # That would increase build time significantly for no performance boost. USE lto will enable LTO for the later stages
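        # Quoted so the shell doesn't glob-expand the pattern; filter-flags does the matching.
        filter-flags '-flto*'

        # optimize the stage 1 a little bit to make the total compile time shorter https://patchwork.ozlabs.org/patch/766906/
        STAGE1_CFLAGS="-O2 -march=native -pipe"
    fi

    # Defining pkg_setup here replaces the one exported by toolchain.eclass, so call it explicitly.
    toolchain_pkg_setup
}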
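The elog message in the replace-flags() override relies on GCC's rule that when several -O options are given, the last one is effective, as the GCC manual states. Below is a small sketch for checking which -O level a final flag string ends up with. effective_opt_level is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print the -O option GCC will honor in a flag string: when several -O options
# are present, the last one is effective.
effective_opt_level() {
    local -a flags
    local f last=""
    read -ra flags <<< "$1"
    for f in "${flags[@]}"; do
        case "${f}" in
            -O|-O[0-9]|-Os|-Ofast|-Og|-Oz) last="${f}" ;;
        esac
    done
    printf '%s\n' "${last:-none}"
}

effective_opt_level "-O2 -march=native -O3 -pipe"   # prints -O3
```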

emerge --info gcc

sys-devel/gcc-9.2.0-r3::local was built with the following:
USE="custom-cflags (cxx) fortran graphite lto (multilib) nls nptl objc openmp pch pgo sanitize ssp vtv (-altivec) -d -debug -doc (-fixed-point) -go (-hardened) -jit (-libssp) -objc++ -objc-gc -pie -systemtap -test -vanilla" ABI_X86="(64)"
CFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
CXXFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
FEATURES="multilib-strict xattr usersync merge-sync parallel-fetch news strict assume-digests pid-sandbox usersandbox preserve-libs split-log unmerge-logs ipc-sandbox config-protect-if-modified candy unknown-features-warn split-elog binpkg-logs binpkg-docompress ccache protect-owned unmerge-orphans parallel-install sandbox userfetch binpkg-dostrip userpriv network-sandbox distlocks fixlafiles sfperms"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin"

I generally use -fuse-ld so I know which linker a package was built with, but after the recent issues I may just default to gold only. I've also had the most runtime issues with -fipa-pta and -fno-plt, so I no longer use those; -fno-semantic-interposition seems to consistently give the best performance of all the flags, with the fewest random runtime errors.

elsandosgrande commented 4 years ago

  1. Does -flto=auto alone reduce the compilation time to such a degree?
  2. What issues have you been having with -fipa-pta and the Gold linker? I have had none that I can think of.

jiblime commented 4 years ago

-flto=auto is the same as -flto=jobserver, so it would be the same for GCC. I think I got really lucky with ccache with that.

About -fipa-pta: I was mistaken, since I just had it on my mind. Sorry about that.

elsandosgrande commented 4 years ago

  1. Does -flto=jobserver alone reduce the compilation time to such a degree? It is unclear to me.
  2. All right. I have also seen that you had -fno-plt issues. What were they?

jiblime commented 4 years ago
  1. I appear to have been unclear in my explanation. When -flto is called with =jobserver, link-time optimization is parallelized according to the job count in your MAKEOPTS: if you have MAKEOPTS="-j4" in your make.conf, -flto=jobserver should behave like -flto=4. But this should only be true for plain make/gmake. Other build systems like ninja apparently do not provide the make jobserver. You would be better off specifying the number of threads -flto uses based on the number of threads your processor has.

  2. The -fno-plt problems are random, and that is my problem with it: I can't tell when it is causing an issue, and I don't really care for it. It's the equivalent of removing -flto from all flags just because I'm too lazy to figure out the workarounds -flto needs.
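Following the advice in point 1, here's a sketch that picks an explicit N for -flto=N. It uses the job count from MAKEOPTS when there is a numeric one, and otherwise falls back to the CPU count from nproc. lto_jobs is a hypothetical helper. make.conf itself can't run commands, so the idea is to run this once and paste the result in.

```shell
#!/usr/bin/env bash
# Pick N for -flto=N: use the job count from MAKEOPTS (-jN, -j N or --jobs=N)
# if there is one, otherwise fall back to the number of CPUs reported by nproc.
lto_jobs() {
    local -a opts
    local n="" i
    read -ra opts <<< "$1"
    for (( i = 0; i < ${#opts[@]}; i++ )); do
        case "${opts[i]}" in
            -j[0-9]*)  n="${opts[i]#-j}" ;;
            --jobs=*)  n="${opts[i]#--jobs=}" ;;
            -j|--jobs) n="${opts[i+1]:-}" ;;
        esac
    done
    # A bare -j (unlimited jobs) or anything non-numeric falls back to nproc.
    [[ "${n}" =~ ^[0-9]+$ ]] || n="$(nproc)"
    printf '%s\n' "${n}"
}

printf '%s\n' "-flto=$(lto_jobs "${MAKEOPTS:-}")"
```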

Note:

The only benefit of the GCC 10 -flto auto-parallelization backport I've been using is convenience. The only case I know of where I would benefit from it is if I had decided to configure Python --with-lto. But that would be a bad idea, because Python's configure.ac specifies LTOFLAGS="-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none", and afaik none is not optimal because you want partitioning: with -flto-partition=none the whole program is optimized as a single unit, so there is nothing for parallel LTO jobs to split up.