InBetweenNames / gentooLTO

A Gentoo Portage configuration for building with -O3, Graphite, and LTO optimizations
GNU General Public License v2.0

Build ffmpeg with -ffat-lto-objects #354

Open OpenSourceAnarchist opened 5 years ago

OpenSourceAnarchist commented 5 years ago

According to a post on Clear Linux's community forum (just 4 days ago), FFmpeg can be built safely with LTO so long as -ffat-lto-objects is enabled.

Quote: "FFMPEG does build nicely with the link-time optimizer, but putting -flto in the flags or configuring with --enable-lto tends to cause the build to fail with lots of undefined symbols. Instead, put -ffat-lto-objects in the flags (already there if you use the default CFLAGS that comes with Clear) so that the linker has a fallback. Do be sure to include --extra-ldflags='-flto -fuse-linker-plugin' --ar=gcc-ar."

Source: https://community.clearlinux.org/t/tips-and-techniques-for-building-ffmpeg/795

ionenwks commented 5 years ago

I've personally never had trouble using LTO on ffmpeg as long as I use --enable-lto and don't set -flto myself in CFLAGS (on Gentoo I use EXTRA_FFMPEG_CONF="--enable-lto"). Maybe it has to do with my USE flags though. Edit: I believe --enable-lto omits LTO on some problematic parts, which is why it's important not to set -flto yourself, or else it just gets used on everything anyway. And while at it, my USE flags are media-video/ffmpeg fdk fontconfig libaom libass mp3 openssl opus pic theora truetype vorbis vpx x264 x265 xcb
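A minimal sketch of how that setting could be scoped to ffmpeg alone through Portage's per-package environment files. The file name `ffmpeg-lto.conf` is made up for this example, and the script writes into a scratch directory rather than the live `/etc/portage`, so nothing on the system is touched.

```shell
# Sketch: write a per-package env override for media-video/ffmpeg.
# "ffmpeg-lto.conf" is a hypothetical file name; the root is a scratch dir.
write_ffmpeg_env() {
    root=$1
    mkdir -p "$root/etc/portage/env" || return 1
    # The variable the ffmpeg ebuild picks up, as used in the comments above.
    printf '%s\n' 'EXTRA_FFMPEG_CONF="--enable-lto"' \
        > "$root/etc/portage/env/ffmpeg-lto.conf"
    # Map the package to the env file.
    printf '%s\n' 'media-video/ffmpeg ffmpeg-lto.conf' \
        >> "$root/etc/portage/package.env"
}

scratch=$(mktemp -d)
write_ffmpeg_env "$scratch"
cat "$scratch/etc/portage/env/ffmpeg-lto.conf"
cat "$scratch/etc/portage/package.env"
rm -rf "$scratch"
```

On a real system the two files would live under `/etc/portage`, and `package.env` may already exist as a file or a directory, so the append step would need adapting.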

barolo commented 5 years ago

I can confirm that it works currently [it didn't just a few weeks ago, same setup] via EXTRA_FFMPEG_CONF="--enable-lto", with ffmpeg-4.1.3 USE="X alsa bzip2 encode gpl hardcoded-tables iconv libdrm network opengl openssl postproc pulseaudio threads vaapi vorbis vpx webp zlib" CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext sse sse2 sse3 sse4_1 sse4_2 ssse3"

pchome commented 5 years ago

https://bugs.gentoo.org/566282

I use the [[ ${ABI} == x86 ]] && filter-flags "-flto*" || append-flags "-flto" hack in addition (in a modified ebuild). It "fixes" the x86 part. The amd64 build was always fine for me w/ the -flto flag.
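For readers without the ebuild in front of them, here is a standalone sketch of what that one-liner does to the flag list. In a real ebuild, `filter-flags` and `append-flags` come from flag-o-matic.eclass and act on CFLAGS and friends directly. The function `adjust_lto_flags` below is a hypothetical stand-in that takes the ABI and the flags as arguments.

```shell
# Sketch: drop every -flto* flag for the x86 ABI, append -flto otherwise.
# adjust_lto_flags is illustrative only; it is not part of any eclass.
adjust_lto_flags() {
    abi=$1
    shift
    out=""
    if [ "$abi" = "x86" ]; then
        for flag in "$@"; do
            case $flag in
                -flto*) ;;                            # filtered out
                *) out="${out:+$out }$flag" ;;        # kept
            esac
        done
    else
        out="$*"
        out="${out:+$out }-flto"
    fi
    printf '%s\n' "$out"
}

adjust_lto_flags x86 -O2 -flto=3 -pipe
adjust_lto_flags amd64 -O2 -pipe
```

Note that the `-flto*` pattern leaves `-ffat-lto-objects` alone, since that flag does not start with `-flto`.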

ionenwks commented 5 years ago

^ Oh, was it only the x86 version that's acting up? It's been a while since I've seen anything wrong, so I don't really remember. Reminder: you can append to CFLAGS_amd64 and/or CFLAGS_x86 if you don't want an ebuild hack. That aside, my x86 version is built with LTO as well (with --enable-lto and no -flto in CFLAGS) just fine.

pchome commented 5 years ago

I don't remember what exactly was wrong with CFLAGS_amd64 and CFLAGS_x86, but I tried to use them with ffmpeg with no success.

Just tried media-video/ffmpeg *FLAGS-="-flto*" "export EXTRA_FFMPEG_CONF=--enable-lto" on gentoo ebuild, this failed too.

Error (x86)

```
src/libswscale/x86/yuv2rgb_template.c: In function ‘yuv420_bgr24_mmxext’:
src/libswscale/x86/yuv2rgb_template.c:346:9: error: ‘asm’ operand has impossible constraints
 346 | YUV2RGB_INITIAL_LOAD
     | ^
lto-wrapper: fatal error: /usr/bin/x86_64-pc-linux-gnu-gcc returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [/var/tmp/portage/media-video/ffmpeg-4.1.3/work/ffmpeg-4.1.3/ffbuild/library.mak:103: libswscale/libswscale.so.5] Error 1
make: *** Waiting for unfinished jobs....
src/libavcodec/x86/vc1dsp_mmx.c: In function ‘avg_vc1_mspel_mc30_mmxext’:
src/libavcodec/x86/vc1dsp_mmx.c:318:1: error: ‘asm’ operand has impossible constraints
 318 | MSPEL_FILTER13_8B (shift3, "0(%1 )", "0(%1,%3 )", "0(%1,%3,2)", "0(%1,%4 )", OP_AVG, avg_)
     | ^
lto-wrapper: fatal error: /usr/bin/x86_64-pc-linux-gnu-gcc returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make: *** [/var/tmp/portage/media-video/ffmpeg-4.1.3/work/ffmpeg-4.1.3/ffbuild/library.mak:103: libavcodec/libavcodec.so.58] Error 1
 * ERROR: media-video/ffmpeg-4.1.3::gentoo failed (compile phase):
 *   emake failed
 *
 * If you need support, post the output of `emerge --info '=media-video/ffmpeg-4.1.3::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=media-video/ffmpeg-4.1.3::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/media-video/ffmpeg-4.1.3/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/media-video/ffmpeg-4.1.3/temp/environment'.
 * Working directory: '/var/tmp/portage/media-video/ffmpeg-4.1.3/work/ffmpeg-4.1.3-abi_x86_32.x86'
 * S: '/var/tmp/portage/media-video/ffmpeg-4.1.3/work/ffmpeg-4.1.3'
```

ionenwks commented 5 years ago

Figured it could be my USE flags, so I tried around a bit, and it seems I get the same error only if I remove my pic USE flag (not using that flag always seemed kind of strange, considering it mixes non-PIC asm with gcc's default PIE -- the entire system is using position-independent code).

ionenwks commented 5 years ago

And yeah, pic implies --disable-asm (for x86 only) so any x86 asm-related errors won't happen. As to whether this is really slower or not, I couldn't say. Compiler does perform plenty of optimizations that may render the asm code not-so-relevant anymore (I imagine it's quite dated).

javashin commented 5 years ago

If -flto is present in CFLAGS, --enable-lto is automatically passed to ./configure.

javashin commented 5 years ago

I successfully built with:

media-video/ffmpeg-4.1.3::gentoo was built with the following:
USE="X alsa bs2b bzip2 chromium encode fdk fontconfig gpl hardcoded-tables iconv jpeg2k ladspa libaom libass libcaca libdrm lzma modplug mp3 network openal opengl openh264 openssl opus postproc pulseaudio rubberband sdl speex svg theora threads truetype vaapi vorbis vpx wavpack webp x264 x265 xcb xvid zlib (-altivec) -amr -amrenc (-appkit) -bluray -cdio -chromaprint -codec2 -cpudetection -debug -doc -flite -frei0r -fribidi -gcrypt -gme -gmp -gnutls -gsm -iec61883 -ieee1394 -jack -kvazaar -libilbc -libressl -librtmp -libsoxr -libv4l -libxml2 -lv2 (-mipsdspr1) (-mipsdspr2) (-mipsfpu) (-mmal) -opencl -oss -pic -samba -snappy -srt -ssh -static-libs -test -twolame -v4l -vdpau -zeromq -zimg -zvbi"
ABI_X86="32 (64) (-x32)"
CPU_FLAGS_X86="aes avx avx2 fma3 mmx mmxext sse sse2 sse3 sse4_1 sse4_2 ssse3 -3dnow -3dnowext -fma4 -xop"
FFTOOLS="aviocat cws2fws ffescape ffeval ffhash fourcc2pixfmt graph2dot ismindex pktdumper qt-faststart sidxindex trasher"
VIDEO_CARDS="-nvidia"
CFLAGS="-O3 -march=native -mfpmath=both -funroll-loops -falign-functions=32 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -fuse-linker-plugin -flto=3 -ffat-lto-objects -fipa-pta -fno-math-errno -fno-trapping-math -fdevirtualize-at-ltrans -fno-stack-protector -pipe -Wl,-O2 -Wl,--as-needed,-z,now -fuse-ld=gold -Wl,--hash-style=gnu"
CXXFLAGS="-O3 -march=native -mfpmath=both -funroll-loops -falign-functions=32 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -fuse-linker-plugin -flto=3 -ffat-lto-objects -fipa-pta -fno-math-errno -fno-trapping-math -fdevirtualize-at-ltrans -fno-stack-protector -pipe -Wl,-O2 -Wl,--as-needed,-z,now -fuse-ld=gold -Wl,--hash-style=gnu"
LDFLAGS="-Wl,-O2 -Wl,--as-needed,-z,now -fuse-ld=gold -Wl,--hash-style=gnu -O3 -march=native -mfpmath=both -funroll-loops -falign-functions=32 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -fuse-linker-plugin -flto=3 -ffat-lto-objects -fipa-pta -fno-math-errno -fno-trapping-math -fdevirtualize-at-ltrans -fno-stack-protector -pipe"

ionenwks commented 5 years ago

After retrying a bit, it seems the whole thing about not having -flto in CFLAGS isn't necessary after all (I already knew the ebuild added --enable-lto, but I thought there was a problem with doing it like that from previous builds; maybe there WAS at one point, but it's been a while).

I'd personally argue the best way to build this with LTO isn't to use -ffat-lto-objects but just to add the pic USE flag; nothing else needs changing and normal -flto can be used (everything works out regardless of x86 or amd64). Without pic it attempts to use non-PIC asm (only on x86 -- the flag should have close to no effect on the amd64 version) despite being in a default PIE environment, which no matter how I look at it shouldn't be a thing. I feel like this flag should in fact be a Gentoo default at this point.

ionenwks commented 5 years ago

^ Although, if you don't want to force USE flags, adding -ffat-lto-objects would be simpler for GentooLTO workarounds. Edit: I guess the workaround could be omitted if the pic USE flag happens to be set, though. If it is set there should be no need to change anything at all; it just works with the default LTO flags (seems to for me anyway).

InBetweenNames commented 5 years ago

pic has a serious performance penalty on x86, doesn't it?

Reference #15 #47

I just tested out media-video/ffmpeg with the newer ebuild and it seems to be working fine on my amd64 system now, whereas previously I got ODR violations as well as some asm compilation errors.

@ionenwks, previously we used -fno-lto to disable LTO selectively, but ffmpeg's ebuild in particular didn't play nice with that. The upstream Gentoo maintainers were also not interested in fixing the ebuild or accepting patches for it. Since I realized I was on my own to fix that, I went with the approach currently chosen.

@pchome It seems that we should make this workaround apply only for x86, and for amd64 leave it as default -- does this sound reasonable?

InBetweenNames commented 5 years ago

Also, @barolo, does this work for you even with -flto in your CFLAGS? Or does it only work for you using EXTRA_FFMPEG_CONF="--enable-lto" and without -flto in your CFLAGS?

InBetweenNames commented 5 years ago

Proposed modification to ltoworkarounds.conf:

media-video/ffmpeg !*FLAGS-=-flto*

The ! will cause package.cflags to apply this workaround only to x86 and not amd64.

pchome commented 5 years ago

@InBetweenNames

> It seems that we should make this workaround apply only for x86, and for amd64 leave it as default -- does this sound reasonable?

I simplified the workaround to just [[ ${ABI} == x86 ]] && myconf+=( --disable-asm ), which is ok w/ USE=-pic and -flto. So possible solutions:

... oh, I see your comment while writing this ...
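A rough standalone sketch of that simplified workaround, for illustration only. In the ebuild, `ABI` and the `myconf` array are set up by the multilib machinery. Here the hypothetical function `abi_configure_args` takes the ABI as an argument, and `--enable-lto` is an assumed baseline option.

```shell
# Sketch: add --disable-asm to the configure arguments for the x86 ABI only.
# abi_configure_args is a made-up helper; --enable-lto is an assumed baseline.
abi_configure_args() {
    args="--enable-lto"
    if [ "$1" = "x86" ]; then
        # 32-bit build: skip the hand-written asm that trips the
        # "impossible constraints" error under LTO.
        args="$args --disable-asm"
    fi
    printf '%s\n' "$args"
}

for abi in x86 amd64; do
    printf '%s: %s\n' "$abi" "$(abi_configure_args "$abi")"
done
```

The amd64 pass keeps its asm paths, so any speed cost of `--disable-asm` is limited to the 32-bit libraries.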

pchome commented 5 years ago

> The ! will cause package.cflags to apply this workaround only to x86 and not amd64.

I'm not sure; maybe $HOSTTYPE=x86_64 for both abi_x86_32 and abi_x86_64, since you're compiling on x86_64. Need to check.
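To make the concern concrete, here is a small sketch under the assumption that the check is keyed on the host type, which is exactly what still needed checking at this point. The helper `should_strip_lto` and the hard-coded `host` value are hypothetical. The point is only that a host-based key gives the same answer for both ABI passes on a 64-bit machine, while an ABI-based key does not.

```shell
# Sketch: compare a host-keyed decision with an ABI-keyed one.
# should_strip_lto is a made-up helper, not part of package.cflags.
should_strip_lto() {
    case $1 in
        x86|i?86) echo yes ;;
        *)        echo no ;;
    esac
}

host=x86_64   # assumed: what a 64-bit build host reports during both passes

for abi in x86 amd64; do
    printf 'ABI=%s by-host=%s by-abi=%s\n' \
        "$abi" "$(should_strip_lto "$host")" "$(should_strip_lto "$abi")"
done
```

Under that assumption the 32-bit pass is never singled out by the host-keyed check, which is why a per-ABI test inside the ebuild behaves differently.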

InBetweenNames commented 5 years ago

Ah yes, you're right -- I'll check.

Currently I have it built with ABI_X86="32 64" with -flto enabled on both and it seems to build. I have USE=-pic set as well.

InBetweenNames commented 5 years ago

Indeed, the !*FLAGS-=-flto* won't work for exactly that reason. In that case, perhaps we should fork the ebuild? I could try to get it upstreamed, but last time I didn't have much luck.

InBetweenNames commented 5 years ago

Reference #103 too

nivedita76 commented 5 years ago

@InBetweenNames do you have clarity on why fat lto objects fixes things? From the docs it seems like the only thing it should do is include regular object code in addition to the IR, so it would only matter if the link step is not using LTO.

InBetweenNames commented 5 years ago

Indeed -- it's actually due to incorrect linker setup usually. It's rare that it happens, but when the linker is invoked in such a way that LTO is inhibited, it can't "see" the LTO symbols and will claim there are undefined symbols instead. -ffat-lto-objects is a hack to work around that, allowing the program to still link, but obviously not all of the code is LTOed.

nivedita76 commented 5 years ago

Wouldn’t it be completely non-lto in that case though? ie you might as well just turn off lto?

InBetweenNames commented 5 years ago

In the worst case -- absolutely. But there are cases where some object files can be linked with LTO and others can't (for example, shared object dependencies built as part of the same package). That's where -ffat-lto-objects can (theoretically) help. Obviously, a proper fix would be far preferable.