enjoy-digital / litex

Build your hardware, easily!
Other
2.89k stars 555 forks source link

Enable LTO/remove unused symbols when compiling BIOS/firmware #682

Open DurandA opened 3 years ago

DurandA commented 3 years ago

The current GCC build (I did not try Clang) for BIOS/firmware does not strip away unused symbols, resulting in large binaries (especially when adding dependencies).

I tried adding various flags in common.mak according to answers on Stack Overflow but without much success. It either resulted in no effect or the build failed. With -flto I was not able to fix a plugin needed to handle lto object error.

mithro commented 3 years ago

@DurandA -- There was some discussion about enabling link time option.

mithro commented 3 years ago

See https://github.com/enjoy-digital/litex/pull/401

mithro commented 3 years ago

https://github.com/enjoy-digital/litex/pull/253

DurandA commented 3 years ago

@mithro Thanks for pointing it and sorry for the redundancy.

LTO support was originally merged but https://github.com/enjoy-digital/litex/commit/979f98ea3120dbba208c2b0b044ce37294653814 reverted it. At least we have an example on how to enable it.

mithro commented 3 years ago

@DurandA - We most certainly want to get LTO enabled again. Maybe you could work with @mateusz-holenko to figure out what is needed to done to do that?

DurandA commented 3 years ago

I am now into the realms of a GCC bug (https://github.com/riscv/riscv-gcc/issues/201) so that might not happen anytime soon. Note that precompiled GCC from SiFive (both version gcc-8.3.0-2019.08.0-x86_64 and gcc-8.3.0-2020.04.1) triggered another internal compiler error from the lto-wrapper.

mithro commented 3 years ago

I'm going to reopen this issue so that @mateusz-holenko can give us an updated on the status around LTO and how to move forward here.

mateusz-holenko commented 3 years ago

I guess that linked tasks/PRs provide all the status - we didn't work on the LTO issue lately. As described in #253, using it with optimisations caused problems back in the past. We were able to find a combination of switches that worked, but it turned out not to be a generic solution ( as it broke some other platforms). Maybe a newer toolchain might help here?

Another problem was that LiteX lacked good testing suite (is it better now?). Changes like globally enabling LTO influence a lot of targets and it was quite hard to make sure that they don't break anything :( Perhaps we should think about extending LiteX CI and add more test cases first?

DurandA commented 3 years ago

@mateusz-holenko Did you ever encounter the internal error of the lto-wrapper described in https://github.com/riscv/riscv-gcc/issues/201?

mateusz-holenko commented 3 years ago

@DurandA No, I don't recall that error.

Our problem was that the linker couldn't find one of the functions that was a part of the code - it looked like it was wronly optimized-out by the LTO mechanism (event though it was actually used).

DurandA commented 3 years ago

@mateusz-holenko Did you ever encounter the internal error of the lto-wrapper described in riscv/riscv-gcc#201?

I forgot to add $(CPUFLAGS) to the linker and that caused the issue.

For reference, here is the current issue when compiling the bios with LTO:

riscv64-unknown-elf-gcc -std=gnu99 -nostdlib -nodefaultlibs -Os -mno-save-restore -march=rv32im     -mabi=ilp32 -D__picorv32__  -L/home/duranda/devel/build/software/include -T /home/duranda/devel/litex/litex/soc/software/bios/linker.ld -N -o bios.elf \
    ../libbase/crt0.o \
    isr.o boot-helper.o boot.o helpers.o cmd_bios.o cmd_mem.o cmd_boot.o cmd_i2c.o cmd_spiflash.o cmd_litedram.o cmd_liteeth.o cmd_litesdcard.o main.o complete.o readline.o \
    -L../libcompiler_rt \
    -L../libbase \
    -L../liblitedram \
    -L../libliteeth \
    -L../liblitespi \
    -L../liblitesdcard \
    -llitedram -lliteeth -llitespi -llitesdcard -lbase-nofloat -lcompiler_rt
/home/duranda/riscv/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /tmp/bios.elf.OQpKqZ.ltrans0.ltrans.o: in function `.L0 ':
/home/duranda/devel/litex/litex/soc/software/bios/cmds/cmd_mem.c:197: undefined reference to `__udivdi3'
/home/duranda/riscv/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /tmp/bios.elf.OQpKqZ.ltrans0.ltrans.o:/home/duranda/devel/litex/litex/soc/software/bios/cmds/cmd_mem.c:209: undefined reference to `__udivdi3'
collect2: error: ld returned 1 exit status
make: *** [/home/duranda/devel/litex/litex/soc/software/bios/Makefile:64: bios.elf] Error 1

It seems that the linker is dropping some libgcc functions when using LTO.

mithro commented 3 years ago

@DurandA From what I understand, the issue is either;

DurandA commented 3 years ago

Selectively disabling the LTO for lib runtime and libc (https://github.com/DurandA/litex/commit/f15bcce1edc44d7607e1b8934a79b30b96c8963c) seems to solve all issues. Apparently, GCC optimizes out functions that are not in source without caring that it inserted calls to these functions (1) for arithmetic operations on architecture that do not support it natively and (2) during optimization process. See https://github.com/riscv/riscv-gcc/issues/207 and https://github.com/riscv/riscv-gnu-toolchain/issues/758.

Before doing any PR and proceeding with extensive testing, further validation is necessary (according to https://github.com/enjoy-digital/litex/commit/979f98ea3120dbba208c2b0b044ce37294653814):

mithro commented 3 years ago

FYI - @mateusz-holenko

mithro commented 3 years ago

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

DurandA commented 3 years ago

I was not able to install a proper LM32 toolchain to check for potential issues. I compiled gcc-9.3.0 with ./configure --target=lm32-elf --enable-languages=c --disable-libssp --disable-libgcc --disable-libquadmath but these flags are not recognized: https://github.com/enjoy-digital/litex/blob/081d88342113afcaf420830064c8f68a51f0fb78/litex/soc/cores/cpu/lm32/core.py#L35-L38

If I remove them, I get a bunch of no such instruction errors. What is the recommanded way to install a working LM32 toolchain?

DurandA commented 3 years ago

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

I am not sure if this is possible. My current approach would be to split libc into several files (i.e. at least memset.c and libc.c for others). I didn't investigate yet if GCC can generate calls to something else than memset during optimization. For lib runtime, I am not sure if we can do something clean as what is used depends on the architecture.

DurandA commented 3 years ago

What is the recommanded way to install a working LM32 toolchain?

@enjoy-digital Can you please point to the LM32 toolchain you are using? :slightly_smiling_face:

mithro commented 3 years ago

We (@timvideos / litex-buildenv, @antmicro and @symbiflow) use the toolchain from https://github.com/litex-hub/litex-conda-compilers/tree/master

ewenmcneill commented 3 years ago

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

I am not sure if this is possible.

Some (old, because that's what the search engine found first) GCC documentation implies that the externally_visible symbol attribute will prevent it being discarded during LTO. And this seems to be a fairly common work around for symbols not properly detected as "required", eg, https://stackoverflow.com/questions/46304742/how-to-combine-lto-with-symbol-versioning/54045851#54045851 and https://github.com/arduino/Arduino/issues/660. Some people have specified the "used" attribute instead, which I guess has a similar effect.

The LLVM design document for LTO impiles it also looks for "externally visible" symbols, but I'm not sure if it respects the same attribute.

Hopefully there'd be relatively few such symbols that need marking like that. (The LM32 issues I had I think were more to do with Makefile complications that sort of enabled LTO but sort of didn't, for LM32, which then caused other things not to build due to confusion over compile/link flags; that part seems easier to resolve, even with just a "no LTO" off switch to try when there are compile/link problems.)

Ewen

DurandA commented 3 years ago

Some (old, because that's what the search engine found first) GCC documentation implies that the externally_visible symbol attribute will prevent it being discarded during LTO.

@ewenmcneill Thanks for mentioning it. Unfortunately, if I remove the -fno-lto flag for libc.c and add __attribute__((externally_visible)) to memset, then some undefined reference to `__mulsi3' are thrown, undepending on if I add __attribute__((externally_visible)) to __mulsi3 or if mulsi3.c is compiled with -fno-lto.

DurandA commented 3 years ago

I created a minimal repo with LTO to experiment with LTO behaviour and to be able to discuss about this with external people.

@enjoy-digital Why did you add mulsi3.c to the _libcompilerrt dir in https://github.com/enjoy-digital/litex/commit/17f6cb1f1740c39039a602e8a95798c50db4c2f6 and not to the pythondata-software-compiler_rt package? I am also looking for a minimal code snippet that will force GCC to use it on a rv32i target.

enjoy-digital commented 3 years ago

@DurandA: thanks for looking at this. Before using pythondata-software-compiler_rt we were using an external compiler_rt as a submodule (so were to avoid forking this local modification were made to libcompiler_rt). Avoiding this local mulsi3.c would indeed be better. I'll try to have a closer look at this in the next days.

ewenmcneill commented 3 years ago

then some undefined reference to `__mulsi3' are thrown

:-( It sounds like the "compiler replacements for missing instructions" (which that looks like -- signed multiply for integers), are at the biggest risk of LTO not realising they're needed until it's too late. Your approach of creating a minimal example to experiment with that particular bit seems like a good approach. Thanks for tackling it. Good luck!

Ewen

mithro commented 3 years ago

@ewenmcneill We really should be filing bugs against gcc / clang for this failure mode.

ewenmcneill commented 3 years ago

We really should be filing bugs against gcc / clang for this failure mode.

If someone comes up with a minimal reproducible example that shows that the compiler-replacements of mul/div/etc are not surviving LTO optimisation (ideally with glibc), on a particular supported CPU architecture (it seems like RV32I might be the best modern option without MUL/DIV) then I agree it'd be worth reporting upstream.

I did wonder if the GNU (GCC/Glibc) versions of MUL/DIV replacements were being marked in some different manner that made them survive LTO, but that doesn't seem to be the case (eg, https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/microblaze/mulsi3.S, https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/epiphany/mulsi3.c to pick random examples searching found). And maybe that's part of the problem....

Ewen

PS: For future reference, possible symbol attributes in Clang; it seems that "externally visible" isn't a supported Clang attribute, nor can I see "used". So I'm unclear how one tweaks functions to be retained over LTO in Clang :-(

DurandA commented 3 years ago

Here is a minimal example that will trigger both undefined reference to `memset' and undefined reference to `__mulsi3'.

A few remarks about it:

This example shows the behaviour of GCC pretty clearly: calls to libc or libgcc are not marked to be retained by the LTO. This is an issue when using -fno-builtin in conjunction with -flto. I think we should discuss about this on the GCC mailing list. Since RISC-V is not upstreamed yet, we might want to convert the example to a different platform (ARM?).

As to what we can do about it, I have a branch with LTO enabled that sets -fno-lto for (1) libc.c and (2) everything in libcompiler_rt. For (1) I can split libc.c in two files or more with only memset, memcpy, memmove, ... (should investigate on the full list) being compiled with -fno-lto. For (2) I think it would be a real mess to conditionally compile with -fno-lto according to the target so I would suggest to use -fno-lto for everything even if that makes larger binaries. We might want to measure the binary size increase due to this.

@mithro @enjoy-digital Are you interested in a PR enabling LTO with the above changes? If yes, please tell me how do you want to split libc and I will make a draft PR. I will need some help to properly test this. I tested these changes on a few platforms that caused problems in previous revisions but not with litex-buildenv. Also I did not try using LTO with LLVM/Clang yet.

ewenmcneill commented 3 years ago

Since RISC-V is not upstreamed yet, we might want to convert the example to a different platform (ARM?).

As far as I can remember/find, the ARM instruction set almost always includes at least some MUL (eg, 32x32) (since ARMv2 AFAICT), so it might not be the best choice for an upstream demo. (it looks like even Thumb normally includes a shorter MUL). If we could induce an implicit DIV, and have the same LTO issues, then ARM might work (as DIV is much less commonly implemented in ARM instruction sets than MUL).

It looks like MUL is optional in LM32 (Lattice Mico 32), and there seems to be a gcc flag for LM32 to indicate whether multiply instructions can be generated. As far as I can tell LM32 is still upstream in gcc. So maybe LM32 is still a better alternative? I'd still state that it was an issue found with RISC V first though, so they don't dismiss it as "just LM32"; RV32I is probably going to be a common "tiny CPU' target for the next 10 years. (FYI, from memory there's a gcc LM32 cross compiler in the same litex-hub compiler builds as the RISC V one; LM32 was the small soft CPU we were using for the RISC V soft CPU implementations became "production ready".)

Well done on all the other discoveries. It makes sense that the functions being inlined mostly hiding this LTO issue from the compiler/linker developers (and especially that they're mostly testing on CPU architectures which don't need MUL/DIV emulated :-) ).

Ewen

DurandA commented 3 years ago

Sorry for the lack of updates. I still plan to create a PR once we decided what is the best strategy to solve this issue.

gregdavill commented 2 years ago

Has this now been fixed with the introduction of picolibc? I see that -flto is a default enabled option with the current upstream code.

enjoy-digital commented 2 years ago

@gregdavill: Indeed, LTO has been enabled by default when integrating picolibc. We can now close this issue.

enjoy-digital commented 2 years ago

LTO still seems to cause random issue (ex https://github.com/enjoy-digital/litex/pull/1259) so has been disabled by default with https://github.com/enjoy-digital/litex/commit/3ab7eaa5f7afb4d432b5641d7761813d3274d25f. This will have to be properly tested/investigated before being re-enabled.