InBetweenNames / gentooLTO

A Gentoo Portage configuration for building with -O3, Graphite, and LTO optimizations
GNU General Public License v2.0

x86_64-pc-linux-gnu-ar issue with LTO and gcc10 #490

Open StefanSalewski opened 4 years ago

StefanSalewski commented 4 years ago

Six weeks ago I cloned my harddisk partition and started using GentooLTO and gcc10, gcc compiled with lto and pgo.

Until yesterday it was working reasonably well, but now emerging basic packages like libinput or dev-libs/icu fails with messages like

x86_64-pc-linux-gnu-ar: creating uconvmsg/libuconvmsg.a
Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!
config.status: creating extra/uconv/uconv.1
-- return status = 139
Error generating library file. Failed command: x86_64-pc-linux-gnu-ar r uconvmsg/libuconvmsg.a uconvmsg/uconvmsg_dat.o
Error generating assembly code for data.

The core message is "Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!" and it is generated by the ar tool.

Ar is from binutils, and installing a different binutils version fails with the same message.

I tried switching back to gcc 9.2, but I got the same issue.

Currently I have no idea about the cause of the problem. Could it be the ar program itself? I ran

nuc /tmp/portage/dev-libs/libinput-1.14.3/work/libinput-1.14.3-build # x86_64-pc-linux-gnu-ar csrD liblibinput-util.a 'libinput-util@sta/src_libinput-util.c.o'
Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!
Segmentation fault (core dumped)

Current binutils is

nuc /home/stefan # emerge -pv binutils

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R   ~] sys-devel/binutils-2.34:2.34::gentoo  USE="gold nls plugins -default-gold -doc -multitarget -static-libs -test" 0 KiB

I cannot use a copy of ar from my original partition, as that is version 2.32 and it wants to load a matching lib.

But maybe the cause of the problem is not ar at all. I may switch back to gcc 9.2 and re-emerge, without LTO, all the packages which I emerged in the last weeks; maybe that will help.

StefanSalewski commented 4 years ago

I have indeed the feeling that my ar is broken.

stefan@nuc /tmp/www $ ls -lt
total 0
stefan@nuc /tmp/www $ echo "xxx 123" > xxx.o
stefan@nuc /tmp/www $ cat xxx.o 
xxx 123
stefan@nuc /tmp/www $ ar -q yyy.a xxx.o 
ar: creating yyy.a
Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!
Segmentation fault (core dumped)

I am not really sure, as I have never used ar myself before. If it is broken, then the first question is why, and the next question is how I can fix it.
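
A minimal sketch for repeating this check against any ar binary. The function name ar_smoke_test is made up for this example. The only fact it relies on is that a shell reports a command killed by signal N as exit status 128 + N, so the "return status = 139" in the build log above corresponds to signal 11 (SIGSEGV).

```shell
#!/bin/sh
# Sketch: run a given ar on a throwaway file and report how it exited.
# ar_smoke_test is a hypothetical helper name, not part of any package.
ar_smoke_test() {
    ar_bin=${1:-ar}
    workdir=$(mktemp -d) || return 2
    echo "xxx 123" > "$workdir/xxx.o"
    status=0
    # Run in a subshell so the caller's working directory is untouched.
    ( cd "$workdir" && "$ar_bin" -q yyy.a xxx.o ) >/dev/null 2>&1 || status=$?
    rm -rf "$workdir"
    if [ "$status" -eq 0 ]; then
        echo "$ar_bin: ok"
    elif [ "$status" -gt 128 ]; then
        # 128 + N means "killed by signal N"; 139 is SIGSEGV (11).
        echo "$ar_bin: killed by signal $((status - 128))"
    else
        echo "$ar_bin: failed with status $status"
    fi
    return "$status"
}
```

Calling `ar_smoke_test ar`, or passing the full path of another ar binary, prints one line per binary and returns the same status that ar produced.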

StefanSalewski commented 4 years ago

Well I just discovered that a gcc-ar exists, and that seems to work. So I hope I can link ar to gcc-ar to fix it.

stefan@nuc /tmp/www $ lt
total 0
stefan@nuc /tmp/www $ echo "xxx 123" > xxx.o
stefan@nuc /tmp/www $ cat xxx.o 
xxx 123
stefan@nuc /tmp/www $ gcc-ar -q yyy.a xxx.o 
/usr/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../x86_64-pc-linux-gnu/bin/ar: creating yyy.a
stefan@nuc /tmp/www $ ls -lt
total 8
-rw-r--r-- 1 stefan stefan 76 Mar 10 12:50 yyy.a
-rw-r--r-- 1 stefan stefan  8 Mar 10 12:49 xxx.o
stefan@nuc /tmp/www $ which gcc-ar
/usr/bin/gcc-ar
stefan@nuc /tmp/www $ ls -lt /usr/bin/gcc-ar
lrwxrwxrwx 1 root root 46 Mar  9 19:34 /usr/bin/gcc-ar -> /usr/x86_64-pc-linux-gnu/gcc-bin/10.0.1/gcc-ar

StefanSalewski commented 4 years ago

Also see

https://stackoverflow.com/questions/48777554/what-is-the-difference-between-ar-nm-and-gcc-ar-gcc-nm

StefanSalewski commented 4 years ago

I had to do the same for ranlib manually (maybe for other binutils tools too?)

/usr/bin # ls -lt

x86_64-pc-linux-gnu-ranlib -> x86_64-pc-linux-gnu-gcc-ranlib
x86_64-pc-linux-gnu-ar -> /usr/bin/gcc-ar

I guess that these links got broken somehow (not pointing to the gcc version) and it seems that gcc-config or eselect gcc do not fix the links when broken.

At least now it seems to work again, I was able to emerge net-misc/openssh again!

Well, the nm link is still wrong:

31 Feb 15 11:39 /usr/bin/x86_64-pc-linux-gnu-nm -> /usr/x86_64-pc-linux-gnu/bin/nm

So the accident happened on Feb 15 -- but I still wonder why.

StefanSalewski commented 4 years ago

Well, seems that the problem was and still is

# emerge -av binutils
[ebuild   R    ] sys-devel/binutils-2.33.1-r1:2.33::gentoo  USE="gold nls plugins -default-gold -doc -multitarget -static-libs -test" 0 KiB

which results again in

Mar 11 06:45  x86_64-pc-linux-gnu-ar -> /usr/x86_64-pc-linux-gnu/bin/ar

and ar stops working.

"eselect binutils set" does not fix the issue, it creates the links to the non gcc versions too.
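
To see at a glance which links are affected, something like the following could be used. It is only a sketch: show_tool_links is a hypothetical helper, the default directory and CHOST prefix are taken from the listings above, and classifying a target as a gcc wrapper purely because its name ends in gcc-ar, gcc-nm or gcc-ranlib is an assumption.

```shell
#!/bin/sh
# Sketch: print where the CHOST-prefixed ar/nm/ranlib links point.
# show_tool_links is a hypothetical helper name.
show_tool_links() {
    bindir=${1:-/usr/bin}
    chost=${2:-x86_64-pc-linux-gnu}
    for tool in ar nm ranlib; do
        link="$bindir/$chost-$tool"
        if [ -L "$link" ]; then
            target=$(readlink "$link")
            # Assumption: a target named *gcc-<tool> is the gcc wrapper.
            case $target in
                *gcc-"$tool") kind="gcc wrapper" ;;
                *)            kind="other" ;;
            esac
            printf '%s -> %s (%s)\n' "$link" "$target" "$kind"
        elif [ -e "$link" ]; then
            printf '%s is a regular file\n' "$link"
        else
            printf '%s is missing\n' "$link"
        fi
    done
}
```

Running `show_tool_links` with no arguments checks /usr/bin. Running it again after reinstalling binutils or after "eselect binutils set" would show whether the links were rewritten, as described above.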

Peter-Levine commented 4 years ago

I ran into this while building both clang and binutils. -amdgpu-argument-reg-usage-info appears to be an LLVM flag, presumably when LLVM is built with LLVM_TARGETS="AMDGPU". But the active toolchain was built using entirely GNU. When I unmerged llvm, all build problems disappeared. Maybe binutils or gcc components are somehow dynamically linking against llvm libraries?

Peter-Levine commented 4 years ago

Running the offending ar command in gdb shows that a function in /usr/lib64/binutils/x86_64-pc-linux-gnu/2.33.1/libbfd-2.33.1.gentoo-sys-devel-binutils-st.so is calling a function in /usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/../2.33.1/../lib/bfd-plugins/LLVMgold.so. I don't have gold set as my linker and nothing was built with llvm. Building with -fuse-ld=bfd has no effect.

StefanSalewski commented 4 years ago

Thank you very much for your investigations. I cannot comment on the core of this issue as I do not know much about binutils and ar internals. But I am happy that my box is running well again after manually setting the ar link to gcc-ar.

Peter-Levine commented 4 years ago

The problem appears fixed in git HEAD with binutils-9999.

StefanSalewski commented 4 years ago

Great. Then I will close this issue in the next week.

Peter-Levine commented 4 years ago

Never mind. I spoke too soon. It popped up again. I unmerged llvm-10/clang-10/llvmgold-10 and have no problems with llvm-9/clang-9/llvmgold-9.

StefanSalewski commented 4 years ago

I have a new problem now: emerging sys-libs/glibc-2.30-r6 fails. The reason is that it calls a different ar, namely

/usr/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../x86_64-pc-linux-gnu/bin/ar

which is

cd /usr/lib/gcc/x86_64-pc-linux-gnu/10.0.1/../../../../x86_64-pc-linux-gnu/bin/
nuc /usr/x86_64-pc-linux-gnu/bin # pwd
/usr/x86_64-pc-linux-gnu/bin

ar -> /usr/x86_64-pc-linux-gnu/binutils-bin/2.33.1/ar

And fixing this link manually does not work, I think I get a loop of symlinks when I try to fix it.

I assume that gcc-ar is not an executable on its own, but calls this link too.

I think I have completely removed clang10, but that is not enough. I guess I have to reemerge some tools, maybe emerge binutils-9999? May that help? Or better reemerge binutils without LTO? It is a bit dangerous of course, I may get a situation where all is completely broken, and I would have to switch back to my backup partition without LTO.

StefanSalewski commented 4 years ago

emerge -av binutils-libs binutils

for version 2.34 does not fix the problem. But what is interesting is that ar works fine when we give it the --plugin argument:

stefan@nuc /tmp $ /usr/x86_64-pc-linux-gnu/binutils-bin/2.34/ar -q yyy.a xxx.o 
Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!
Segmentation fault (core dumped)
stefan@nuc /tmp $ /usr/x86_64-pc-linux-gnu/binutils-bin/2.34/ar --plugin=/usr/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/liblto_plugin.so -q yyy.a xxx.o 
stefan@nuc /tmp $ 
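
The second invocation is roughly what gcc-ar does: it is a thin wrapper that runs the real ar with GCC's LTO plugin passed via --plugin. A sketch of the same idea as a shell function follows. The name ar_with_plugin is made up, and the plugin path has to be supplied by the caller because it depends on the installed GCC version.

```shell
#!/bin/sh
# Sketch: call ar with an explicitly chosen plugin, like gcc-ar does.
# ar_with_plugin is a hypothetical helper; set AR to pick another ar binary.
ar_with_plugin() {
    plugin=$1
    shift
    if [ ! -e "$plugin" ]; then
        echo "plugin not found: $plugin" >&2
        return 1
    fi
    # Everything after the plugin path is passed to ar unchanged.
    "${AR:-ar}" --plugin="$plugin" "$@"
}
```

With the path from the session above, the call would be `ar_with_plugin /usr/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/liblto_plugin.so -q yyy.a xxx.o`.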

StefanSalewski commented 4 years ago

Maybe related, and there is a suggested fix:

https://github.com/void-linux/void-packages/issues/18725

StefanSalewski commented 4 years ago

I think finally I found the cause for the real problem:

$ ls -lt /usr/x86_64-pc-linux-gnu/binutils-bin/lib/bfd-plugins
total 8
lrwxrwxrwx 1 root root 60 Mar 10 16:29 liblto_plugin.so -> /usr/libexec/gcc/x86_64-pc-linux-gnu/10.0.1/liblto_plugin.so
lrwxrwxrwx 1 root root 41 Mar  9 13:49 LLVMgold.so -> ../../../../lib/llvm/10/lib64/LLVMgold.so

So on 9 March a failed attempt to install clang10 created a link in /usr/x86_64-pc-linux-gnu/binutils-bin/lib/bfd-plugins to LLVMgold.so of version 10, which was not working. And all my attempts to uninstall clang10 did not reset that link. I have now manually reset it to clang9, and now I was able to install glibc again, and I hope my whole box works again.

Maybe a reinstall of clang9 and llvm9 would have fixed that automatically?

I was not aware that clang can break gcc; I had considered both independent of each other in the past.
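
Since ar loads plugins from this directory on its own when no --plugin is given (which matches the gdb trace earlier in the thread), a quick listing of what is in there can help. This is a sketch, with list_bfd_plugins as a hypothetical helper name and the default path copied from the listing above. It can only flag links whose target is missing. A link to a plugin that exists but crashes, as happened here, still shows up as ok and has to be judged by its target path.

```shell
#!/bin/sh
# Sketch: list the entries of a bfd-plugins directory with their targets.
# list_bfd_plugins is a hypothetical helper name.
list_bfd_plugins() {
    dir=${1:-/usr/x86_64-pc-linux-gnu/binutils-bin/lib/bfd-plugins}
    for entry in "$dir"/*; do
        # An unmatched glob stays literal; skip it unless it is a dangling link.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        if [ -L "$entry" ]; then
            target=$(readlink "$entry")
            # -e follows the link, so it fails when the target is gone.
            if [ -e "$entry" ]; then state=ok; else state=DANGLING; fi
            printf '%s -> %s [%s]\n' "$name" "$target" "$state"
        else
            printf '%s (regular file)\n' "$name"
        fi
    done
}
```

In the situation described above, the line to look for would be the LLVMgold.so entry and which llvm slot its target belongs to.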

wolfwood commented 4 years ago

wow, nice find. definitely frustrating to have a cross package failure like that.

Peter-Levine commented 4 years ago

Rebuilding llvm:10 without lto seems to have fixed the issue for me. Also, compiler-rt-sanitizers:10 won't build correctly with lto.

wolfwood commented 4 years ago

I'm also having the compiler-rt-sanitizers issue

InBetweenNames commented 4 years ago

FYI: sys-devel/llvmgold is what installs that symlink. I remember having a concern about that in the past, too, because you can have multiple clang slots on your system but the LLVMgold.so plugin will just use the highest available slot, no functionality to switch it around unlike eselect gcc. And LLVM/Clang don't bother to guarantee ABI compatibility across different versions. Even LLVM IR itself is unstable between different LLVM versions -- learned that one the hard way.

Now, this issue seems to pertain to GCC 10 which I haven't tested outside of some sandboxes yet. GCC shouldn't be touching LLVMgold.so regardless, as that was only really used for LTO with Clang before they switched to lld. So, it sounds like before we migrate to GCC 10 we'll need to do some more extensive testing. I think GCC 10 is -fno-common by default, for example, and that could induce a lot of breakage. Let's leave this issue open so we can refer back to it when GCC 10 reaches a stable release.

wispoffates commented 4 years ago

Just a note: I ran into this with GCC 9.3.0 in my recent attempt to go back to LTO. I fixed it by removing sys-devel/llvmgold. I think firefox pulled it in at some point but maybe no longer requires it?

TheGreatMcPain commented 4 years ago

I'm running into this issue as well, but I was able to bypass it by removing AMDGPU from LLVM_TARGETS and re-building llvm/clang-10 using gcc.

I found that building llvm/clang-10 with gcc, and with LLVM_TARGETS=AMDGPU, causes

Two passes with the same argument (-amdgpu-argument-reg-usage-info) attempted to be registered!

If I compile llvm/clang-10 with clang, LLVM_TARGETS=AMDGPU works, but that causes other issues like www-client/firefox[pgo,lto,clang] having pgo profile merging failures.

Also, www-client/firefox[clang,lto] depends on llvmgold for some reason even though it uses lld for linking.

ekaats commented 4 years ago

For me all issues were fixed by installing =sys-devel/binutils-2.34-r1 (currently not keyworded)

That said, I still cannot properly compile Firefox but I don't think that is an LTO issue. Firefox compiles without pgo/clang but segfaults immediately at runtime. With clang it does not even build. For now I am on firefox-bin and I'll try again after a while. At least the rest of the system builds correctly with the latest binutils.

Hello71 commented 4 years ago

I fixed this problem by unmerging llvm and then rebuilding it. It didn't fix the segfault compiling compiler-rt-sanitizers though.

TheGreatMcPain commented 4 years ago

@Hello71 I think disabling LTO for llvm-10 fixes it.

Although, you can also use clang to compile llvm-10 which won't segfault with LTO, but will cause issues with pgo when compiling firefox. (At least this is what's happening on my system.)

Hello71 commented 4 years ago

sure, but this way you can keep lto.

elsandosgrande commented 4 years ago

If I am not mistaken, "[…] you can also use clang to compile llvm-10 which won't segfault with LTO, but will cause issues with pgo when compiling firefox. […]" also means that you keep link-time optimization, but through compiling this package with Clang instead of GCC.
From what I see, it also trades the segmentation fault when compiling compiler-rt-sanitizers in for being unable to compile Firefox with profile-guided optimizations.

TheGreatMcPain commented 4 years ago

@elsandosgrande I should have mentioned that I am using clang to compile firefox, since I think the pgo and lto useflags on firefox are not compatible without the clang useflag.

I'll see if the clang pgo problem also affects other packages, like python.

TheGreatMcPain commented 4 years ago

So I re-emerged llvm-10 and clang-10 with clang as the compiler, using this inside of my `/etc/portage/env`. (I also disable ccache on all packages that use clang due to Gentoo bug 709454)

USE="clang"
CC="clang"
CXX="clang++"
CFLAGS="${CFLAGS} -fno-math-errno -fno-trapping-math -flto=thin"
CXXFLAGS="${CXXFLAGS} -fno-math-errno -fno-trapping-math -flto=thin"
LDFLAGS="-Wl,--lto-O2 -Wl,-O2 -Wl,--as-needed -fuse-ld=lld"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"

NOLDADD=1
USE_NONGNU=1

I was able to successfully emerge dev-lang/python:3.7 with pgo and clang, using those same environment variables. I'm in the process of re-emerging firefox to see if my issue got cleared up.

UPDATE: Firefox failed to build. Here's the build.log: build.log.tar.gz

TheGreatMcPain commented 4 years ago

I have re-emerged llvm-10 using clang as compiler and without LTO using these environment variables.

USE="clang"
CC="clang"
CXX="clang++"
CFLAGS="${CFLAGS} -fno-math-errno -fno-trapping-math"
CXXFLAGS="${CXXFLAGS} -fno-math-errno -fno-trapping-math"
LDFLAGS="-Wl,-O2 -Wl,--as-needed -fuse-ld=lld"
AR="llvm-ar"
NM="llvm-nm"
RANLIB="llvm-ranlib"

NOLDADD=1
USE_NONGNU=1

I'm currently waiting for Firefox to finish re-emerging which has been going for about 2 and a half hours. Normally it will fail before the first hour, so this is a good sign.

UPDATE: Yup, Firefox successfully emerged with the useflags lto,pgo,clang. At this point I feel like I should make a new issue for this.

BorisCarvajal commented 4 years ago

I've disabled -fipa-pta from llvm-10 and ar is no longer crashing loading the LLVMgold.so plugin.

Althorion commented 4 years ago

I stepped right into this. And I can’t rebuild LLVM or Clang with Clang using said env variables, because even Clang is broken.

Is there any hope for my system yet, or is it time to format?

StefanSalewski commented 4 years ago

BorisCarvajal , your hint works great for me.

My box recently tried to update to clang 10, and I got the error "Two passes with the same argument (-amdgpu-argument-reg-usage-info)" when building some tools like compiler-rt. Your tip fixed it:

$ grep llvm /etc/portage/package.cflags/ltoworkarounds.conf
sys-devel/llvm *FLAGS-="-fipa-pta"

Then rebuild llvm and after that the other packages like compiler-rt.
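
For illustration only: the effect of that line is presumably that -fipa-pta no longer appears in the flags llvm is built with. The real filtering is done by the gentooLTO configuration. The strip_flag function below is a made-up stand-in that shows the same transformation on a plain list of flags.

```shell
#!/bin/sh
# Sketch: remove one flag from a list of flags and print the rest.
# strip_flag is a hypothetical helper, not the gentooLTO implementation.
strip_flag() {
    flag=$1
    shift
    out=
    for word in "$@"; do
        # Drop exact matches only; every other flag is kept in order.
        [ "$word" = "$flag" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}
```

Called as `strip_flag -fipa-pta $CFLAGS` (unquoted on purpose, so that the flags split into words), it prints the flags without -fipa-pta.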

Althorion, in March my box was also totally broken; gcc and clang refused to work. But I got gcc to work again by manually fixing some links as described at the top of this thread.

telans commented 4 years ago

@Althorion same as you. Can't rebuild clang or llvm at the moment. Did you end up getting it sorted? I'll give moving links around a go.

Althorion commented 4 years ago

@telans unfortunately no. I’ve been trying quite a lot of things and ended up with a system so broken, it couldn’t even shut down, so I saved my @world set, blasted the whole thing and built it anew.

telans commented 4 years ago

For me at least, all that was needed was emerge -C llvmgold and rebuilding llvm without -fipa-pta. Llvm pulls llvmgold back in after merging.