Closed ThinkOpenly closed 3 years ago
@rff found the same basic problem, and a different workaround: tell the loader to ignore the cache.
$ /opt/at-next-14.0-0-alpha/lib64/ld64.so.2 --inhibit-cache ./conftest
$
The search order is obviously the factor here.
$ LD_DEBUG=libs ./conftest.ko
91811: find library=libc.so.6 [0]; searching
91811: search cache=/opt/at-next-14.0-0-alpha/etc/ld.so.cache
91811: trying file=/lib64/power9/libc.so.6
91811:
Segmentation fault (core dumped)
$ LD_DEBUG=libs ./conftest.ok
91840: find library=libc.so.6 [0]; searching
91840: search path=/opt/at-next-14.0-0-alpha/lib64/tls/power9/altivec/dfp:/opt/at-next-14.0-0-alpha/lib64/tls/power9/altivec:/opt/at-next-14.0-0-alpha/lib64/tls/power9/dfp:/opt/at-next-14.0-0-alpha/lib64/tls/power9:/opt/at-next-14.0-0-alpha/lib64/tls/altivec/dfp:/opt/at-next-14.0-0-alpha/lib64/tls/altivec:/opt/at-next-14.0-0-alpha/lib64/tls/dfp:/opt/at-next-14.0-0-alpha/lib64/tls:/opt/at-next-14.0-0-alpha/lib64/power9/altivec/dfp:/opt/at-next-14.0-0-alpha/lib64/power9/altivec:/opt/at-next-14.0-0-alpha/lib64/power9/dfp:/opt/at-next-14.0-0-alpha/lib64/power9:/opt/at-next-14.0-0-alpha/lib64/altivec/dfp:/opt/at-next-14.0-0-alpha/lib64/altivec:/opt/at-next-14.0-0-alpha/lib64/dfp:/opt/at-next-14.0-0-alpha/lib64 (system search path)
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/power9/altivec/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/power9/altivec/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/power9/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/power9/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/altivec/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/altivec/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/tls/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/power9/altivec/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/power9/altivec/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/power9/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/power9/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/altivec/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/altivec/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/dfp/libc.so.6
91840: trying file=/opt/at-next-14.0-0-alpha/lib64/libc.so.6
91840:
91840:
91840: calling init: /opt/at-next-14.0-0-alpha/lib64/libc.so.6
91840:
91840:
91840: initialize program: ./conftest.ok
91840:
91840:
91840: transferring control: ./conftest.ok
91840:
91840:
91840: calling fini: ./conftest.ok [0]
91840:
$
Using -rpath adds an RPATH field in the executable (edited for brevity):
$ diff <(readelf -d ./conftest) <(readelf -d ./conftest.ok)
> 0x000000000000000f (RPATH) Library rpath: [/opt/at-next-14.0-0-alpha/lib64]
The dependency with or without -rpath is the same:
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
man ld.so shows the order in which dependencies are searched:
1. DT_RPATH, unless DT_RUNPATH is present (I'm not 100% sure to which the above RPATH entry maps)
2. LD_LIBRARY_PATH (not in use here)
3. DT_RUNPATH
...so the only intrinsic attribute that will override the cache is -rpath.
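For what it's worth, readelf prints the DT_RPATH tag as (RPATH) and the DT_RUNPATH tag as (RUNPATH), so the entry above should be DT_RPATH. A minimal sketch for checking which tag a binary carries; the function name rpath_kind is made up for illustration, and it only classifies readelf -d output read from stdin:

```shell
# Classify `readelf -d` output read from stdin.
# Prints RUNPATH, RPATH, or none.
rpath_kind() {
    awk '
        /\(RUNPATH\)/ { runpath = 1 }
        /\(RPATH\)/   { rpath = 1 }
        END {
            if (runpath)    print "RUNPATH"
            else if (rpath) print "RPATH"
            else            print "none"
        }'
}

# Sample input: the dynamic entry from the diff quoted in this comment.
sample=' 0x000000000000000f (RPATH) Library rpath: [/opt/at-next-14.0-0-alpha/lib64]'
kind=$(printf '%s\n' "$sample" | rpath_kind)
echo "$kind"
```

On a real binary this would be run as: readelf -d ./conftest.ok | rpath_kind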
But, why is this not a problem on other operating systems? More to come...
On RHEL 8.1:
$ /opt/at-next-14.0-0-alpha/sbin/ldconfig -p | grep libc.so
libc.so.6 (libc6,64bit, hwcap: 0x0000400000000000, OS ABI: Linux 3.10.0) => /lib64/power9/libc.so.6
libc.so.6 (libc6,64bit, OS ABI: Linux 4.18.0) => /opt/at-next-14.0-0-alpha/lib64/libc.so.6
libc.so.6 (libc6,64bit, OS ABI: Linux 3.10.0) => /lib64/libc.so.6
On Ubuntu 18.04.3:
$ /opt/at-next-14.0-0-alpha/sbin/ldconfig -p | grep libc.so
libc.so.6 (libc6,64bit, OS ABI: Linux 4.15.0) => /opt/at-next-14.0-0-alpha/lib64/libc.so.6
libc.so.6 (libc6,64bit, OS ABI: Linux 3.10.0) => /lib/powerpc64le-linux-gnu/libc.so.6
In the search order above, step (4) searches the cache, and the man page notes that "shared objects installed in hardware capability directories [...] are preferred to other shared objects." So, due to the presence of /lib64/power9/libc.so.6 on the RHEL 8.1 system, it is chosen first.
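A quick way to spot this situation on a given system is to filter the ldconfig -p listing for entries that carry a hwcap tag. A sketch, assuming the output format shown above; hwcap_entries is a hypothetical helper name, and the sample text is the RHEL 8.1 listing from this comment:

```shell
# Print the paths of cache entries for library $1 that carry a hwcap tag.
# Reads `ldconfig -p` style output from stdin.
hwcap_entries() {
    awk -v lib="$1" '$1 == lib && /hwcap:/ { print $NF }'
}

# Sample input: the RHEL 8.1 listing quoted above.
sample='libc.so.6 (libc6,64bit, hwcap: 0x0000400000000000, OS ABI: Linux 3.10.0) => /lib64/power9/libc.so.6
libc.so.6 (libc6,64bit, OS ABI: Linux 4.18.0) => /opt/at-next-14.0-0-alpha/lib64/libc.so.6
libc.so.6 (libc6,64bit, OS ABI: Linux 3.10.0) => /lib64/libc.so.6'

preferred=$(printf '%s\n' "$sample" | hwcap_entries libc.so.6)
echo "$preferred"
```

Against a live cache: /opt/at-next-14.0-0-alpha/sbin/ldconfig -p | hwcap_entries libc.so.6. An empty result, as would be expected from the Ubuntu listing, means no hardware capability copy is competing with the AT one.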
AT13 also fails for a similar reason, but differently. From at13.0-1-rc2.redhat-8_ppc64le_ppc64le/logs/_gcc_2-3_standard_buildf-06_make.log:
build/genautomata: /lib64/power9/libm.so.6: version `GLIBC_2.29' not found (required by build/genautomata)
Trivial executables do seem to work, allowing the configure steps to succeed:
$ echo 'int main(){}' > conftest.c
$ /opt/at13.0-1-rc2/bin/gcc -o conftest -g -Wl,--dynamic-linker=/opt/at13.0-1-rc2/lib64/ld64.so.2 conftest.c
$ ./conftest
$ ldd ./conftest
linux-vdso64.so.1 (0x00007fff8b330000)
libc.so.6 => /lib64/power9/libc.so.6 (0x00007fff8b110000)
/opt/at13.0-1-rc2/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x00007fff8b350000)
$ LD_DEBUG=libs ./conftest
36832: find library=libc.so.6 [0]; searching
36832: search cache=/opt/at13.0-1-rc2/etc/ld.so.cache
36832: trying file=/lib64/power9/libc.so.6
36832:
36832:
36832: calling init: /lib64/power9/libc.so.6
36832:
36832:
36832: initialize program: ./conftest
36832:
36832:
36832: transferring control: ./conftest
36832:
36832:
36832: calling fini: ./conftest [0]
36832:
> Using -rpath adds an RPATH field in the executable (edited for brevity):
GCC stage 2 is the first build that uses --with-advance-toolchain.
This is supposed to add the rpath to the built files: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config.gcc;h=ae5a845fccea2a2c0d8ba2275972f664a2a9a26e;hb=HEAD#l4975
Why isn't it working?
Looks like that flag is acted upon in ...gcc/gcc/config.gcc, but it also looks like stage2 never gets there. Trying (in vain so far) to understand the GCC bootstrap procedure...
I've found it challenging to fully understand the process.
As Tulio said, --with-advance-toolchain is used for the GCC stage 2 build. However, it is only applied in gcc/config.gcc (by atcfg_pre_hacks and atcfg_configure in configs/13.0/packages/gcc/stage_2). This may be too late, as other modules are built before the gcc module. Here, intl fails:
make[3]: Entering directory '/home/pc/at13/at14.0-0-alpha.redhat-8_ppc64le_ppc64le/builds/gcc_2'
Configuring stage 1 in ./intl
configure: creating cache ./config.cache
checking for powerpc64le-linux-gnu-gcc... /opt/at-next-14.0-0-alpha/bin/gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... configure: error: in `/home/pc/at13/at14.0-0-alpha.redhat-8_ppc64le_ppc64le/builds/gcc_2/intl':
configure: error: cannot run C compiled programs.
This started to happen after adding --with-stage1-ldflags="-Wl,--dynamic-linker=${ldso}", which forces the test program to use the AT loader at GCC stage 1. This parameter is not wrong; it was added in order to avoid mixing AT headers with the system libraries, more specifically the glibc headers and libraries.
What is happening?
Early in gcc_2, ${AT_DEST}/bin/gcc is executed with mixed glibc parts: the loader comes from AT, but libc comes from the system.
That happens because the glibc loader favors CPU-optimized over general builds, e.g. /lib64/power9/libc.so.6 instead of ${AT_DEST}/lib64/libc.so.6.
So, this error appears only when running a distro that provides a CPU-optimized glibc for the processor that you're running on, e.g. it happens on RHEL 8 on POWER9, but not on RHEL 8 on POWER8.
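To confirm the mix from an LD_DEBUG=libs trace, one option is to list every file the loader tried that lives outside the AT prefix. A rough sketch; tried_outside_prefix is a hypothetical helper, and the sample trace is the failing one quoted earlier in this thread:

```shell
# Read an LD_DEBUG=libs trace from stdin and print every "trying file="
# path that is not located under the prefix given as $1.
tried_outside_prefix() {
    prefix=$1
    sed -n 's/.*trying file=//p' | while IFS= read -r path; do
        case $path in
            "$prefix"/*) ;;                  # inside the prefix: ignore
            *) printf '%s\n' "$path" ;;      # outside the prefix: report
        esac
    done
}

# Sample input: the failing trace quoted earlier.
sample='91811: find library=libc.so.6 [0]; searching
91811: search cache=/opt/at-next-14.0-0-alpha/etc/ld.so.cache
91811: trying file=/lib64/power9/libc.so.6'

outside=$(printf '%s\n' "$sample" | tried_outside_prefix /opt/at-next-14.0-0-alpha)
echo "$outside"
```

The loader writes this trace to stderr by default, so a live check would look like: LD_DEBUG=libs ./conftest 2>&1 | tried_outside_prefix "${AT_DEST}"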
I believe there are at least 4 possible solutions:
1. Work around GCC stage 1 in gcc_2 using -rpath. This is the solution from PR #1299. Nothing new here.
2. Work around glibc_1. If glibc_1 provides the symlink ${AT_DEST}/lib64/power9, ldconfig_1 will prefer the symlinks from AT, creating a cache that matches what is expected in gcc_2.
3. Container-based build. Run gcc_2 inside a container with only the files that are strictly needed. This is what most distros do, but it is much more complex to implement, as all the required files/packages have to be chosen carefully in order not to disable an important feature by mistake.
4. Work around GCC stage 1 in gcc_2 by using the system's headers. Let GCC stage 1 in gcc_2 use the system's headers. It may be necessary to copy mpc, mpfr and gmp to another place in order to avoid picking up the headers from other AT-provided libraries. Libraries built with the system's headers may disable features.
Notice that we may need to adopt different solutions for AT next and AT <= 13.0 in order to guarantee their stability.
I echo @mscastanho: (2) is not bad, and arguably works around a bug in the loader (if pathA is before pathB in the loader path, should the loader really accept pathB/opt/libc.so before pathA/libc.so?)
(3) would be a nice longer term solution, but is likely a fair bit of work.
I looked at whether pull #1299 solves this issue. While it does help with older versions of AT, it does not help with the current version (15). I do note that GCC stage 2 was removed (commit b1f57fa69b49064c30dc39e0db71398962267aea), which is the stage in which that pull request made changes, but making similar changes in stage 3 did not help. I'm still investigating, but GCC stage 3 fails in the configure step when running:
powerpc64le-linux-gnu-gcc -o conftest conftest.c
Note that there are no flags used. Results:
$ ./conftest
./conftest: /lib64/power9/libc.so.6: version `GLIBC_2.34' not found (required by ./conftest)
Note that it seeks to run an AT-linked executable with the system libc.
OK, with a few symlinks added at the end of GCC stage 1, the build completes successfully! Here's the (minimal) patch:
diff --git a/configs/11.0/packages/gcc/stage_1 b/configs/11.0/packages/gcc/stage_1
index 2df98ea19e2c..c36c68fe23ab 100644
--- a/configs/11.0/packages/gcc/stage_1
+++ b/configs/11.0/packages/gcc/stage_1
@@ -207,0 +208,5 @@ atcfg_post_install() {
+
+ mkdir -p "${at_dest}/lib64/power9"
+ ln -s "../libc.so.6" "${at_dest}/lib64/power9/libc.so.6"
+ ln -s "../libm.so.6" "${at_dest}/lib64/power9/libm.so.6"
+
The next question is: what is the best fix? The above is minimal and works iff there is exactly one "optimized" subdirectory on the system. Shall I go find all of the files in the system library directories which contain a library which matches any library already built for AT / GCC stage 1, and create symlinks for all of them? (Unknown to me at the moment: do I need to go back at some point and clean all of that up? With the patch above, it did not seem that removing the symlinks was needed.)
> Shall I go find all of the files in the system library directories which contain a library which matches any library already built for AT / GCC stage 1, and create symlinks for all of them?
@ThinkOpenly Thinking in the long term and considering future processors, I do think it's ideal if all libraries from glibc have their own symlink for each entry in BUILD_ACTIVE_MULTILIBS.
I wonder if it works if we just symlink the processor directory, e.g. ln -s "${at_dest}/lib64" "${at_dest}/lib64/power9"
> do I need to go back at some point and clean all of that up?
Yes, you do. Otherwise, the processor-optimized build from AT will overwrite the default build, e.g. a P9 libc.so.6 will be placed in ${at_dest}/lib64/, causing issues when running on P8.
Notice that we don't have to modify the contents in ${at_dest}/lib64 for this. It might be easier to revert your work if you take advantage of tmp/ in the build directory and just add an extra file to ${at_dest}/ld.so.conf.d/ pointing to the directory you created. In the end, you can just remove this file later.
This might also help to avoid conflicting writes to processor-optimized directories.
> > Shall I go find all of the files in the system library directories which contain a library which matches any library already built for AT / GCC stage 1, and create symlinks for all of them?
> @ThinkOpenly Thinking in the long term and considering future processors, I do think it's ideal if all libraries from glibc have their own symlink for each entry in BUILD_ACTIVE_MULTILIBS.
Is BUILD_ACTIVE_MULTILIBS guaranteed to be a superset of whatever the AT loader looks for?
> I wonder if it works if we just symlink the processor directory, e.g. ln -s "${at_dest}/lib64" "${at_dest}/lib64/power9"
I will try that.
> > do I need to go back at some point and clean all of that up?
> Yes, you do. Otherwise, the processor-optimized build from AT will overwrite the default build, e.g. a P9 libc.so.6 will be placed in ${at_dest}/lib64/, causing issues when running on P8.
Indeed. It's a bit ugly to have cross-stage dependencies like that... hmm. I wonder when the clean up step should be inserted?
> Notice that we don't have to modify the contents in ${at_dest}/lib64 for this. It might be easier to revert your work if you take advantage of tmp/ in the build directory and just add an extra file to ${at_dest}/ld.so.conf.d/ pointing to the directory you created. In the end, you can just remove this file later. This might also help to avoid conflicting writes to processor-optimized directories.
Modifications there require a subsequent ldconfig step, correct?
Might this also prevent the need for a clean-up step?
> Is BUILD_ACTIVE_MULTILIBS guaranteed to be a superset of whatever the AT loader looks for?
@ThinkOpenly No, but usually AT is ahead of the distros, e.g. we started building optimized libraries for P10 1 year ago and distros haven't adopted this yet.
> I wonder when the clean up step should be inserted?
We don't have to hurry, but it has to happen before the last ldconfig execution (ldconfig_2).
> Modifications there require a subsequent ldconfig step, correct?
Correct. You need one after creation and another after removal. Luckily, ldconfig_1 executes after glibc_1 and ldconfig_2 executes after glibc_2. So, I think we could use the post-install hacks in both glibc steps to take care of this.
> Might this also prevent the need for a clean-up step?
I'm not sure I understand your point. You may not need to remove the files from tmp/, but it's still very important to remove the file from ${at_dest}/ld.so.conf.d/.
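The create, refresh, remove, refresh sequence described above could look roughly like this. It is only a sketch: the file name build-override.conf and both function names are invented here, and the command that refreshes the cache is passed in by the caller (in the demo it is replaced by true so that no real cache is touched):

```shell
# add_override CONF_DIR OVERRIDE_DIR REFRESH_CMD...
# Writes a conf file pointing at OVERRIDE_DIR, then runs the refresh command.
add_override() {
    conf_dir=$1
    override_dir=$2
    shift 2
    mkdir -p "$conf_dir" &&
        printf '%s\n' "$override_dir" > "$conf_dir/build-override.conf" &&
        "$@"
}

# remove_override CONF_DIR REFRESH_CMD...
# Removes the conf file again, then runs the refresh command.
remove_override() {
    conf_dir=$1
    shift
    rm -f "$conf_dir/build-override.conf" && "$@"
}

# Demo in a throwaway directory; `true` stands in for the real ldconfig.
demo=$(mktemp -d)
add_override "$demo/ld.so.conf.d" "$demo/tmp/override-libs" true
added=$(cat "$demo/ld.so.conf.d/build-override.conf")
remove_override "$demo/ld.so.conf.d" true
```

In the real build the refresh command would presumably be the AT ldconfig, and the two calls would live in the post-install hacks of glibc_1 and glibc_2 so that ldconfig_1 and ldconfig_2 pick up the change.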
> > Shall I go find all of the files in the system library directories which contain a library which matches any library already built for AT / GCC stage 1, and create symlinks for all of them?
> @ThinkOpenly Thinking in the long term and considering future processors, I do think it's ideal if all libraries from glibc have their own symlink for each entry in BUILD_ACTIVE_MULTILIBS. I wonder if it works if we just symlink the processor directory, e.g. ln -s "${at_dest}/lib64" "${at_dest}/lib64/power9"
This latter suggestion, unfortunately, does not work. ldconfig checks the inode number of every directory added to the search path, and will not add a directory whose inode number matches one it has already added. (I understand why, but this seems pretty close to being a bug given the use-case here.) I will try actual subdirectories containing symlinks for all of the files in the parent.
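A sketch of that last idea: create a real subdirectory and fill it with relative symlinks to every shared object in the parent. populate_hwcap_dir is a hypothetical name, the *.so* glob is an assumption about which files matter, and the demo runs against a throwaway directory rather than ${at_dest}:

```shell
# populate_hwcap_dir LIBDIR SUBDIR
# Creates LIBDIR/SUBDIR as a real directory and adds one relative symlink
# per shared object found directly in LIBDIR.
populate_hwcap_dir() {
    libdir=$1
    sub=$2
    mkdir -p "$libdir/$sub" || return 1
    for f in "$libdir"/*.so*; do
        [ -f "$f" ] || continue          # skip an unmatched glob and directories
        name=${f##*/}
        ln -sf "../$name" "$libdir/$sub/$name"
    done
}

# Demo with two empty stand-in files.
demo=$(mktemp -d)
mkdir -p "$demo/lib64"
: > "$demo/lib64/libc.so.6"
: > "$demo/lib64/libm.so.6"
populate_hwcap_dir "$demo/lib64" power9
ls -l "$demo/lib64/power9"
```

Because the subdirectory is a real directory with its own inode, it should not trip the duplicate check that rejected the directory symlink.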
I am making progress with the suggested approach of adding a new directory to ld.so.conf which supersedes the system libraries by having arch-specific directories. I populated these directories by copying files from the AT lib64 directory. The build completes successfully, but FVTR fails (a manually created summary follows):
ck_binaries
readelf failed: 1 flags: /home/pc/opt8/at15.0-0-alpha/bin/readelf: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
ck_ldds
readelf: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
ck_provides: User error; running via remake.sh fvtr eliminates this issue.
ck_provides: power9: Unable to find packages for at15.0-alpha.
ck_requires
ck_requires: power9: Inconsistencies found on package advance-toolchain-at15.0-alpha-devel-15.0-0.ppc64le.rpm requirements.
ck_requires: power9: advance-toolchain-at15.0-alpha-devel-15.0-0.ppc64le.rpm has the following extra requirements:
libbabeltrace-ctf.so.1()(64bit)
libbabeltrace.so.1()(64bit)
libdebuginfod.so.1()(64bit)
libisl.so.15()(64bit)
liblzma.so.5()(64bit)
libncursesw.so.6()(64bit)
libtinfo.so.6()(64bit)
libzstd.so.1()(64bit)
ck_requires: power9: Inconsistencies found on package advance-toolchain-at15.0-alpha-mcore-libs-15.0-0.ppc64le.rpm requirements.
ck_requires: power9: advance-toolchain-at15.0-alpha-mcore-libs-15.0-0.ppc64le.rpm has the following extra requirements:
libbz2.so.1()(64bit)
liblzma.so.5()(64bit)
libzstd.so.1()(64bit)
ck_requires: power9: Inconsistencies found on package advance-toolchain-at15.0-alpha-runtime-15.0-0.ppc64le.rpm requirements.
ck_requires: power9: advance-toolchain-at15.0-alpha-runtime-15.0-0.ppc64le.rpm has the following extra requirements:
libbz2.so.1()(64bit)
libffi.so.6()(64bit)
libisl.so.15()(64bit)
liblzma.so.5()(64bit)
libncursesw.so.6()(64bit)
libpanelw.so.6()(64bit)
libreadline.so.7()(64bit)
libsqlite3.so.0()(64bit)
libtcl8.6.so()(64bit)
libtinfo.so.6()(64bit)
libtk8.6.so()(64bit)
libuuid.so.1()(64bit)
libX11.so.6()(64bit)
libzstd.so.1()(64bit)
python : This works if run manually as /path/to/at15.0-0-alpha/bin/python3 /path/to/at15.0-0-alpha/lib64/python3.9/test/regrtest.py test_ctypes
0:00:12 load avg: 19.38 [ 91/395/1] test_ctypes failed
test test_ctypes failed -- multiple errors occurred; run in verbose mode for details
systemtap: fails due to the same readelf issue seen above
systemtap: power9: Could not find systemtap probes in /home/pc/opt8/at15.0-0-alpha/lib64/libc-2.33.9000.so
timezone: User error; running via remake.sh fvtr eliminates this issue.
timezone: power9: System timezone -0500
timezone: power9: AT timezone +0000
timezone: power9: AT timezone is different from system's timezone. [FAIL]
investigating...
The AT-built readelf command is failing:
/home/pc/opt8/at15.0-0-alpha/bin/readelf: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
The dependency list from ldd is:
file=libdebuginfod.so.1 [0]; needed by /home/pc/opt8/at15.0-0-alpha/bin/readelf [0]
file=libcurl.so.4 [0]; needed by /lib64/libdebuginfod.so.1 [0]
file=libssl.so.1.1 [0]; needed by /lib64/libcurl.so.4 [0]
trying file=/home/pc/opt8/at15.0-0-alpha/lib64/power9/libssl.so.1.1
file=libk5crypto.so.3 [0]; needed by /lib64/libcurl.so.4 [0]
AT contains OpenSSL 1.1.1k, and 'b' is before 'k', last I checked.
So, does this imply that AT might need to include an updated Kerberos libraries package?
Possibly instructive that openssl and krb5-libs are tightly bound: https://unix.stackexchange.com/questions/594618/git-push-error-undefined-symbol-evp-kdf-ctrl-version-openssl-1-1-1b
Even more interesting, in that the "EVP_KDF" support is apparently a Red Hat downstream add: https://github.com/openssl/openssl/issues/11471
@ThinkOpenly That's issue #1969.
Ugh. So, are the choices:
While there are certainly problems with (1), a problem with not doing (1) is that the libssl in AT does not provide the complete API as the one in RHEL8. :-/
Why is readelf depending on libk5crypto.so.3? If this isn't an important feature, we could remove it. But we would continue having the same issue as reported in #1969 .
> Why is readelf depending on libk5crypto.so.3?
The dependency chain is above, repeated here:
readelf -> libdebuginfod -> libcurl -> {libssl and libk5crypto}
Why libdebuginfod depends on libcurl is an interesting question.
> If this isn't an important feature, we could remove it. But we would continue having the same issue as reported in #1969.
So, it's not something we can remove, because both dependencies come from libcurl, which is not part of AT.
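For reference, the chain can be pulled out of a loader trace mechanically. A small sketch that turns "file=X ... needed by Y" lines into "Y -> X" edges; needed_edges is a made-up helper name and the sample is the trace quoted above:

```shell
# Read loader trace lines from stdin and print one "parent -> dependency"
# edge for every line of the form "file=X [n]; needed by Y [n]".
needed_edges() {
    sed -n 's/^.*file=\([^ ]*\) .*needed by \([^ ]*\) .*$/\2 -> \1/p'
}

# Sample input: the dependency list quoted above.
sample='file=libdebuginfod.so.1 [0]; needed by /home/pc/opt8/at15.0-0-alpha/bin/readelf [0]
file=libcurl.so.4 [0]; needed by /lib64/libdebuginfod.so.1 [0]
file=libssl.so.1.1 [0]; needed by /lib64/libcurl.so.4 [0]
trying file=/home/pc/opt8/at15.0-0-alpha/lib64/power9/libssl.so.1.1
file=libk5crypto.so.3 [0]; needed by /lib64/libcurl.so.4 [0]'

edges=$(printf '%s\n' "$sample" | needed_edges)
printf '%s\n' "$edges"
```

Lines without a "needed by" part, such as the "trying file=" line, are skipped. On a live system the input would come from something like LD_DEBUG=files with stderr redirected into the pipe.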
Any idea if I need to fix something to address the issues reported by the ck_requires test?
Other than that, the other FVTR failures are apparently limited to the issue reported in #1969.
The AT RPMs are built now. Shall I submit a pull request for the changes I have in hand, which seem to address this issue, so that we can then pivot to #1969 separately?
I've now tried to build on two RHEL 8.1 systems, and they both fail identically:
at14.0-0-alpha.redhat-8_ppc64le_ppc64le/logs/_gcc_2-3_standard_buildf-06_make.log:
at14.0-0-alpha.redhat-8_ppc64le_ppc64le/builds/gcc_2/libiberty/config.log:
And indeed, the newly built linker does not produce working executables:
It seems to be caused by using the newly built loader, the newly built executable, and the system libc:
Manually adding an rpath which finds the newly built libc instead produces a working executable: