Open nwf opened 2 years ago
Here's another variant of the same thing:
ld.lld: error: cannot open crtbeginS.o: No such file or directory
ld.lld: error: unable to find library -lgcc
ld.lld: error: unable to find library -lgcc
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
--- libprivateheimipcs.so.11.full ---
*** Failed target: libprivateheimipcs.so.11.full
*** Failed commands:
@${ECHO} building shared library ${SHLIB_NAME}
=> @true building shared library libprivateheimipcs.so.11
@rm -f ${SHLIB_NAME} ${SHLIB_LINK}
=> @rm -f libprivateheimipcs.so.11 libprivateheimipcs.so
${_LD:N${CCACHE_BIN}} ${LDFLAGS} ${SSP_CFLAGS} ${SOLINKOPTS} -o ${.TARGET} -Wl,-soname,${SONAME} ${SOBJS} ${LDADD}
=> /cheri/out/mainline/sdk/bin/ccache-clang -target riscv64-unknown-freebsd14.0 --sysroot=/cheri/build/mainline/cheribsd-riscv64-purecap-build/cheri/source/mainline/cheribsd/riscv.riscv64c/tmp -B/cheri/build/mainline/cheribsd-riscv64-purecap-build/cheri/source/mainline/cheribsd/riscv.riscv64c/tmp/usr/bin -march=rv64imafdcxcheri -mabi=l64pc128d -Wl,-zrelro --ld-path=/cheri/out/mainline/sdk/bin/ld.lld -shared -Wl,-x -Wl,--fatal-warnings -Wl,--warn-shared-textrel -o libprivateheimipcs.so.11.full -Wl,-soname,libprivateheimipcs.so.11 server.pico common.pico -lheimbase -lroken -lpthread
*** [libprivateheimipcs.so.11.full] Error code 1
I don't see how there could possibly be a race: crtbeginS.o comes from lib/csu, which is in _startup_libs, and libgcc comes from lib/libcompiler_rt, which is in _prereq_libs. Are you sure this isn't a ccache issue, given that you've hacked your local environment up to use it, it shows in both error reports, and you're the only one to have seen this?
I'm not sure that this isn't a ccache issue, but AFAIK if ccache is invoking ld it's because it hasn't done the caching thing. FWIW, I suspect I'm also the only one building with -j160, and it's also possibly interesting that both reports are from the Kerberos-related part of the tree (in _prebuild_libs from the looks of it)?
ETA: so far, every time this has happened, it's sufficed to just restart the build, suggesting that whatever is going on is a function of transitory state.
Those are in _prebuild_libs, which come strictly after _startup_libs and _prereq_libs; see the libraries target in Makefile.inc1 which is very definitely not parallelised. The error is likely not here but something went wrong with ccache earlier such that it didn't produce crtbeginS.o.
Hm. I hit this again,
ld.lld: error: cannot open crtendS.o: No such file or directory
ld.lld: error: cannot open crtn.o: No such file or directory
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
--- libgssapi_krb5.so.10.full ---
*** Failed target: libgssapi_krb5.so.10.full
*** Failed commands:
@${ECHO} building shared library ${SHLIB_NAME}
=> @true building shared library libgssapi_krb5.so.10
@rm -f ${SHLIB_NAME} ${SHLIB_LINK}
=> @rm -f libgssapi_krb5.so.10 libgssapi_krb5.so
${_LD:N${CCACHE_BIN}} ${LDFLAGS} ${SSP_CFLAGS} ${SOLINKOPTS} -o ${.TARGET} -Wl,-soname,${SONAME} ${SOBJS} ${LDADD}
=> /cheri/out/mainline/sdk/bin/ccache-clang -target riscv64-unknown-freebsd14.0 --sysroot=/cheri/build/cornucopia-modernize/cheribsd-riscv64-build/cheri/source/cornucopia-modernize/cheribsd/riscv.riscv64/tmp -B/cheri/build/cornucopia-modernize/cheribsd-riscv64-build/cheri/source/cornucopia-modernize/cheribsd/riscv.riscv64/tmp/usr/bin -Wl,-Bsymbolic -Wl,--no-undefined -march=rv64imafdc -mabi=lp64d -Wl,-zrelro --ld-path=/cheri/out/mainline/sdk/bin/ld.lld -fstack-protector-strong -shared -Wl,-x -Wl,--fatal-warnings -Wl,--warn-shared-textrel -o libgssapi_krb5.so.10.full -Wl,-soname,libgssapi_krb5.so.10 8003.pico accept_sec_context.pico acquire_cred.pico add_cred.pico address_to_krb5addr.pico aeap.pico arcfour.pico authorize_localname.pico canonicalize_name.pico ccache_name.pico cfx.pico compare_name.pico compat.pico context_time.pico copy_ccache.pico creds.pico decapsulate.pico delete_sec_context.pico display_name.pico display_status.pico duplicate_name.pico encapsulate.pico export_name.pico export_sec_context.pico external.pico get_mic.pico gkrb5_err.pico import_name.pico import_sec_context.pico indicate_mechs.pico init.pico init_sec_context.pico inquire_context.pico inquire_cred.pico inquire_cred_by_mech.pico inquire_cred_by_oid.pico inquire_mechs_for_name.pico inquire_names_for_mech.pico inquire_sec_context_by_oid.pico pname_to_uid.pico prefix.pico prf.pico process_context_token.pico release_buffer.pico release_cred.pico release_name.pico sequence.pico set_cred_option.pico set_sec_context_option.pico store_cred.pico ticket_flags.pico unwrap.pico verify_mic.pico wrap.pico gss_krb5.pico gss_oid.pico -lgssapi -lkrb5 -lcrypto -lroken -lasn1 -lcom_err
*** [libgssapi_krb5.so.10.full] Error code 1
and it's still in scrollback so I can look further down as bmake bails out. Of course there's the path to building this target, with a little bit intermixed
bmake[5]: stopped in /cheri/source/cornucopia-modernize/cheribsd/kerberos5/lib/libgssapi_krb5
1 error
bmake[5]: stopped in /cheri/source/cornucopia-modernize/cheribsd/kerberos5/lib/libgssapi_krb5
--- all_subdir_kerberos5/lib/libgssapi_krb5 ---
bmake[4]: stopped in /cheri/source/cornucopia-modernize/cheribsd/kerberos5/lib
--- realinstall_subdir_lib/libngatm ---
bmake[4]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib
--- kerberos5/lib__L ---
bmake[3]: stopped in /cheri/source/cornucopia-modernize/cheribsd
as expected, and a bunch of things like
--- realinstall_subdir_lib/geom/raid ---
bmake[5]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib/geom
--- realinstall_subdir_lib/liblzma ---
bmake[4]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib
--- realinstall_subdir_lib/libdevctl ---
bmake[4]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib
but also, suspiciously,
--- realinstall_subdir_lib/csu/riscv ---
bmake[5]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib/csu
--- realinstall_subdir_lib/csu ---
bmake[4]: stopped in /cheri/source/cornucopia-modernize/cheribsd/lib
The end of the bmake spew, FWIW, is
--- lib__L ---
bmake[3]: stopped in /cheri/source/cornucopia-modernize/cheribsd
--- libraries ---
bmake[2]: stopped in /cheri/source/cornucopia-modernize/cheribsd
Command exited with non-zero status 2
95.72user 24.83system 0:47.74elapsed 252%CPU (0avgtext+0avgdata 221888maxresident)k
3572154inputs+1826372outputs (19263major+616271minor)pagefaults 0swaps
--- _libraries ---
bmake[1]: stopped in /cheri/source/cornucopia-modernize/cheribsd
--- buildworld ---
(ETA: formatting) (ETA2: kerberos5/lib__L)
Hm, _generic_libs= ${_cddl_lib} gnu/lib ${_kerberos5_lib} lib ${_secure_lib} and lib/Makefile also builds csu etc; the latter has .WAITs in it but the former doesn't stop kerberos libs being rebuilt. I guess https://github.com/CTSRD-CHERI/cheribsd/commit/67a7d46cb7294c24c18d5a093196bc455fb50abf is rearing its head again, just with inputs that get statically linked, not just shared libraries :(
As a very crude workaround, then, would
_generic_libs= ${_cddl_lib} gnu/lib ${_kerberos5_lib} .WAIT lib .WAIT ${_secure_lib}
possibly do the right thing, minimizing the concurrent excitement around lib's descents into subdirs? (Dare I ask why lib/Makefile is descending into csu at all given the special handling in Makefile.inc1?)
That does seem like it should work. It might be that a better answer is for lib/Makefile to be informed which stage it's in, to avoid reinstalling.
It also does seem that if we had renameat2() with RENAME_EXCHANGE in FreeBSD then we could alter install(1) to not be subject to races on reinstall.
Why RENAME_EXCHANGE?
If you install the file to a temporary file, then use RENAME_EXCHANGE to do a swap, and then delete the old one, all openers will always find a file, and a complete one at that.
Isn't that true of rename proper (he asks, knowing that POSIX is a beast quick to anger)? At least my reading of
If the link named by the new argument exists, it shall be removed and old renamed to new. In this case, a link named new shall remain visible to other threads throughout the renaming operation and refer either to the file referred to by new or old before the operation began.
(emphasis mine) from https://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html suggests that renaming over an existing file doesn't have a hole where the target path doesn't exist?
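To make the rename-over-existing-file reading concrete, here is a minimal sketch of the write-to-temp-then-rename pattern, which is roughly what install -S is documented to do. It is not taken from xinstall.c: the helper name atomic_replace, the ".XXXXXX" temp suffix, and the 0644 mode are all hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/stat.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from xinstall.c): replace `target` with `len`
 * bytes from `data` such that a concurrent open(2) of `target` finds either
 * the old file or the complete new one, never a missing or partial file.
 */
static int
atomic_replace(const char *target, const void *data, size_t len)
{
	char tmp[1024];
	int fd, n;

	assert(target != NULL && data != NULL);

	/* The temp file lives next to the target so rename(2) stays on one file system. */
	n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", target);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return (-1);
	if ((fd = mkstemp(tmp)) == -1)
		return (-1);

	/* Write everything before the name is published; a short write counts as failure. */
	if (write(fd, data, len) != (ssize_t)len ||
	    fchmod(fd, 0644) == -1 || fsync(fd) == -1) {
		(void)close(fd);
		(void)unlink(tmp);
		return (-1);
	}
	if (close(fd) == -1) {
		(void)unlink(tmp);
		return (-1);
	}

	/* POSIX rename() replaces an existing target without a window where it is absent. */
	if (rename(tmp, target) == -1) {
		(void)unlink(tmp);
		return (-1);
	}
	return (0);
}
```

A process that already has the old file open keeps reading the old contents; only new opens see the replacement. What this sketch cannot do, and RENAME_EXCHANGE could, is keep the old file reachable under another name after the swap.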
Indeed, you are right. I wonder if we just need to use install -S more aggressively. This might be a bit too much of a hammer, but I think this should work:
diff --git a/share/mk/bsd.lib.mk b/share/mk/bsd.lib.mk
index c98dec9045bf..970052164550 100644
--- a/share/mk/bsd.lib.mk
+++ b/share/mk/bsd.lib.mk
@@ -449,6 +449,7 @@ SHLINSTALLFLAGS+= -fschg
# time round, but for now using -S ensures the install is atomic and thus we
# never see a broken intermediate state, so use it even for NO_ROOT builds.
.if !defined(NO_SAFE_LIBINSTALL) #&& !defined(NO_ROOT)
+INSTALLFLAGS+= -S
SHLINSTALLFLAGS+= -S
SHLINSTALLSYMLINKFLAGS+= -S
.endif
GitHub needs a 🔨 reaction. :)
Oo, another instance of the same underlying cause cropping up elsewhere? I just saw
===> usr.sbin/nmtree (obj,all,install)
/cheri/source/mainline/cheribsd/tools/install.sh: line 85: /cheri/build/mainline/cheribsd-riscv64-hybrid-build/cheri/source/mainline/cheribsd/riscv.riscv64/tmp/legacy/usr/sbin/install: Permission denied
/cheri/source/mainline/cheribsd/tools/install.sh: line 85: exec: /cheri/build/mainline/cheribsd-riscv64-hybrid-build/cheri/source/mainline/cheribsd/riscv.riscv64/tmp/legacy/usr/sbin/install: cannot execute: Permission denied
--- installdirs-NLSDIR ---
*** Failed target: installdirs-NLSDIR
*** Failed commands:
@${ECHO} installing DIRS ${_alldirs_${:UNLSDIR}}
=> @true installing DIRS NLSDIR
${INSTALL} ${${:UNLSDIR}TAG_ARGS} -d -m ${${:UNLSDIR}_MODE} -o ${${:UNLSDIR}_OWN} -g ${${:UNLSDIR}_GRP} ${${:UNLSDIR}_FLAG} ${DESTDIR}${${:UNLSDIR}}
=> sh /cheri/source/mainline/cheribsd/tools/install.sh -T package=utilities -d -m 0755 -o root -g wheel /cheri/build/mainline/cheribsd-riscv64-hybrid-build/cheri/source/mainline/cheribsd/riscv.riscv64/tmp/legacy/usr/share/nls
*** [installdirs-NLSDIR] Error code 126
bmake[3]: stopped in /cheri/source/mainline/cheribsd/usr.bin/sort
--- realinstall_subdir_usr.sbin/zic/zic ---
bmake[3]: stopped in /cheri/source/mainline/cheribsd/usr.sbin/zic
--- _bootstrap-tools-usr.bin/grep ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-sbin/md5 ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- realinstall_subdir_usr.sbin/zic/zdump ---
bmake[3]: stopped in /cheri/source/mainline/cheribsd/usr.sbin/zic
--- _bootstrap-tools-usr.sbin/zic ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
1 error
bmake[3]: stopped in /cheri/source/mainline/cheribsd/usr.bin/sort
--- _bootstrap-tools-usr.bin/sort ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-usr.bin/mandoc ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-tools/build/bootstrap-m4 ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-usr.sbin/nmtree ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-libexec/flua ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
--- _bootstrap-tools-lib/libzstd ---
bmake[2]: stopped in /cheri/source/mainline/cheribsd
Command exited with non-zero status 2
216.25user 72.05system 0:35.58elapsed 810%CPU (0avgtext+0avgdata 223168maxresident)k
1911606inputs+169967outputs (97091major+1158097minor)pagefaults 0swaps
--- _bootstrap-tools ---
bmake[1]: stopped in /cheri/source/mainline/cheribsd
--- buildworld ---
with
/cheri/source/mainline/cheribsd/usr.bin/xinstall/xinstall.c:813:60: warning: unused parameter 'fset' [-Wunused-parameter]
install(const char *from_name, const char *to_name, u_long fset, u_int flags)
^
/cheri/source/mainline/cheribsd/usr.bin/xinstall/xinstall.c:1237:59: warning: unused parameter 'sbp' [-Wunused-parameter]
right before it. Looks like we're clobbering the install program in parallel with someone trying to install. This is without the -S suggestion above; the failing build was part of a larger script that had been running for a long while.
Hmm, so my suggestion above won't fix the install issue because it will only affect non-shared libraries (shared libraries already use -S in CheriBSD). It looks like the easy fix would be to define PRECIOUSPROG and NO_FSCHG in usr.bin/xinstall/Makefile.
I wondered a bit if we should just always be using -S. I applied a patch to xinstall.c to just enable it all the time and while the difference is measurable, it's pretty small. Here's ministat output for wall clock time doing cheribuild cheribsd-morello-purecap where make is invoked with -j40. This is on a previously built tree so we're mostly (pointlessly) reinstalling things:
x default.stats
+ safecopy.stats
+------------------------------------------------------------------------------+
| x xx x + x + + + +|
||_______M_________A_________________| |________________A__M_____________| |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5        143.22       144.579       143.394      143.6904    0.55837649
+   5       144.112       145.534       145.071      144.9786    0.52705484
Difference at 95.0% confidence
1.2882 +/- 0.791849
0.896511% +/- 0.553698%
(Student's t, pooled s = 0.542942)
Both user and sys time show no significant difference. I need to do a bit more testing, but I'm tempted to just make -S a no-op (that is, always do the safe copy).
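For anyone checking the ministat numbers above, the reported interval can be reproduced from the two summary rows. This assumes the usual equal-variance (pooled) Student's t interval with 8 degrees of freedom; the critical value $t_{0.975,8} \approx 2.306$ comes from a standard t table and is not part of the output. Subscripts d and s stand for default.stats and safecopy.stats.

$$
s_p = \sqrt{\frac{(n_d - 1)\,s_d^2 + (n_s - 1)\,s_s^2}{n_d + n_s - 2}}
    = \sqrt{\frac{4\,(0.55837649)^2 + 4\,(0.52705484)^2}{8}} \approx 0.542942
$$

$$
\bar{x}_s - \bar{x}_d = 144.9786 - 143.6904 = 1.2882,
\qquad
t_{0.975,8}\; s_p \sqrt{\frac{1}{n_d} + \frac{1}{n_s}} \approx 2.306 \times 0.542942 \times \sqrt{\frac{2}{5}} \approx 0.7918
$$

$$
\frac{1.2882}{143.6904} \approx 0.8965\%
$$

The interval 1.2882 ± 0.7918 seconds excludes zero, so the slowdown is statistically detectable at 95% confidence, but it amounts to under 1% of wall clock time.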
Since @dch asked, the patch is:
diff --git a/usr.bin/xinstall/xinstall.c b/usr.bin/xinstall/xinstall.c
index 05b1444506db..43b11d1627db 100644
--- a/usr.bin/xinstall/xinstall.c
+++ b/usr.bin/xinstall/xinstall.c
@@ -121,7 +121,8 @@ extern char **environ;
static gid_t gid;
static uid_t uid;
static int dobackup, docompare, dodir, dolink, dopreserve, dostrip, dounpriv,
- safecopy, verbose;
+ verbose;
+static int safecopy = 1;
static int haveopt_f, haveopt_g, haveopt_m, haveopt_o;
static mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
static FILE *metafp;
Smells like a missing dependency or .WAIT somewhere: