Closed jgouly closed 2 years ago
cc @MaskRay
It looks like the kernel change that exposed this is commit 13150bc5416f ("module: use hidden visibility for weak symbol references"), at least according to https://lore.kernel.org/202202040710.uRW0o2ty-lkp@intel.com/, which has the same error basically.
I have no knowledge of LLD, I've just been hacking around and trying to understand the issue.
diff --git a/lld/ELF/Symbols.cpp b/lld/ELF/Symbols.cpp
index b087dd1823fd..932bbcaba0cb 100644
--- a/lld/ELF/Symbols.cpp
+++ b/lld/ELF/Symbols.cpp
@@ -245,6 +245,8 @@ void Symbol::extract() const {
}
uint8_t Symbol::computeBinding() const {
+ if (visibility == STV_HIDDEN)
+ return binding;
if ((visibility != STV_DEFAULT && visibility != STV_PROTECTED) ||
versionId == VER_NDX_LOCAL)
return STB_LOCAL;
This patch fixes the issue for me. The Symbol originally has STB_WEAK
and STV_HIDDEN
, computeBinding()
then overrides that to STB_LOCAL
.
Or alternative, in Linux:
diff --git a/include/linux/module.h b/include/linux/module.h
index 1e135fd5c..10aa3692c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -728,7 +728,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
}
/* Get/put a kernel symbol (calls should be symmetric) */
-#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
+#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("protected"))); &(x); })
#define symbol_put(x) do { } while (0)
#define symbol_put_addr(x) do { } while (0)
I don't know the difference between hidden and protected (yet).
Sorry for the breakage related to hidden undefined weak symbols. I think this is a regression similar to another for ppc64 https://reviews.llvm.org/D119787 . There wasn't sufficient hidden undefined weak symbols tests and without interesting usage (like this one from Linux kernel) I'd be difficult to catch such problems beforehand.
There was a minor optimization related to reducing the number of computeBinding
calls. Looks like it failed to account for some isUndefinedWeak
usage which should be changed to isUndefined
first.
I think the eventual fix should look like:
diff --git i/lld/ELF/Arch/AArch64.cpp w/lld/ELF/Arch/AArch64.cpp
index aa8550f70400..aca5300a33a3 100644
--- i/lld/ELF/Arch/AArch64.cpp
+++ w/lld/ELF/Arch/AArch64.cpp
@@ -257,7 +257,7 @@ bool AArch64::needsThunk(RelExpr expr, RelType type, const InputFile *file,
int64_t a) const {
// If s is an undefined weak symbol and does not have a PLT entry then it
// will be resolved as a branch to the next instruction.
- if (s.isUndefWeak() && !s.isInPlt())
+ if (s.isUndefined() && !s.isInPlt())
return false;
// ELF for the ARM 64-bit architecture, section Call and Jump relocations
// only permits range extension thunks for R_AARCH64_CALL26 and
diff --git i/lld/ELF/InputSection.cpp w/lld/ELF/InputSection.cpp
index 9fe9cea0426a..455e3483f696 100644
--- i/lld/ELF/InputSection.cpp
+++ w/lld/ELF/InputSection.cpp
@@ -729,7 +729,7 @@ uint64_t InputSectionBase::getRelocTargetVA(const InputFile *file, RelType type,
if (expr == R_ARM_PCA)
// Some PC relative ARM (Thumb) relocations align down the place.
p = p & 0xfffffffc;
- if (sym.isUndefWeak()) {
+ if (sym.isUndefined()) {
// On ARM and AArch64 a branch to an undefined weak resolves to the next
// instruction, otherwise the place. On RISCV, resolve an undefined weak
// to the same instruction to cause an infinite loop (making the user
But I'd need to prepare some tests and verify that in a large code base.
Side note: I'd suggest avoid weak references if you can: https://maskray.me/blog/2021-04-25-weak-symbol#weak-references
Many cases can be emulated by weak definitions. Weak references require very careful cooperation among compiler/linker/human assembly writer ;-)
@MaskRay Thanks for looking into this.
I tried the patch from your comment, and it does fix the issue for me.
@jgouly Thanks for the quick verification! Created a release/14.x backport request as shown above :)
Building with the attached config fails with:
I tried with multiple Linux kernel versions 5.18-rc2, 5.17, 5.16, 5.14.
I built LLVM llvmorg-13.0.0, that worked, and llvmorg-14.0.0 failed.
I have hit this when compiling on my M1 machine. I will try to reproduce it on another machine too, and bisect between the LLVM versions. config.txt
Disabling CONFIG_KVM seems to also "fix" the error.