ClangBuiltLinux / linux

Linux kernel source tree
Other
241 stars 14 forks source link

relocation R_AARCH64_ADR_PREL_PG_HI21 out of range #1624

Closed jgouly closed 2 years ago

jgouly commented 2 years ago

Building with the attached config fails with:

ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 140737215234048 is not in [-4294967296, 4294967295]
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 140737215234048 is not in [-4294967296, 4294967295]
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 140737215234048 is not in [-4294967296, 4294967295]
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 140737215234048 is not in [-4294967296, 4294967295]
ld.lld: error: relocation R_AARCH64_ADR_PREL_PG_HI21 out of range: 140737215234048 is not in [-4294967296, 4294967295]

I tried with multiple Linux kernel versions 5.18-rc2, 5.17, 5.16, 5.14.

I built LLVM llvmorg-13.0.0, that worked, and llvmorg-14.0.0 failed.

I have hit this when compiling on my M1 machine. I will try to reproduce it on another machine too, and bisect between the LLVM versions. config.txt

Disabling CONFIG_KVM seems to also "fix" the error.

jgouly commented 2 years ago

I bisected this to: https://github.com/llvm/llvm-project/commit/aabe901d57d6df4cd2786163359a7b2a7aae8c32

nathanchance commented 2 years ago

cc @MaskRay

It looks like the kernel change that exposed this is commit 13150bc5416f ("module: use hidden visibility for weak symbol references"), at least according to https://lore.kernel.org/202202040710.uRW0o2ty-lkp@intel.com/, which has the same error basically.

jgouly commented 2 years ago

I have no knowledge of LLD, I've just been hacking around and trying to understand the issue.

diff --git a/lld/ELF/Symbols.cpp b/lld/ELF/Symbols.cpp
index b087dd1823fd..932bbcaba0cb 100644
--- a/lld/ELF/Symbols.cpp
+++ b/lld/ELF/Symbols.cpp
@@ -245,6 +245,8 @@ void Symbol::extract() const {
 }

 uint8_t Symbol::computeBinding() const {
+  if (visibility == STV_HIDDEN)
+   return binding;
   if ((visibility != STV_DEFAULT && visibility != STV_PROTECTED) ||
       versionId == VER_NDX_LOCAL)
     return STB_LOCAL;

This patch fixes the issue for me. The Symbol originally has STB_WEAK and STV_HIDDEN, computeBinding() then overrides that to STB_LOCAL.

Or alternative, in Linux:

diff --git a/include/linux/module.h b/include/linux/module.h
index 1e135fd5c..10aa3692c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -728,7 +728,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
 }

 /* Get/put a kernel symbol (calls should be symmetric) */
-#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
+#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("protected"))); &(x); })
 #define symbol_put(x) do { } while (0)
 #define symbol_put_addr(x) do { } while (0)

I don't know the difference between hidden and protected (yet).

MaskRay commented 2 years ago

Sorry for the breakage related to hidden undefined weak symbols. I think this is a regression similar to another for ppc64 https://reviews.llvm.org/D119787 . There wasn't sufficient hidden undefined weak symbols tests and without interesting usage (like this one from Linux kernel) I'd be difficult to catch such problems beforehand.

There was a minor optimization related to reducing the number of computeBinding calls. Looks like it failed to account for some isUndefinedWeak usage which should be changed to isUndefined first.

I think the eventual fix should look like:

diff --git i/lld/ELF/Arch/AArch64.cpp w/lld/ELF/Arch/AArch64.cpp
index aa8550f70400..aca5300a33a3 100644
--- i/lld/ELF/Arch/AArch64.cpp
+++ w/lld/ELF/Arch/AArch64.cpp
@@ -257,7 +257,7 @@ bool AArch64::needsThunk(RelExpr expr, RelType type, const InputFile *file,
                          int64_t a) const {
   // If s is an undefined weak symbol and does not have a PLT entry then it
   // will be resolved as a branch to the next instruction.
-  if (s.isUndefWeak() && !s.isInPlt())
+  if (s.isUndefined() && !s.isInPlt())
     return false;
   // ELF for the ARM 64-bit architecture, section Call and Jump relocations
   // only permits range extension thunks for R_AARCH64_CALL26 and
diff --git i/lld/ELF/InputSection.cpp w/lld/ELF/InputSection.cpp
index 9fe9cea0426a..455e3483f696 100644
--- i/lld/ELF/InputSection.cpp
+++ w/lld/ELF/InputSection.cpp
@@ -729,7 +729,7 @@ uint64_t InputSectionBase::getRelocTargetVA(const InputFile *file, RelType type,
     if (expr == R_ARM_PCA)
       // Some PC relative ARM (Thumb) relocations align down the place.
       p = p & 0xfffffffc;
-    if (sym.isUndefWeak()) {
+    if (sym.isUndefined()) {
       // On ARM and AArch64 a branch to an undefined weak resolves to the next
       // instruction, otherwise the place. On RISCV, resolve an undefined weak
       // to the same instruction to cause an infinite loop (making the user

But I'd need to prepare some tests and verify that in a large code base.

MaskRay commented 2 years ago

Side note: I'd suggest avoid weak references if you can: https://maskray.me/blog/2021-04-25-weak-symbol#weak-references

Many cases can be emulated by weak definitions. Weak references require very careful cooperation among compiler/linker/human assembly writer ;-)

jgouly commented 2 years ago

@MaskRay Thanks for looking into this.

I tried the patch from your comment, and it does fix the issue for me.

MaskRay commented 2 years ago

@jgouly Thanks for the quick verification! Created a release/14.x backport request as shown above :)