Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

lld overwrites first byte of jump table #34792

Closed Quuxplusone closed 3 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR35819
Status RESOLVED WONTFIX
Importance P enhancement
Reported by Thomas Schaub (t.schaub@gmx.de)
Reported on 2018-01-04 03:32:50 -0800
Last modified on 2021-02-19 16:03:38 -0800
Version unspecified
Hardware Macintosh MacOS X
CC jezreel@gmail.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
lld seems to overwrite the first byte of the jump table. Consider the following
program

  inline int some_number()
  {
    return 0;
  }

  int main()
  {
    switch (some_number())
    {
      case 0:
        break;
      case 1:
        break;
      case 2:
        break;
      case 3:
        break;
    }
  }

Compiling this results in

 % clang++ -v
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir:
/Applications/Xcode8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 % clang++ -c -o main.cpp.o main.cpp
 % otool -tV main.cpp.o
build/main.cpp.o:
(__TEXT,__text) section
_main:
0000000000000000  pushq %rbp
...
0000000000000059  nopl  _main(%rax)
000000000000005c  .long 4294967264  @ KIND_JUMP_TABLE32
0000000000000060  .long 4294967269  @ KIND_JUMP_TABLE32
0000000000000064  .long 4294967274  @ KIND_JUMP_TABLE32
0000000000000068  .long 4294967279  @ KIND_JUMP_TABLE32
000000000000006c  nopl  (%rax)

When linking this program with ld I get

 % ld -v
@(#)PROGRAM:ld  PROJECT:ld64-278.4
configured to support archs: armv6 armv7 armv7s arm64 i386 x86_64 x86_64h
armv6m armv7k armv7m armv7em (tvOS)
LTO support using: LLVM version 8.1.0, (clang-802.0.42)
TAPI support using: Apple TAPI version 1.33.11
 % ld -arch x86_64 -o main-ld main.cpp.o -lSystem
 % otool -t main-ld
main-ld:
Contents of (__TEXT,__text) section
0000000100000f20  55 48 89 e5 48 83 ec 20 c7 45 fc 00 00 00 00 e8
0000000100000f30  64 00 00 00 89 c1 89 ca 83 e8 03 48 89 55 f0 89
0000000100000f40  45 ec 0f 87 28 00 00 00 48 8d 05 2d 00 00 00 48
0000000100000f50  8b 4d f0 48 63 14 88 48 01 c2 ff e2 e9 0f 00 00
0000000100000f60  00 e9 0a 00 00 00 e9 05 00 00 00 e9 00 00 00 00
0000000100000f70  8b 45 fc 48 83 c4 20 5d c3 0f 1f 00 e0 ff ff ff
0000000100000f80  e5 ff ff ff ea ff ff ff ef ff ff ff 0f 1f 40 00
0000000100000f90  55 48 89 e5 31 c0 5d c3

The first jump table entry is at 0x100000f20 + 0x5c = 0x100000F7C and has the
value ffffffe0 (or 4294967264 in decimal) which is the value from the object
file.

Now, doing the same with lld I get

 % ~/foreign/llvm-master-install/bin/ld64.lld -arch x86_64 -o main-lld main.cpp.o -lSystem
 % otool -t main-lld
main-lld:
Contents of (__TEXT,__text) section
0000000100000f30  55 48 89 e5 48 83 ec 20 c7 45 fc 00 00 00 00 e8
0000000100000f40  5c 00 00 00 89 c1 89 ca 83 e8 03 48 89 55 f0 89
0000000100000f50  45 ec 0f 87 28 00 00 00 48 8d 05 2d 00 00 00 48
0000000100000f60  8b 4d f0 48 63 14 88 48 01 c2 ff e2 e9 0f 00 00
0000000100000f70  00 e9 0a 00 00 00 e9 05 00 00 00 e9 00 00 00 00
0000000100000f80  8b 45 fc 48 83 c4 20 5d c3 0f 1f 00 a4 ff ff ff
0000000100000f90  e5 ff ff ff ea ff ff ff ef ff ff ff 90 ff ff ff
0000000100000fa0  55 48 89 e5 31 c0 5d c3

The first jump table entry is at 0x100000f30 + 0x5c = 0x100000F8C and has the
value ffffffa4 (or 4294967204 in decimal) which is NOT the value from the
object file. In particular, the first byte is a4 instead of e0. I'm not sure if
that's a coincidence, but this jumps to the start of the function (and not to
the appropriate case handler).

Note that lld was built from LLVM commit
f494e856dbbecfdc2958a07cd4acc3c6a7ed7533 and lld commit
874cf0193393de7ad4b480d8519a6e40375cf938. Also I only tried this for MachO.
Quuxplusone commented 6 years ago

I just realized that the lld debug build runs into llvm_unreachable in lld::mach_o::ArchHandler_x86_64::applyFixupFinal.

Quuxplusone commented 6 years ago
Next insight: the reference passed to applyFixupFinal is created in
normalizedObjectToAtoms and its kindValue is set to handler-
>dataInCodeTransitionStart(*atom), which is not overridden in
ArchHandler_x86_64. In ArchHandler_x86, there's a modeData kind. Naively
copying this yields this patch:

 % svn diff                                                                                                                                                                                                                                                    11:06:52
Index: lib/ReaderWriter/MachO/ArchHandler_x86_64.cpp
===================================================================
--- lib/ReaderWriter/MachO/ArchHandler_x86_64.cpp   (revision 321863)
+++ lib/ReaderWriter/MachO/ArchHandler_x86_64.cpp   (working copy)
@@ -181,6 +181,20 @@
                                 FindAddressForAtom addressForAtom,
                                 normalized::Relocations &relocs) override;

+  bool isDataInCodeTransition(Reference::KindValue refKind) override {
+    return refKind == modeCode || refKind == modeData;
+  }
+
+  Reference::KindValue dataInCodeTransitionStart(
+                                        const MachODefinedAtom &atom) override
{
+    return modeData;
+  }
+
+  Reference::KindValue dataInCodeTransitionEnd(
+                                        const MachODefinedAtom &atom) override
{
+    return modeCode;
+  }
+
 private:
   static const Registry::KindStrings _sKindStrings[];
   static const StubInfo              _sStubInfo;
@@ -188,6 +202,9 @@
   enum X86_64Kind: Reference::KindValue {
     invalid,               /// for error condition

+    modeCode,              /// Content starting at this offset is code.
+    modeData,              /// Content starting at this offset is data.
+
     // Kinds found in mach-o .o files:
     branch32,              /// ex: call _foo
     ripRel32,              /// ex: movq _foo(%rip), %rax
@@ -242,7 +259,10 @@
 };

 const Registry::KindStrings ArchHandler_x86_64::_sKindStrings[] = {
-  LLD_KIND_STRING_ENTRY(invalid), LLD_KIND_STRING_ENTRY(branch32),
+  LLD_KIND_STRING_ENTRY(invalid),
+  LLD_KIND_STRING_ENTRY(modeCode),
+  LLD_KIND_STRING_ENTRY(modeData),
+  LLD_KIND_STRING_ENTRY(branch32),
   LLD_KIND_STRING_ENTRY(ripRel32), LLD_KIND_STRING_ENTRY(ripRel32Minus1),
   LLD_KIND_STRING_ENTRY(ripRel32Minus2), LLD_KIND_STRING_ENTRY(ripRel32Minus4),
   LLD_KIND_STRING_ENTRY(ripRel32Anon),
@@ -601,6 +621,8 @@
   case negDelta32:
     *loc32 = fixupAddress - targetAddress + ref.addend();
     return;
+  case modeCode:
+  case modeData:
   case lazyPointer:
     // Do nothing
     return;
@@ -711,6 +733,8 @@
   case ripRel32GotLoadNowLea:
     llvm_unreachable("ripRel32GotLoadNowLea implies GOT pass was run");
     return;
+  case modeCode:
+  case modeData:
   case lazyPointer:
   case lazyImmediateLocation:
     llvm_unreachable("lazy reference kind implies Stubs pass was run");
@@ -743,6 +767,9 @@
   assert(ref.kindArch() == Reference::KindArch::x86_64);
   uint32_t sectionOffset = atomSectionOffset + ref.offsetInAtom();
   switch (static_cast<X86_64Kind>(ref.kindValue())) {
+  case modeCode:
+  case modeData:
+    break;
   case branch32:
     appendReloc(relocs, sectionOffset, symbolIndexForAtom(*ref.target()), 0,
                 X86_64_RELOC_BRANCH | rPcRel | rExtern | rLength4);
Quuxplusone commented 3 years ago

Closing this as it applies to the old iteration of LLD-MachO which is not being developed. (The new one is named ld64.lld.darwinnew, and will be receiving active feature development and bug fixes.)