Quuxplusone / LLVMBugzillaTest

RuntimeDyld (Mach-O, x86_64) generates wrong relocation resolutions for FP constants #15312

Open Quuxplusone opened 11 years ago

Quuxplusone commented 11 years ago
Bugzilla Link PR15312
Status NEW
Importance P normal
Reported by Kimon Hoffmann (Kimon.Hoffmann@gmx.net)
Reported on 2013-02-20 07:18:11 -0800
Last modified on 2013-10-16 09:56:47 -0700
Version 3.2
Hardware Macintosh MacOS X
CC baldrick@free.fr, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments file_15312.txt (542 bytes, text/plain)
file_15312.txt (4237 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 10045
C sourcecode of sample (BugDemo.c)

Using MC-JIT, either from within my own test program or through lli, to
JIT-compile and run the process function in the attached sample produces an
invalid result.

The problem does *not* occur when:
- Statically compiling the sample
- JIT-compiling and running the sample with lli using the "old" JIT

I have been able to reproduce the problem with LLVM/Clang 3.2 as well as with a
rather recent version from trunk. The (incorrect) output of the test program
changes when an optimization level greater than 0 is passed to lli.

Steps to reproduce:

clang -O0 -emit-llvm -c BugDemo.c -o BugDemo.bc
# To produce the correct output (-O0 is only provided for symmetry with the
# MC-JIT invocation; the level does not matter for the generated output)
lli -O0 BugDemo.bc
# To produce the incorrect output
lli -O0 -use-mcjit BugDemo.bc
# To produce different incorrect output
lli -O1 -use-mcjit BugDemo.bc

System info:

LLVM (http://llvm.org/):
  LLVM version 3.2svn
  Optimized build with assertions.
  Built Feb 20 2013 (13:24:30).
  Default target: x86_64-apple-darwin12.2.0
  Host CPU: corei7-avx

LLVM (http://llvm.org/):
  LLVM version 3.3svn
  Optimized build with assertions.
  Built Feb  9 2013 (15:12:53).
  Default target: x86_64-apple-darwin12.2.0
  Host CPU: corei7-avx

I tried to reduce the problematic code as much as possible; the attached
sample is the result. I also tried to use bugpoint to further reduce the
sample, without any luck.

Please let me know if you need further information, I'll be glad to assist!
Quuxplusone commented 11 years ago

Attached file_15312.txt (542 bytes, text/plain): C sourcecode of sample (BugDemo.c)

Quuxplusone commented 11 years ago

Attached file_15312.txt (4237 bytes, text/plain): LLVM assembly of sample (BugDemo.ll)

Quuxplusone commented 11 years ago
I have further investigated the problem using lldb on the JIT-generated code,
and it turns out that the generated code loads the constant factor 0.25 twice
instead of loading 0.75 as the factor for "state" and 0.25 as the factor for
"control".

Here is the MC-JIT generated assembly:

(lldb) disassemble -s 0x0000000100e82020 -e 0x0000000100e8206d
   0x100e82020:  vmovss (%rdx), %xmm0
   0x100e82024:  testl  %ecx, %ecx
   0x100e82026:  je     0x100e82068
   0x100e82028:  vmovss (%rsi), %xmm1
   0x100e8202c:  movabsq $4310179952, %rax
   0x100e82036:  vmulss (%rax), %xmm1, %xmm1
   0x100e8203a:  movabsq $4310179952, %rax
   0x100e82044:  vmovss (%rax), %xmm2
   0x100e82048:  nopl   (%rax,%rax)
   0x100e82050:  vmulss %xmm2, %xmm0, %xmm3
   0x100e82054:  vmulss (%rdi), %xmm0, %xmm0
   0x100e82058:  vmovss %xmm0, (%rdi)
   0x100e8205c:  vaddss %xmm3, %xmm1, %xmm0
   0x100e82060:  addq   $4, %rdi
   0x100e82064:  decl   %ecx
   0x100e82066:  jne    0x100e82050
   0x100e82068:  vmovss %xmm0, (%rdx)
   0x100e8206c:  ret

(lldb) p *(float*)4310179952
(float) $3 = 0.25
Quuxplusone commented 11 years ago
Please ignore my last comment about the generated assembly.
While it is indeed related to the problem reported in this bug, it was obtained
from a different version of the sample program, whose exact misbehavior does
not match the one of the sample attached to this bug.

In the meantime I have read through all IR dumps produced by "-print-after-all"
in -O0 and -O1 mode, and the IR stages printed for the "process" function
appear to be correct as far as I can tell.
Quuxplusone commented 11 years ago
In the meantime I have narrowed the problem down to the runtime dynamic loader.
For this purpose I've written a test program based on the llvm-rtdyld tool that
loads the same code from a Mach-O object file, obtains a pointer to the
process() function, executes it and prints the result.

From disassembly it appears to me that the loader gets the PC-relative
addressing of the two constants wrong.

Constant 0.25 is at 0x1002fa0f0
Constant 0.75 is at 0x1002fa0f4

0x1002fa090:  movss  88(%rip), %xmm0  // Loads 0.25 @ 0x1002fa0f0
0x1002fa098:  movss  80(%rip), %xmm1  // Loads 0.25 @ 0x1002fa0f0

This fits the wrong absolute addressing I previously posted about, although
that was obtained from a different, but structurally similar, sample.
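
The arithmetic behind the two comments in the disassembly can be checked with a
small sketch. It assumes that both movss instructions are 8 bytes long (the
first one is, since the next instruction starts at 0x1002fa098; the same length
for the second is an assumption based on the identical encoding form). The
helper names are hypothetical and only serve this illustration.

```python
# Assumed instruction length: movss xmm, disp32(%rip) is taken to be 8 bytes.
INSN_LEN = 8

def rip_relative_target(insn_addr, insn_len, disp):
    """RIP-relative operands are based on the address of the *next* instruction."""
    return insn_addr + insn_len + disp

def required_disp(insn_addr, insn_len, target):
    """Displacement an instruction would need in order to reach `target`."""
    return target - (insn_addr + insn_len)

for addr, disp in ((0x1002FA090, 88), (0x1002FA098, 80)):
    print(hex(addr), "->", hex(rip_relative_target(addr, INSN_LEN, disp)))
# 0x1002fa090 -> 0x1002fa0f0
# 0x1002fa098 -> 0x1002fa0f0

# Displacement the second load would need to reach 0.75 at 0x1002fa0f4.
print(required_disp(0x1002FA098, INSN_LEN, 0x1002FA0F4))  # 84
```

Both loads end up at 0x1002fa0f0, so whichever of them was meant to read 0.75
is off by exactly one 4-byte slot.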
Quuxplusone commented 11 years ago
Now that I have seen that resolveRelocation() has a debug print, I reran the
test via: lli -use-mcjit -debug BugDemo.bc

From the output it appears that the dynamic loader resolves all three
floating-point constants in the program (1.0, 0.75 and 0.25) to the base
address of the section that contains all these constants (Name: __literal4,
Size: 12 bytes). This matches the program output actually printed to stdout.

Here are the relevant debug output messages:

[...]

emitSection SectionID: 3 Name: __literal4 obj addr: 0x7fd49101cde8 new addr:
0x103e4a160 DataSize: 12 StubBufSize: 0 Allocate: 12
        Addend: 140550942608520 Offset: 0xaa Type: 100663299

[...]

Resolving relocations Section #3    0x103e4a160
    SectionID: 2 + 180 (0x103e4a0b4) RelType: 100663298 Addend: 0
resolveRelocation LocalAddress: 0x103e4a0b4 FinalAddress: 0x103e4a0b4 Value:
0x103e4a160 Addend: 0 isPCRel: 0 MachoType: 0 Size: 8
    SectionID: 2 + 45 (0x103e4a02d) RelType: 100663298 Addend: 0
resolveRelocation LocalAddress: 0x103e4a02d FinalAddress: 0x103e4a02d Value:
0x103e4a160 Addend: 0 isPCRel: 0 MachoType: 0 Size: 8
    SectionID: 2 + 31 (0x103e4a01f) RelType: 100663298 Addend: 0
resolveRelocation LocalAddress: 0x103e4a01f FinalAddress: 0x103e4a01f Value:
0x103e4a160 Addend: 0 isPCRel: 0 MachoType: 0 Size: 8
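
To illustrate what the log implies, here is a minimal sketch that models the
12-byte __literal4 section as three packed 4-byte floats. The order of the
constants inside the section is a guess, the helper names are hypothetical, and
the log does not say whether the missing per-constant offset belongs in the
addend or in the value; the sketch only shows the effect of resolving every
reference to the section base.

```python
import struct

SECTION_BASE = 0x103E4A160  # "new addr" of __literal4 in the debug output

# Assumed layout: the order of the constants within the section is a guess.
literal4 = struct.pack("<3f", 0.75, 0.25, 1.0)

def resolve_absolute(value, addend):
    """Target address of an absolute (isPCRel: 0) relocation."""
    return value + addend

def load_float(address):
    """Read the 4-byte float stored at `address` inside the modelled section."""
    return struct.unpack_from("<f", literal4, address - SECTION_BASE)[0]

# As logged: three relocations, each with Value = section base and Addend = 0.
logged = [load_float(resolve_absolute(SECTION_BASE, 0)) for _ in range(3)]

# What three distinct constants would need: one offset per 4-byte slot.
distinct = [load_float(resolve_absolute(SECTION_BASE, off)) for off in (0, 4, 8)]

print(logged)    # [0.75, 0.75, 0.75]
print(distinct)  # [0.75, 0.25, 1.0]
```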
Quuxplusone commented 11 years ago
As requested, here are the correct and incorrect outputs produced by the
attached sample:

Correct output as produced by `lli -O0 BugDemo.bc`:
00: 0.000000
01: 0.250000
02: 0.437500
03: 0.578125
04: 0.683594
05: 0.762695
06: 0.822021
07: 0.866516

Incorrect output as produced by `lli -O0 -use-mcjit BugDemo.bc`:
00: 0.000000
01: 0.562500
02: 0.984375
03: 1.300781
04: 1.538086
05: 1.716064
06: 1.849548
07: 1.949661

Incorrect output as produced by `lli -O1 -use-mcjit BugDemo.bc`:
00: 0.000000
01: 0.062500
02: 0.078125
03: 0.082031
04: 0.083008
05: 0.083252
06: 0.083313
07: 0.083328
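
The three listings are consistent with all constants collapsing into a single
value. The sketch below is not taken from BugDemo.c. It assumes the recurrence
state = 0.75 * state + 0.25 * control with control = 1.0, which is inferred
from the constants named in the earlier comments and from the correct output.
The function name and parameters are hypothetical.

```python
def simulate(control, state_factor, control_factor, steps=8):
    """Print-style trace of the assumed recurrence, starting from state 0."""
    state = 0.0
    lines = []
    for i in range(steps):
        lines.append(f"{i:02d}: {state:.6f}")
        state = state_factor * state + control_factor * control
    return lines

# Intended constants: control 1.0, state factor 0.75, control factor 0.25.
print("\n".join(simulate(1.0, 0.75, 0.25)))   # ends with "07: 0.866516"

# Every constant read as 0.75: reproduces the -O0 MC-JIT listing.
print("\n".join(simulate(0.75, 0.75, 0.75)))  # ends with "07: 1.949661"

# Every constant read as 0.25: reproduces the -O1 MC-JIT listing.
print("\n".join(simulate(0.25, 0.25, 0.25)))  # ends with "07: 0.083328"
```

Under these assumptions, the -O0 run behaves as if every constant were 0.75 and
the -O1 run as if every constant were 0.25, which would fit a different
constant sitting at the start of the section at each optimization level.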
Quuxplusone commented 11 years ago

Does it still happen if you add -mcpu=i386 to the command line?

Quuxplusone commented 11 years ago

I get the correct output using mcjit on x86-64 linux, both at -O0 and -O1.

Quuxplusone commented 11 years ago

Using -mcpu=i386 doesn't change the output of either of the two invocations.

Quuxplusone commented 11 years ago

To make reasonably sure that the observed behavior is not due to a miscompile of LLVM/Clang on my part, I reran the steps with the LLVM/Clang 3.2 from OSX Homebrew and get the exact same behavior.

Quuxplusone commented 11 years ago
Further investigation on one of my Linux boxes revealed an easy way to
reproduce the erroneous behavior there too:

1. Compile: clang -emit-llvm -c -S BugDemo.c -o BugDemo.ll
2. Change the target triple in BugDemo.ll from (in my case)
"x86_64-pc-linux-gnu" to "x86_64-apple-macosx10.8.0"
3. Run: lli -use-mcjit ./BugDemo.ll

On the other hand, with a target triple of "x86_64-pc-linux-gnu" the attached
sample also works flawlessly on OSX.

IMHO this supports my hypothesis that the bug is located in the Mach-O runtime
dynamic loader.
Quuxplusone commented 11 years ago
This appears to have been fixed. I have retested this issue with:

1. clang version 3.3 (tags/RELEASE_33/final)
   Target: x86_64-apple-darwin12.5.0
   Thread model: posix

   LLVM version 3.3
   Optimized build with assertions.
   Built Jun 27 2013 (16:25:16).
   Default target: x86_64-apple-darwin12.5.0
   Host CPU: corei7-avx

2. Ubuntu clang version 3.4-1~exp1 (trunk) (based on LLVM 3.4)
   Target: x86_64-pc-linux-gnu
   Thread model: posix

   LLVM version 3.4
   Optimized build.
   Built Oct 15 2013 (22:31:57).
   Default target: x86_64-pc-linux-gnu
   Host CPU: corei7-avx

   Obtained from the official snapshot APT repositories (SVN revision 19684)

I will verify this with my source-built version of llvm/clang once I have
updated it to the official 3.3 release.
Quuxplusone commented 11 years ago

I can also confirm that the error has been resolved when using my custom llvm/clang 3.3 toolchain.

Thanks a lot! Should I mark this bug as RESOLVED/FIXED?