Quuxplusone / LLVMBugzillaTest

[x86-64] optimize va_start when there are no XMM vaarg calls #7250

Open Quuxplusone opened 14 years ago

Quuxplusone commented 14 years ago
Bugzilla Link PR6796
Status NEW
Importance P normal
Reported by Ed Schouten (ed@80386.nl)
Reported on 2010-04-06 11:51:59 -0700
Last modified on 2010-09-01 14:21:09 -0700
Version trunk
Hardware Macintosh FreeBSD
CC clattner@nondot.org, hinokind@gmail.com, llvm-bugs@lists.llvm.org, llvm@sunfishcode.online
Fixed by commit(s)
Attachments
Blocks
Blocked by PR1740
See also
The following code compiles to very different machine code with GCC 4.2.1 and
with Clang SVN. The code Clang generates also makes little sense.

int
foo(int a, ...)
{
    int r;
    __builtin_va_list va;

    __builtin_va_start(va, a);
    r = __builtin_va_arg(va, int);
    __builtin_va_end(va);
    return (r);
}

GCC:

0000000000000000 <foo>:
   0:   48 83 ec 60             sub    $0x60,%rsp
   4:   48 8d 44 24 68          lea    0x68(%rsp),%rax
   9:   48 89 74 24 b0          mov    %rsi,0xffffffffffffffb0(%rsp)
   e:   c7 44 24 88 10 00 00    movl   $0x10,0xffffffffffffff88(%rsp)
  15:   00
  16:   48 89 44 24 90          mov    %rax,0xffffffffffffff90(%rsp)
  1b:   48 8d 44 24 a8          lea    0xffffffffffffffa8(%rsp),%rax
  20:   48 89 44 24 98          mov    %rax,0xffffffffffffff98(%rsp)
  25:   48 83 c0 08             add    $0x8,%rax
  29:   8b 00                   mov    (%rax),%eax
  2b:   48 83 c4 60             add    $0x60,%rsp
  2f:   c3                      retq

Clang:

0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 50             sub    $0x50,%rsp
   8:   84 c0                   test   %al,%al
   a:   74 26                   je     32 <foo+0x32>
   c:   0f 29 85 60 ff ff ff    movaps %xmm0,0xffffffffffffff60(%rbp)
  13:   0f 29 8d 70 ff ff ff    movaps %xmm1,0xffffffffffffff70(%rbp)
  1a:   0f 29 55 80             movaps %xmm2,0xffffffffffffff80(%rbp)
  1e:   0f 29 5d 90             movaps %xmm3,0xffffffffffffff90(%rbp)
  22:   0f 29 65 a0             movaps %xmm4,0xffffffffffffffa0(%rbp)
  26:   0f 29 6d b0             movaps %xmm5,0xffffffffffffffb0(%rbp)
  2a:   0f 29 75 c0             movaps %xmm6,0xffffffffffffffc0(%rbp)
  2e:   0f 29 7d d0             movaps %xmm7,0xffffffffffffffd0(%rbp)
  32:   4c 89 8d 58 ff ff ff    mov    %r9,0xffffffffffffff58(%rbp)
  39:   4c 89 85 50 ff ff ff    mov    %r8,0xffffffffffffff50(%rbp)
  40:   48 89 8d 48 ff ff ff    mov    %rcx,0xffffffffffffff48(%rbp)
  47:   48 89 95 40 ff ff ff    mov    %rdx,0xffffffffffffff40(%rbp)
  4e:   48 89 b5 38 ff ff ff    mov    %rsi,0xffffffffffffff38(%rbp)
  55:   48 8d 85 30 ff ff ff    lea    0xffffffffffffff30(%rbp),%rax
  5c:   48 89 45 f8             mov    %rax,0xfffffffffffffff8(%rbp)
  60:   48 8d 45 10             lea    0x10(%rbp),%rax
  64:   48 89 45 f0             mov    %rax,0xfffffffffffffff0(%rbp)
  68:   c7 45 ec 30 00 00 00    movl   $0x30,0xffffffffffffffec(%rbp)
  6f:   c7 45 e8 08 00 00 00    movl   $0x8,0xffffffffffffffe8(%rbp)
  76:   48 63 45 e8             movslq 0xffffffffffffffe8(%rbp),%rax
  7a:   48 83 f8 28             cmp    $0x28,%rax
  7e:   77 0f                   ja     8f <foo+0x8f>
  80:   48 89 c1                mov    %rax,%rcx
  83:   48 03 4d f8             add    0xfffffffffffffff8(%rbp),%rcx
  87:   83 c0 08                add    $0x8,%eax
  8a:   89 45 e8                mov    %eax,0xffffffffffffffe8(%rbp)
  8d:   eb 0c                   jmp    9b <foo+0x9b>
  8f:   48 8b 4d f0             mov    0xfffffffffffffff0(%rbp),%rcx
  93:   48 8d 41 08             lea    0x8(%rcx),%rax
  97:   48 89 45 f0             mov    %rax,0xfffffffffffffff0(%rbp)
  9b:   8b 01                   mov    (%rcx),%eax
  9d:   48 83 c4 50             add    $0x50,%rsp
  a1:   5d                      pop    %rbp
  a2:   c3                      retq
Quuxplusone commented 14 years ago

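The code sequence Clang emits is the generic one, which is needed if the function does a va_arg of a vector or fp type. GCC has an optimization pass that scans a function to see whether the va_list provably doesn't escape and whether there are no fp accesses; when both hold, it can skip saving the XMM registers and any GP registers that are never read.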
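
For readers following the disassembly, here is a rough C model of what the generic sequence does for va_arg(va, int). The four fields follow the va_list layout in the System V x86-64 psABI; the names model_va_list and model_va_arg_int are made up for this sketch and it is not how either compiler implements it internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Field layout of va_list on x86-64 System V.
 * The type name is hypothetical, chosen for this sketch only. */
typedef struct {
    unsigned int gp_offset;   /* offset of the next unread GP slot, 0..48  */
    unsigned int fp_offset;   /* offset of the next unread XMM slot, 48..176 */
    void *overflow_arg_area;  /* next argument that was passed on the stack */
    void *reg_save_area;      /* where the prologue spilled rdi..r9, xmm0..7 */
} model_va_list;

/* Models va_arg(ap, int): take the value from the register save area
 * while a GP slot is left, otherwise from the stack overflow area.
 * The "<= 40" test corresponds to the "cmp $0x28 / ja" pair in the
 * Clang output. */
static int model_va_arg_int(model_va_list *ap)
{
    const unsigned char *slot;
    int value;

    assert(ap != NULL);
    if (ap->gp_offset <= 40) {
        slot = (const unsigned char *)ap->reg_save_area + ap->gp_offset;
        ap->gp_offset += 8;   /* each GP register occupies an 8-byte slot */
    } else {
        slot = (const unsigned char *)ap->overflow_arg_area;
        ap->overflow_arg_area = (unsigned char *)ap->overflow_arg_area + 8;
    }
    memcpy(&value, slot, sizeof value);
    return value;
}
```

In foo above, va_start sets gp_offset to 8 (rdi holds a) and fp_offset to 48, so the only slot ever read is the one holding rsi. That is why storing rsi alone, as GCC does, is enough for this function.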

This impacts things like the implementation of the open syscall wrapper, which is variadic but only ever reads an integer argument.
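
As an illustration of the kind of function meant here, a minimal sketch of an open-style wrapper follows. The name sketch_open_mode and the flag value SKETCH_O_CREAT are hypothetical and not taken from any libc; the function only extracts the optional mode argument instead of making a system call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch, not copied from a real header. */
#define SKETCH_O_CREAT 0x40

/* Returns the optional third argument when the "create" flag is set,
 * and 0 otherwise. Only an integer is ever fetched through va_arg, so
 * the XMM part of the register save area is never needed. */
static unsigned int sketch_open_mode(const char *path, int flags, ...)
{
    unsigned int mode = 0;

    assert(path != NULL);
    if (flags & SKETCH_O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, unsigned int);
        va_end(ap);
    }
    return mode;
}
```

With the generic prologue, every call to a function shaped like this still pays for the test of %al and, if it is nonzero, eight movaps stores, even though no floating-point argument is ever read.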

Quuxplusone commented 14 years ago

This is also tracked as rdar://7832354

Quuxplusone commented 14 years ago

For this to happen, we first need to get the front-ends using the va_arg instruction. For that to happen, CodeGen needs to fully support the va_arg instruction on all important targets. For that to happen, we need target-independent support for va_arg with aggregate types, and target-dependent support for lowering va_arg for all the important targets.

Patches welcome.