Too much stack manipulation with -tailcallopt on x86

Quuxplusone commented 14 years ago


Bugzilla Link	PR6774
Status	NEW
Importance	P enhancement
Reported by	Max Bolingbroke (batterseapower@hotmail.com)
Reported on	2010-04-04 06:50:47 -0700
Last modified on	2012-09-25 09:11:57 -0700
Version	trunk
Hardware	PC All
CC	anton@korobeynikov.info, arnold.schwaighofer@gmail.com, batterseapower@hotmail.com, davidterei@gmail.com, dsaritz@gmail.com, evan.cheng@apple.com, foldr@codedgers.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments	`get-align.patch` (1111 bytes, text/plain)
Blocks
Blocked by
See also

The GHC LLVM backend generates code with lots of tail calls, and we use -
tailcallopt to ensure that these are turned into actual "jmp" instructions.

Unfortunately, the -tailcallopt option seems to lead to excessive manipulation
of the stack pointer. Here is some an example IR fragment generated from some
relatively simple Haskell code:

{{{
define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind
{
cdx:
  %ndz = add i32 %stg_terei_spArg, -4             ; <i32> [#uses=3]
  %ndB = add i32 %stg_terei_baseArg, 84           ; <i32> [#uses=1]
  %ndC = inttoptr i32 %ndB to i32*                ; <i32*> [#uses=1]
  %ndD = load i32* %ndC                           ; <i32> [#uses=1]
  %ndE = icmp ult i32 %ndz, %ndD                  ; <i1> [#uses=1]
  br i1 %ndE, label %cdG, label %ndH

ndH:                                              ; preds = %cdx
  %ndJ = inttoptr i32 %stg_terei_spArg to i32*    ; <i32*> [#uses=2]
  %ndK = load i32* %ndJ                           ; <i32> [#uses=1]
  %ndP = inttoptr i32 %ndz to i32*                ; <i32*> [#uses=1]
  store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct*
@Toy_toy2_closure to i32), i32 1), i32* %ndP
  store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
  tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz,
i32 %stg_terei_hpArg, i32 %ndK) nounwind
  ret void

cdG:                                              ; preds = %cdx
  %ne1 = add i32 %stg_terei_baseArg, -4           ; <i32> [#uses=1]
  %ne2 = inttoptr i32 %ne1 to i32*                ; <i32*> [#uses=1]
  %ne3 = load i32* %ne2                           ; <i32> [#uses=1]
  %ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)* ; <void (i32,
i32, i32, i32)*> [#uses=1]
  tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 ptrtoint
(%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
  ret void
}
}}}

With -tailcallopt, llc generates the following x86:

{{{
_Toy_toy_entry:                         ## @Toy_toy_entry
## BB#0:                                ## %cdx
    subl    $12, %esp
    leal    -4(%ebp), %eax
    cmpl    84(%ebx), %eax
    jb  LBB2_2
## BB#1:                                ## %ndH
    movl    (%ebp), %esi
    movl    $_Toy_toy2_closure+1, -4(%ebp)
    movl    $_sbo_info, (%ebp)
    movl    %eax, %ebp
    addl    $12, %esp
    jmp _stg_ap_p_fast  # TAILCALL
LBB2_2:                                 ## %cdG
    movl    -4(%ebx), %eax
    movl    $_Toy_toy_closure, %esi
    addl    $12, %esp
    jmpl    *%eax  # TAILCALL

    .globl  ___stginit_Toy_
    .align  4, 0x90
}}}

Note the subl of %esp at the end point and corresponding addl to %esp at both
exit points. %esp is not used in the body of the function, so this is pointless.

Compiling without -tailcallopt removes this extra stack manipulation.

As far as I can see, the two tail calls meet the requirements for "sibling
calls" - and indeed llc without -tailcallopt still turns the calls into "jmp"s.
Is it possible for LLVM to try this approach first, and only fall back on the
full tail-call optimisation if it can't apply the simpler sibling-call
optimisation?

(See also my post to the GHC mailing list at
http://article.gmane.org/gmane.comp.lang.haskell.cvs.ghc/38323)

Quuxplusone commented 14 years ago

Attached get-align.patch (1111 bytes, text/plain): Tail call stack alignment fix (may or may not fix the problem)

Quuxplusone commented 14 years ago

(In reply to comment #1)
> Created an attachment (id=4636) [details]
> Tail call stack alignment fix (may or may not fix the problem)
>
> Possible fix. This patch fixes a problem/inefficiency with the stack alignment
> code. Probably don't get around to test it till the weekend or even later.
> (Patch is against an older revision). Since I don't have time now to
thoroughly
> look at the problem, it might or might not solve the problem. Sorry.

Thanks for the quick reply, Arnold. Unfortunately I don't see any change in
behaviour with your patch :-(

Quuxplusone / LLVMBugzillaTest

Too much stack manipulation with -tailcallopt on x86 #7228