Crappy code generated for simple loop

Quuxplusone commented 15 years ago


Bugzilla Link	PR4357
Status	NEW
Importance	P normal
Reported by	Evan Cheng (evan.cheng@apple.com)
Reported on	2009-06-10 12:43:58 -0700
Last modified on	2014-10-11 00:04:10 -0700
Version	trunk
Hardware	PC All
CC	anton@korobeynikov.info, devang.patel@gmail.com, freik@fb.com, llvm-bugs@lists.llvm.org, llvm@sunfishcode.online, nicholas@mxc.ca, quickslyver@free.fr
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

int hcf(int a, int b)
{
        while (a != 0)
        {
                if (a < b) b -= a;
                else a -= b;
        }
        return b;
}

David Majnemer has a piece of code (above) for which LLVM generates less than
optimal code.

This is what icc generates:

8048624 <hcf>:
 8048624:       8b 54 24 04             mov    0x4(%esp),%edx
 8048628:       8b 44 24 08             mov    0x8(%esp),%eax
 804862c:       85 d2                   test   %edx,%edx
 804862e:       74 0c                   je     804863c <hcf+0x18>
 8048630:       3b d0                   cmp    %eax,%edx
 8048632:       7d 04                   jge    8048638 <hcf+0x14>
 8048634:       2b c2                   sub    %edx,%eax
 8048636:       eb f8                   jmp    8048630 <hcf+0xc>
 8048638:       2b d0                   sub    %eax,%edx
 804863a:       75 f4                   jne    8048630 <hcf+0xc>
 804863c:       c3                      ret
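For readers who prefer C to disassembly, the icc output above can be read roughly as the loop below. This is a hand-written interpretation of the listing, not compiler output, and the name hcf_icc_shape is made up for illustration. The point it tries to show: the zero test on a is done once on entry and is then folded into the flags of the "a -= b" subtraction, while the "b -= a" path jumps straight back to the compare because a has not changed.

```c
/* Hypothetical C rendering of the icc listing; an interpretation only. */
int hcf_icc_shape(int a, int b)
{
        if (a == 0)                     /* test %edx,%edx ; je (exit) */
                return b;
        for (;;)
        {
                if (a < b)              /* cmp %eax,%edx ; jge */
                {
                        b -= a;         /* sub %edx,%eax */
                        continue;       /* jmp back to cmp: a is unchanged, no zero test */
                }
                a -= b;                 /* sub %eax,%edx sets the flags ... */
                if (a == 0)             /* ... so jne needs no separate test */
                        return b;
        }
}
```

As with the original hcf, this assumes b is nonzero whenever a is nonzero; otherwise the loop does not terminate.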

This is what llvm-gcc generates with -fomit-frame-pointer:

_hcf:
        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        jmp     LBB1_3
LBB1_1:
        cmpl    %eax, %ecx
        jl      LBB1_5
        subl    %eax, %ecx
LBB1_3:
        testl   %ecx, %ecx
        jne     LBB1_1
        ret
LBB1_5:
        subl    %ecx, %eax
        jmp     LBB1_3

clang does something even more horrible:

_hcf:
        pushl   %esi
        movl    12(%esp), %eax
        movl    8(%esp), %ecx
        jmp     LBB1_5
        .align  4,0x90
LBB1_1:
        xorl    %edx, %edx
        movl    %eax, %esi
        jmp     LBB1_3
        .align  4,0x90
LBB1_2:
        subl    %ecx, %esi
        addl    %ecx, %edx
LBB1_3:
        cmpl    %esi, %ecx
        jl      LBB1_2
        addl    %edx, %ecx
        subl    %eax, %ecx
        movl    %esi, %eax
LBB1_5:
        testl   %ecx, %ecx
        jne     LBB1_1
        popl    %esi
        ret
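To make it easier to see what clang did, here is a rough C reading of the listing above. It is a hand-written interpretation, not compiler output; the names hcf_clang_shape, t and acc are made up (t stands for %esi, acc for %edx). The "b -= a" path has become an inner loop on a copy of b, and an extra accumulator tracks how much was subtracted so that the new a can be recomputed from the old b afterwards. That is where the extra register, the push/pop and the extra adds come from.

```c
/* Hypothetical C rendering of the clang listing; an interpretation only. */
int hcf_clang_shape(int a, int b)
{
        while (a != 0)                  /* LBB1_5: testl %ecx,%ecx ; jne LBB1_1 */
        {
                int acc = 0;            /* LBB1_1: xorl %edx,%edx */
                int t = b;              /*         movl %eax,%esi */
                while (a < t)           /* LBB1_3: cmpl %esi,%ecx ; jl LBB1_2 */
                {
                        t -= a;         /* LBB1_2: subl %ecx,%esi */
                        acc += a;       /*         addl %ecx,%edx */
                }
                /* The assembly does addl %edx,%ecx then subl %eax,%ecx with
                   wraparound; grouped here as (acc - b) to avoid signed
                   overflow in C. Since acc == b - t, this is a -= t. */
                a = a + (acc - b);
                b = t;                  /* movl %esi,%eax */
        }
        return b;
}
```

For positive inputs this returns the same values as the original hcf; it just does more work per iteration to get there.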

Quuxplusone commented 15 years ago

Another test case. This is even worse.

int hcf(int a, int b)
{
        if (a == 0) return b;
        else if (a < b) return hcf(a, b-a);
        else return hcf(a-b, b);
}

Quuxplusone commented 15 years ago

I don't understand why you use the term "even more horrible", which implies that
the llvm-gcc code is more horrible than the icc code.
I don't see any difference in complexity between the icc code and the llvm-gcc code.

Quuxplusone commented 15 years ago

It seems that the "-simplifycfg" pass near the end of the optimization passes
generates the bad code with clang.

Without this pass, clang outputs the following code, which is very close to the icc
code: both have 2 mov, 4 jmp/jcc, 1 test, 1 cmp, 2 sub, 1 ret

hcf:
.LBB1_0:    # entry
    movl    8(%esp), %eax
    movl    4(%esp), %ecx
    .align  16
.LBB1_1:    # while.cond.outer
    testl   %ecx, %ecx
    je  .LBB1_5 # while.end.split
    jmp .LBB1_3 # while.cond
    .align  16
.LBB1_2:    # if.then
    subl    %ecx, %eax
.LBB1_3:    # while.cond
    cmpl    %eax, %ecx
    jl  .LBB1_2 # if.then
.LBB1_4:    # if.else.split
    subl    %eax, %ecx
    jmp .LBB1_1 # while.cond.outer
.LBB1_5:    # while.end.split
    ret

Quuxplusone commented 14 years ago

Is this one any better lately?

Quuxplusone commented 14 years ago

No, there is still a problem with clang.

Quuxplusone commented 9 years ago

I've got a fix for this (it's not SimplifyCFG, it's an overly simplistic heuristic in GVN), but I need to see if the solution is worse than the problem in other perf situations. The inner loop top alignment is probably also a poor decision, but that's a different issue entirely.

Also: the code coming out of ARMv7 codegen without the fix is even worse than the x86 output.
