missing optimization (alias analysis issue?) #29612

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
Bugzilla Link PR30638
Status NEW
Importance P normal
Reported by Ivan Sorokin (vanyacpp@gmail.com)
Reported on 2016-10-07 12:39:15 -0700
Last modified on 2018-07-11 11:36:12 -0700
Version trunk
Hardware PC Linux
CC ditaliano@apple.com, ehsanamiri@gmail.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, mkuper@google.com, sebpop@gmail.com
The following code:

#include <vector>
#include <memory>

struct base
{
    virtual ~base()
    {}
};

void f(std::vector<std::unique_ptr<base> >& v)
{
    v.back().release();
    v.pop_back();
}

GCC 6.2 is able to optimize it to just two instructions:

        sub     QWORD PTR [rdi+8], 8
        ret
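GCC's two-instruction output is valid because release() leaves a null pointer in the last slot. The destructor that pop_back() then runs on that slot has nothing to delete, so the only observable effect of f() is shrinking the vector. The following minimal sketch checks this. It assumes a hypothetical `counted` class and a hypothetical `run_demo()` helper, neither of which is part of the report:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct base
{
    virtual ~base()
    {}
};

// Hypothetical helper (not in the report): counts destructor calls so we
// can observe whether pop_back() ever destroys an object.
struct counted : base
{
    static int destroyed;
    ~counted() override { ++destroyed; }
};
int counted::destroyed = 0;

// Same body as f() in the report.
void f(std::vector<std::unique_ptr<base> >& v)
{
    v.back().release();  // slot now holds nullptr; caller owns the object
    v.pop_back();        // destroys a null unique_ptr: no delete happens
}

struct demo_result
{
    int destroyed_by_f;      // destructors that ran inside f()
    std::size_t size_after;  // v.size() after f()
    int destroyed_total;     // after we delete the released object ourselves
};

// Hypothetical driver: builds a two-element vector, calls f(), and
// records what happened.
demo_result run_demo()
{
    counted::destroyed = 0;
    std::vector<std::unique_ptr<base> > v;
    v.emplace_back(new counted);
    v.emplace_back(new counted);
    base* released = v.back().get();  // the object f() will release

    f(v);

    demo_result r;
    r.destroyed_by_f = counted::destroyed;
    r.size_after = v.size();
    delete released;  // ownership came back to us via release()
    r.destroyed_total = counted::destroyed;
    return r;
}  // v's remaining element is destroyed here
```

Calling run_demo() reports zero destructor calls inside f() and a size of 1. Exactly one destruction is recorded once the released object is deleted by hand. This is why the null check and the virtual call in clang's output are dead code.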

The code generated by clang 3.9.0 is less efficient:

        push    rbx
        mov     rax, qword ptr [rdi + 8]

        mov     qword ptr [rax - 8], 0

        mov     rax, qword ptr [rdi + 8]
        lea     rbx, [rax - 8]
        mov     qword ptr [rdi + 8], rbx

        mov     rdi, qword ptr [rax - 8]
        test    rdi, rdi
        je      .LBB0_2
        mov     rax, qword ptr [rdi]
        call    qword ptr [rax + 8]
.LBB0_2:
        mov     qword ptr [rbx], 0
        pop     rbx
        ret

Because [rax - 8] is reloaded after the subtraction, even though it should still hold the 0 just stored by release(), I tend to believe this is an alias analysis issue.
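The alias-analysis hypothesis can be made concrete by lowering f() by hand. The sketch below uses a hypothetical `fake_vector` that keeps only `begin` and `end` and models each `unique_ptr<base>` as a bare `base*` slot. Each statement is mapped to the clang 3.9.0 instruction it appears to correspond to. One assumption here is not established in this thread: clang's type-based alias analysis is taken to give all pointer types the same tag. Under that assumption, a store of a `base*` may alias the vector's end pointer, and vice versa.

```cpp
#include <cassert>

struct base
{
    virtual ~base() {}
};

// Hypothetical stand-in for the vector's internals (assumption: the end
// pointer is a plain field and each unique_ptr<base> is one base* slot).
struct fake_vector
{
    base** begin;
    base** end;
};

// Hand lowering of f(), mirroring clang's instruction sequence.
void f_lowered(fake_vector& v)
{
    // v.back().release(): load end, null out the last slot.
    base** e1 = v.end;      // mov rax, qword ptr [rdi + 8]
    *(e1 - 1) = nullptr;    // mov qword ptr [rax - 8], 0

    // v.pop_back(): end is reloaded, presumably because the store above
    // (a base*) is assumed to possibly overwrite v.end (a base**).
    base** e2 = v.end;      // mov rax, qword ptr [rdi + 8]
    v.end = e2 - 1;         // lea rbx, [rax - 8]; mov qword ptr [rdi + 8], rbx

    // ~unique_ptr on the popped slot: the slot is reloaded because the
    // store to v.end is assumed to possibly have overwritten it.
    base* p = *(e2 - 1);    // mov rdi, qword ptr [rax - 8]
    if (p)                  // test rdi, rdi / je .LBB0_2
        delete p;           // virtual (deleting) destructor call
    *(e2 - 1) = nullptr;    // mov qword ptr [rbx], 0
}

// GCC's result: once the stores are known not to overlap, p is always
// null and everything except the decrement is dead.
void f_folded(fake_vector& v)
{
    v.end -= 1;             // sub qword ptr [rdi + 8], 8
}

// Runs both versions on identical two-element arrays. Returns true if
// they leave the end pointer in the same place and the lowered version
// leaves the popped slot null.
bool same_end_pointer()
{
    base a, b;
    base* s1[2] = { &a, &b };
    base* s2[2] = { &a, &b };
    fake_vector v1 = { s1, s1 + 2 };
    fake_vector v2 = { s2, s2 + 2 };
    f_lowered(v1);  // never deletes: p is always null here
    f_folded(v2);
    return (v1.end - s1) == (v2.end - s2) && s1[1] == nullptr;
}
```

`same_end_pointer()` returns true because `p` is always null inside `f_lowered()`. The optimizer has to prove exactly this fact before it can fold the lowered version down to `f_folded()`.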
Quuxplusone commented 6 years ago
Confirmed: missed optimization.

Today's LLVM for AArch64 produces 7 loads:

    str x19, [sp, #-32]!        // 8-byte Folded Spill
    stp x29, x30, [sp, #16]     // 8-byte Folded Spill
    ldr x8, [x0, #8]
    stur    xzr, [x8, #-8]
    ldr x8, [x0, #8]
    sub x19, x8, #8             // =8
    str x19, [x0, #8]
    ldur    x0, [x8, #-8]
    add x29, sp, #16            // =16
    cbz x0, .LBB0_2
// %bb.1:
    ldr x8, [x0]
    ldr x8, [x8, #8]
    blr x8
.LBB0_2:
    str xzr, [x19]
    ldp x29, x30, [sp, #16]     // 8-byte Folded Reload
    ldr x19, [sp], #32          // 8-byte Folded Reload
    ret

versus only one load when compiling with GCC trunk:

    ldr x1, [x0, 8]
    sub x1, x1, #8
    str x1, [x0, 8]
    ret