Quuxplusone / LLVMBugzillaTest

Missed optimization in CMPXCHG-loop #16428

Open Quuxplusone opened 11 years ago

Quuxplusone commented 11 years ago
Bugzilla Link PR16429
Status NEW
Importance P enhancement
Reported by Clemens Hammacher (hammacher@cs.uni-saarland.de)
Reported on 2013-06-24 12:22:00 -0700
Last modified on 2013-06-24 12:22:00 -0700
Version trunk
Hardware PC All
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments test.ll (691 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 10749
bitcode using atomicrmw and cmpxchg-loop

An atomicrmw with an operation other than "add" or "sub" (for example "and")
is translated to a cmpxchg loop on x86, since there is no single hardware
instruction for doing that.
If you write that loop by hand, however, the generated assembly is
longer and uses one more register.
The backend seems to miss the fact that cmpxchg already sets ZF, and hence
emits an unnecessary cmp.

I attached a bitcode file containing two functions. "bar" uses an atomicrmw
instruction; "baz" uses a cmpxchg loop. "bar" results in optimal assembly, whereas
"baz" contains an additional cmp plus several movs and uses one more register.
I generated the assembly with "llc -o - test.ll".
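
For readers without the attachment, the two shapes can be sketched at the source level. This is only an illustrative C++ analog, not the contents of test.ll: the global `foo` and the function names are borrowed from the report, while the `std::uint32_t` type, the `mask` parameter, and the choice of a bitwise AND (suggested by the `andl` in the assembly below) are assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the global `foo` referenced in the report.
std::atomic<std::uint32_t> foo{0};

// Analog of "bar": one atomic read-modify-write, which compilers
// typically express as an atomicrmw `and` in LLVM IR.
// Returns the value that was stored before the update.
std::uint32_t bar(std::uint32_t mask) {
    return foo.fetch_and(mask, std::memory_order_seq_cst);
}

// Analog of "baz": the same update written as an explicit
// compare-exchange loop.
std::uint32_t baz(std::uint32_t mask) {
    std::uint32_t expected = foo.load(std::memory_order_seq_cst);
    // On failure, compare_exchange_weak stores the value currently in
    // `foo` into `expected`, so each retry recomputes `expected & mask`
    // from fresh data. No separate comparison is needed afterwards: the
    // call's boolean result already says whether the exchange happened.
    while (!foo.compare_exchange_weak(expected, expected & mask)) {
    }
    return expected;  // value observed immediately before the successful swap
}
```

Both functions leave `foo & mask` in `foo` and return the previous value, so ideally they would compile to the same loop.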

This is the non-optimal assembly part:

        [...]
        movl    L_foo(%rip), %eax
LBB1_1:                                 ## %loop
 (1)    movl    %eax, %ecx
 (2)    movl    %ecx, %edx
        andl    %edi, %edx
        lock
        cmpxchgl        %edx, L_foo(%rip)
 (3)    cmpl    %eax, %ecx
        jne     LBB1_1
        [...]

Lines (2) and (3) can be dropped entirely if (1) copies to %edx directly:
cmpxchg sets ZF on success, so the jne can test that flag without the cmp.
This also frees register %ecx and would match the code generated for the
atomicrmw instruction.