Quuxplusone / LLVMBugzillaTest

ARM Cortex-A9 optimization improvement (maybe) #5056

Open Quuxplusone opened 14 years ago

Quuxplusone commented 14 years ago
Bugzilla Link PR7457
Status NEW
Importance P enhancement
Reported by Jonathan Engdahl (jrengdahl@gmail.com)
Reported on 2010-06-22 22:34:08 -0700
Last modified on 2016-01-14 05:28:03 -0800
Version trunk
Hardware PC Windows XP
CC anton@korobeynikov.info, efriedma@quicinc.com, llvm-bugs@lists.llvm.org, rengolin@gmail.com
Attachments ubfx.zip (3381 bytes, application/zip)
Created attachment 5093
minimal C++ program that demonstrates the case

This is a minuscule improvement, but since I spent some effort looking at it,
here it is:

ARM Cortex-A9, Thumb-2:

        lsrs    r2, r0, #8
        lsrs    r3, r1, #8
        uxtb    r2, r2
        uxtb    r3, r3

might be better coded as:

        ubfx    r2, r0, #8, #8
        ubfx    r3, r1, #8, #8
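Both sequences compute the same register value: shift right by 8, then keep only the low 8 bits. A minimal C++ model makes the equivalence concrete. The helper names lsr_then_uxtb and ubfx are hypothetical, and only the register result is modelled (the flags written by lsrs are ignored).

```cpp
#include <cstdint>

// Model of the original pair:  lsrs rd, rn, #8 ; uxtb rd, rd
// (register result only; the flags set by lsrs are not modelled).
constexpr std::uint32_t lsr_then_uxtb(std::uint32_t rn) {
    return (rn >> 8) & 0xFFu;   // lsrs by 8, then uxtb keeps bits 7:0
}

// Model of  ubfx rd, rn, #lsb, #width : extract `width` bits starting at
// bit `lsb` and zero-extend them (requires 1 <= width <= 32 - lsb).
constexpr std::uint32_t ubfx(std::uint32_t rn, unsigned lsb, unsigned width) {
    return (rn >> lsb) & (width >= 32 ? 0xFFFFFFFFu : (1u << width) - 1u);
}

// With lsb = 8 and width = 8 the two agree; a few compile-time spot checks:
static_assert(ubfx(0x12345678u, 8, 8) == 0x56u, "byte 1 of 0x12345678");
static_assert(lsr_then_uxtb(0x12345678u) == ubfx(0x12345678u, 8, 8), "");
static_assert(lsr_then_uxtb(0xFFFFFFFFu) == ubfx(0xFFFFFFFFu, 8, 8), "");
```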

The code size is the same: lsrs and uxtb are 16-bit instructions and ubfx is
32-bit, so both sequences take 8 bytes. But I think the ubfx pair will execute
in one clock, whereas the first sequence will take two clocks (assuming a
dual-issue CPU), because each uxtb has to wait for the result of the lsrs
before it.
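The size and timing argument can be checked against a deliberately simple issue model. This is an assumption for illustration only, not a description of the real Cortex-A9 pipeline: in-order issue of up to two instructions per cycle, every result readable one cycle after issue, and no other pairing rules. The names Insn, issue_cycles, total_bytes, lsr_uxtb_seq and ubfx_seq are invented here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy issue model (NOT the real Cortex-A9 pipeline):
//  - in-order issue, at most `width` instructions per cycle;
//  - every result can be read one cycle after it is issued;
//  - no other pairing restrictions; flag updates (the S in lsrs) are ignored.
struct Insn {
    int dest;               // destination register, 0..15 for r0..r15
    std::vector<int> srcs;  // source registers
    int bytes;              // encoding size: 2 for 16-bit, 4 for 32-bit Thumb-2
};

// Number of cycles needed to issue the whole sequence under the model above.
int issue_cycles(const std::vector<Insn>& seq, int width = 2) {
    int ready[16] = {};  // ready[r]: first cycle in which r may be read
    int cycle = 0;       // cycle of the most recently issued instruction
    int used = 0;        // issue slots already taken in `cycle`
    for (const Insn& in : seq) {
        assert(in.dest >= 0 && in.dest < 16);
        int earliest = cycle;  // in order: never before the previous insn
        for (int s : in.srcs) {
            assert(s >= 0 && s < 16);
            if (ready[s] > earliest) earliest = ready[s];
        }
        if (earliest > cycle) {         // stall until the operands are ready
            cycle = earliest;
            used = 0;
        } else if (used == width) {     // current cycle is already full
            ++cycle;
            used = 0;
        }
        ++used;
        ready[in.dest] = cycle + 1;     // one-cycle result latency
    }
    return seq.empty() ? 0 : cycle + 1;
}

// Total encoding size of the sequence in bytes.
int total_bytes(const std::vector<Insn>& seq) {
    int n = 0;
    for (const Insn& in : seq) n += in.bytes;
    return n;
}

// Registers r0..r3 are numbered 0..3.
std::vector<Insn> lsr_uxtb_seq() {
    return {{2, {0}, 2},   // lsrs r2, r0, #8
            {3, {1}, 2},   // lsrs r3, r1, #8
            {2, {2}, 2},   // uxtb r2, r2
            {3, {3}, 2}};  // uxtb r3, r3
}

std::vector<Insn> ubfx_seq() {
    return {{2, {0}, 4},   // ubfx r2, r0, #8, #8
            {3, {1}, 4}};  // ubfx r3, r1, #8, #8
}
```

Under this model the original sequence needs 2 cycles, because each uxtb reads a register written by an lsrs in the previous cycle, while the two independent ubfx instructions pair in a single cycle; both sequences total 8 bytes. With width = 1 (single issue) the original sequence takes 4 cycles and the ubfx pair 2.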

(But in a universe that contains 10^80 electrons, it really might not matter
that much.)
Quuxplusone commented 14 years ago

Attached ubfx.zip (3381 bytes, application/zip): minimal C++ program that demonstrates the case

Quuxplusone commented 13 years ago
Reduced IR:
target triple = "thumbv7-apple-darwin11"
define arm_aapcscc void @_Z7checkrxv(i32 %tmp15, i32 %tmp18,
                                     i32 %tmp42, i32 %tmp45) nounwind {
  %cmp22 = icmp eq i32 %tmp18, %tmp15
  %tmp117 = lshr i32 %tmp45, 8
  %tmp118 = trunc i32 %tmp117 to i8
  %tmp109 = lshr i32 %tmp42, 8
  %tmp110 = trunc i32 %tmp109 to i8
  %cmp51 = icmp eq i8 %tmp118, %tmp110
  %and5687 = and i1 %cmp22, %cmp51
  br i1 %and5687, label %if.then81, label %if.end85
if.then81:
  tail call arm_aapcscc void @_Z5myLogj(i32 252) nounwind
  br label %if.end85
if.end85:
  ret void
}
declare arm_aapcscc void @_Z5myLogj(i32)
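For readers who prefer source to IR, the C++ below is a hypothetical reconstruction of what the reduced function computes; it is not the code from the ubfx.zip attachment. rxMatches is an invented helper, and myLog (mangled _Z5myLogj, i.e. myLog(unsigned int)) is given a stub body only so the snippet links. The mangled name _Z7checkrxv denotes checkrx() with no parameters, so the four i32 arguments were presumably introduced when the IR was reduced.

```cpp
#include <cstdint>

// Hypothetical stub for the external myLog(unsigned); it only records the code.
static unsigned g_lastLogged = 0;
void myLog(unsigned code) { g_lastLogged = code; }

// The condition computed by the reduced IR:
//   %cmp22 = icmp eq i32 %tmp18, %tmp15
//   %cmp51 = icmp eq i8 (trunc (lshr %tmp45, 8)), (trunc (lshr %tmp42, 8))
// static_cast<std::uint8_t> plays the role of "trunc i32 ... to i8".
constexpr bool rxMatches(std::uint32_t tmp15, std::uint32_t tmp18,
                         std::uint32_t tmp42, std::uint32_t tmp45) {
    return tmp18 == tmp15 &&
           static_cast<std::uint8_t>(tmp45 >> 8) ==
               static_cast<std::uint8_t>(tmp42 >> 8);
}

// Four-argument form matching the reduced IR (the original was checkrx()).
void checkrx(std::uint32_t tmp15, std::uint32_t tmp18,
             std::uint32_t tmp42, std::uint32_t tmp45) {
    if (rxMatches(tmp15, tmp18, tmp42, tmp45))
        myLog(252);
}
```

The byte extraction (tmp >> 8) truncated to 8 bits is exactly the value a single ubfx rd, rn, #8, #8 would produce, which is why the lsrs+uxtb pair in the generated code looks like a missed match.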

Not sure why the lsrs+uxtb pair isn't being matched into a ubfx.