facebookarchive / BOLT

Binary Optimization and Layout Tool - A Linux command-line utility used for optimizing performance of binaries

BOLT/LLVM? does not preserve prefixes on conditional branches #294

Open suresh-srinivas opened 2 years ago

suresh-srinivas commented 2 years ago

After discussing with @maksfb, this looks similar to the issue in https://reviews.llvm.org/D120592

I have an input binary of the form:

0000000000401169 <main>:
  401169:       89 f8                   mov    %edi,%eax
  40116b:       83 ff 01                cmp    $0x1,%edi
  40116e:       2e 74 06                je,pn  401177 <main+0xe>
  401171:       83 ff 02                cmp    $0x2,%edi
  401174:       2e 75 01                jne,pn 401178 <main+0xf>
  401177:       c3                      retq   
  401178:       83 ff 03                cmp    $0x3,%edi
  40117b:       2e 74 f9                je,pn  401177 <main+0xe>
  40117e:       b8 04 00 00 00          mov    $0x4,%eax
  401183:       eb f2                   jmp    401177 <main+0xe>
  401185:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40118c:       00 00 00 

It has 3 conditional branches with a 2e prefix. After BOLTing, those 3 conditional branches no longer have the prefix, as this output shows:

0000000000401169 <main>:
  401169:       89 f8                   mov    %edi,%eax
  40116b:       83 ff 01                cmp    $0x1,%edi
  40116e:       74 05                   je     401175 <main+0xc>
  401170:       83 ff 02                cmp    $0x2,%edi
  401173:       75 01                   jne    401176 <main+0xd>
  401175:       c3                      retq   
  401176:       83 ff 03                cmp    $0x3,%edi
  401179:       74 fa                   je     401175 <main+0xc>
  40117b:       b8 04 00 00 00          mov    $0x4,%eax
  401180:       eb f3                   jmp    401175 <main+0xc>
  401182:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  401189:       00 00 00 
  40118c:       0f 1f 40 00             nopl   0x0(%rax)

Is this an underlying LLVM problem rather than a BOLT problem? I noticed that llvm-objdump, unlike regular objdump, does not seem to know about the prefixes: on the input binary it shows the 2e bytes but prints plain je/jne. @maksfb @rafaelauler @aaupov

0000000000401169 <main>:
  401169: 89 f8                         movl    %edi, %eax
  40116b: 83 ff 01                      cmpl    $1, %edi
  40116e: 2e 74 06                      je      0x401177 <main+0xe>
  401171: 83 ff 02                      cmpl    $2, %edi
  401174: 2e 75 01                      jne     0x401178 <main+0xf>
  401177: c3                            retq
  401178: 83 ff 03                      cmpl    $3, %edi
  40117b: 2e 74 f9                      je      0x401177 <main+0xe>
  40117e: b8 04 00 00 00                movl    $4, %eax
  401183: eb f2                         jmp     0x401177 <main+0xe>
  401185: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)
  40118f: 90                            nop
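
As a rough illustration of the difference between the two listings, the sketch below checks each instruction's bytes for a branch-hint prefix (2e = not taken, 3e = taken) in front of a conditional jump opcode. The byte strings are copied from the objdump output above; the function names are hypothetical, and this is only a byte-pattern check on already-split instructions, not a disassembler.

```python
# Branch-hint prefix bytes on x86: 0x2E = not taken, 0x3E = taken.
HINT_PREFIXES = {0x2E: "not-taken", 0x3E: "taken"}

def branch_hint(insn: bytes):
    """Return the hint name if insn is a hint-prefixed Jcc, else None."""
    if len(insn) < 2 or insn[0] not in HINT_PREFIXES:
        return None
    rest = insn[1:]
    # Short form: opcodes 0x70..0x7F (rel8).
    short_jcc = 0x70 <= rest[0] <= 0x7F
    # Near form: 0x0F followed by 0x80..0x8F (rel32).
    near_jcc = len(rest) >= 2 and rest[0] == 0x0F and 0x80 <= rest[1] <= 0x8F
    return HINT_PREFIXES[insn[0]] if (short_jcc or near_jcc) else None

# One entry per instruction, copied from the input listing.
input_main = [
    "89 f8", "83 ff 01", "2e 74 06", "83 ff 02", "2e 75 01", "c3",
    "83 ff 03", "2e 74 f9", "b8 04 00 00 00", "eb f2",
    "66 2e 0f 1f 84 00 00 00 00 00",  # nopw: 2e here is not a branch hint
]

# One entry per instruction, copied from the listing after BOLT.
output_main = [
    "89 f8", "83 ff 01", "74 05", "83 ff 02", "75 01", "c3",
    "83 ff 03", "74 fa", "b8 04 00 00 00", "eb f3",
]

def count_hinted(listing):
    """Count instructions in the listing that carry a branch-hint prefix."""
    return sum(branch_hint(bytes.fromhex(s)) is not None for s in listing)

print(count_hinted(input_main), count_hinted(output_main))  # prints "3 0"
```

The nopw entry is included to show that a 2e byte inside a multi-byte NOP is not counted as a hint, since only the first byte of each instruction is inspected.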

Thanks

--Suresh

aaupov commented 2 years ago

Yes, it looks similar to losing the addr32 prefix: LLVM MC might be losing the prefix during disassembly by not setting it on the MCInst.

On the other hand, these prediction prefixes are optional so we may want to strip them by default. What's the use case or perf effect here?

suresh-srinivas commented 2 years ago

Thanks @aaupov

The prefix is ignored by current processors: the static prediction is NT (Not Taken) regardless of the prefix. These prefixes are also mostly not generated (except when using special compiler flags and likely/unlikely macros), so stripping them by default would be right.

With or without the hint, the BPU is updated when the branch is taken.

We are doing some early research work on marking common conditional branches with 3E (so the branch instruction is predicted Taken by the CPU). We want to collect a profile using LBR or conditional-branch-taken counts and then use BOLT to apply the hint.

For this we will need the following:

  1. The branch probability of the conditional branch (is there an easy way in BOLT to get to this or do we need to process the profile data to create this?)
  2. To add the 3E prefix for conditional branches that are mostly taken (I presume that adding the prefix will require some LLVM support?)
  3. Write out the binary and preserve the 3E (looks like this is missing some LLVM support)
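
The first two steps can be sketched roughly as follows. This is not BOLT or LLVM API: the function names, the taken/not-taken counts, and the 0.9 threshold are all hypothetical, and the sketch only handles the 2-byte short Jcc form seen in the listings above.

```python
TAKEN_HINT = 0x3E  # prefix byte proposed above for mostly-taken branches

def taken_probability(taken: int, not_taken: int) -> float:
    """Fraction of executions in which the branch was taken (0.0 if never executed)."""
    total = taken + not_taken
    return taken / total if total else 0.0

def maybe_add_taken_hint(insn: bytes, taken: int, not_taken: int,
                         threshold: float = 0.9) -> bytes:
    """Prepend 3E to a short Jcc if the profile says it is mostly taken.

    The threshold is an arbitrary assumption for illustration.
    """
    is_short_jcc = len(insn) == 2 and 0x70 <= insn[0] <= 0x7F
    if is_short_jcc and taken_probability(taken, not_taken) >= threshold:
        return bytes([TAKEN_HINT]) + insn
    return insn

# Hypothetical counts for the first je in the BOLTed listing.
hinted = maybe_add_taken_hint(bytes.fromhex("74 05"), taken=950, not_taken=50)
print(hinted.hex(" "))  # prints "3e 74 05"
```

Note that the hinted instruction is one byte longer, so branch displacements would have to be recomputed when the binary is laid out again; the sketch leaves the displacement byte untouched.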

Appreciate any directions on this.

Thanks

--Suresh