Granary / granary2

Dynamic binary translation framework for instrumenting x86-64 user space Linux programs
MIT License

Move `IndirectEdge` template code to a different code cache region. #73

Closed pgoodman closed 9 years ago

pgoodman commented 9 years ago

This would be nice, and could be the first step towards being able to split code, at the granularity of fragments, across different regions of the code cache.

pgoodman commented 9 years ago

For those unaware, the current state of things is that indirect edge template code immediately follows the basic block. Here's the gist of things:

         block
           |
         in_edge ----.-> go_to_granary
           |         |       |
    compare_target --' <-----'
           |
      exit_to_block

In the disassembly snippet below, I've labelled the various parts. It should be pretty noticeable that this is quite heavyweight, and represents "garbage" in the code cache.

Native Code

   0x7f256d87f143 <call_init+163 at dl-init.c:80>:  pop    rbx
   0x7f256d87f144 <call_init+164 at dl-init.c:80>:  pop    rbp
   0x7f256d87f145 <call_init+165 at dl-init.c:80>:  pop    r12
   0x7f256d87f147 <call_init+167 at dl-init.c:80>:  pop    r13
   0x7f256d87f149 <call_init+169 at dl-init.c:80>:  pop    r14
   0x7f256d87f14b <call_init+171 at dl-init.c:80>:  ret

Instrumented Code

block:
   0x41d3106d:  lea    rsp,[rsp-0x88]
   0x41d31075:  mov    rbx,QWORD PTR [rsp+0x88]
   0x41d3107d:  mov    rbp,QWORD PTR [rsp+0x90]
   0x41d31085:  mov    r12,QWORD PTR [rsp+0x98]
   0x41d3108d:  mov    r13,QWORD PTR [rsp+0xa0]
   0x41d31095:  mov    r14,QWORD PTR [rsp+0xa8]
in_edge:
   0x41d3109d:  mov    QWORD PTR [rsp],r15
   0x41d310a1:  mov    r15,QWORD PTR [rsp+0xb0]
   0x41d310a9:  push   rcx
   0x41d310aa:  push   rdi
   0x41d310ab:  movabs rdi,0x40c49008   <-- IndirectEdge *
   0x41d310b5:  jmp    QWORD PTR [rdi]  <-- initially goes to next instruction.
go_to_granary: 
   0x41d310b7:  mov    rcx,r15
   0x41d310ba:  jmp    0x44531030
--- start of template ---
   0x41d310bf:  ud2    
compare_target:
  <not emitted: move negated target address into rcx>
   0x41d310c1:  lea    rcx,[rcx+r15*1]
   0x41d310c5:  jrcxz  0x41d310cc
   0x41d310c7:  jmp    0x41d310b7
   0x41d310cc:  pop    rdi
   0x41d310cd:  pop    rcx
   0x41d310ce:  mov    r15,QWORD PTR [rsp]
   0x41d310d2:  lea    rsp,[rsp+0xb8]
  <at instantiation: fall-through exit_to_block>
   0x41d310da:  ud2 
--- end of template ---
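
For readers who find the disassembly hard to follow, here is a rough C++ model of the same control flow. It is a sketch only: `IndirectEdgeModel`, `CompareTarget`, `GoToGranary`, and `Dispatch` are hypothetical names, the "translation" is a made-up address, and the assumption that several instantiated templates are checked one after another is not confirmed by the snippet above. What it does mirror is the compare itself: the template holds the negated target in `rcx`, adds `r15`, and `jrcxz` takes the exit when the sum is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one instantiated compare_target template.
struct CompareTarget {
  std::uint64_t negated_target;  // The value the template would load into rcx.
  std::uint64_t cache_pc;        // Where exit_to_block would continue.
};

// Hypothetical stand-in for the IndirectEdge structure; not Granary's type.
struct IndirectEdgeModel {
  std::vector<CompareTarget> templates;  // Assumed: checked in order.
  int granary_entries = 0;               // Counts trips through go_to_granary.

  // Models go_to_granary: "translate" the target and instantiate a template.
  std::uint64_t GoToGranary(std::uint64_t native_pc) {
    ++granary_entries;
    // Made-up translation address, purely for illustration.
    const std::uint64_t cache_pc = 0x41d40000u + 0x100u * templates.size();
    templates.push_back({0 - native_pc, cache_pc});
    return cache_pc;
  }

  // Models in_edge followed by compare_target; r15 holds the native target.
  std::uint64_t Dispatch(std::uint64_t r15) {
    assert(r15 != 0);
    for (const CompareTarget &t : templates) {
      const std::uint64_t rcx = t.negated_target + r15;  // lea rcx,[rcx+r15*1]
      if (rcx == 0) return t.cache_pc;                   // jrcxz: target matches
    }
    return GoToGranary(r15);  // No match: jmp back to go_to_granary.
  }
};
```

The first `Dispatch` for a given target goes through `GoToGranary`; later dispatches to the same target hit the compare and never leave the model's "code cache".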

pgoodman commented 9 years ago

Eventually, the "edge cache" shouldn't exist. There should simply be a hierarchy of increasingly "cold" caches.

pgoodman commented 9 years ago

I think when direct edges are eventually patched, we should also patch indirect edges, potentially changing them to use hash tables or something else.
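
To make the hash table idea concrete, here is a minimal sketch of what a per-edge lookup could look like. `IndirectTargetTable` and its methods are hypothetical names, not anything in Granary, and `std::unordered_map` is only a stand-in: a real version would presumably need a layout that instrumented code can probe without calling into the runtime.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical per-edge table mapping native targets to code cache addresses.
class IndirectTargetTable {
 public:
  // Returns the cached translation, or 0 if the target has not been seen
  // (the caller would then fall back to the slow path into the runtime).
  std::uint64_t Lookup(std::uint64_t native_pc) const {
    auto it = targets_.find(native_pc);
    return it == targets_.end() ? 0 : it->second;
  }

  // Records (or overwrites) the translation for a native target.
  void Insert(std::uint64_t native_pc, std::uint64_t cache_pc) {
    targets_[native_pc] = cache_pc;
  }

 private:
  std::unordered_map<std::uint64_t, std::uint64_t> targets_;
};
```

Compared with a chain of compare templates, lookup cost here does not grow with the number of targets seen, which is the main reason to consider it for edges with many targets.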

pgoodman commented 9 years ago

Another thing related to other code regions: I think "slower" (colder) regions should be at higher addresses than hotter regions. That way, jumps to colder code are always forward jumps, which I think means the branch predictor follows the fall-through path first (need to double check the software optimization manual on this).
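
A minimal sketch of that layout, assuming one contiguous reservation split into equal-sized tiers with a bump allocator per tier. `Tier`, `TieredCache`, and `Allocate` are hypothetical names and the equal tier sizes are an arbitrary choice for illustration; the only property the sketch is meant to show is that colder tiers always get higher addresses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical temperature tiers, ordered hottest to coldest.
enum Tier : std::size_t { kHot = 0, kWarm = 1, kCold = 2, kNumTiers = 3 };

// Hypothetical code cache: one address range, colder tiers at higher addresses.
class TieredCache {
 public:
  TieredCache(std::uint64_t base, std::uint64_t tier_size) {
    for (std::size_t i = 0; i < kNumTiers; ++i) {
      next_[i] = base + i * tier_size;   // Tier i starts above tier i - 1.
      limit_[i] = next_[i] + tier_size;
    }
  }

  // Bump-allocates num_bytes in the given tier; returns 0 if the tier is full.
  std::uint64_t Allocate(Tier tier, std::uint64_t num_bytes) {
    if (limit_[tier] - next_[tier] < num_bytes) return 0;
    const std::uint64_t addr = next_[tier];
    next_[tier] += num_bytes;
    return addr;
  }

 private:
  std::array<std::uint64_t, kNumTiers> next_;
  std::array<std::uint64_t, kNumTiers> limit_;
};
```

With this layout, any jump from a hot fragment to an edge template placed in `kWarm` or `kCold` is a forward jump by construction.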

pgoodman commented 9 years ago

The MangleIndirectCall function in 1_mangle.cc and the corresponding need for a return address label should be eliminated with this.

pgoodman commented 9 years ago

Will be done in the next commit to the master branch.