Granary / granary2

Dynamic binary translation framework for instrumenting x86-64 user space Linux programs
MIT License

Move `IndirectEdge` template code to a different code cache region. #73

Closed by pgoodman 9 years ago

pgoodman commented 9 years ago

This would be nice, and could be the first step towards being able to split code, at the granularity of fragments, across different regions of the code cache.

pgoodman commented 9 years ago

For those unaware, the current state of things is that indirect edge template code immediately follows the basic block. Here's the gist of things:

         in_edge ----.-> go_to_granary
           |         |       |
    compare_target --' <-----'

In the disassembly snippet below, I've labelled the various parts. It should be pretty noticeable that the template is quite heavyweight, and that it represents "garbage" in the code cache.

Native Code

   0x7f256d87f143 <call_init+163 at dl-init.c:80>:  pop    rbx
   0x7f256d87f144 <call_init+164 at dl-init.c:80>:  pop    rbp
   0x7f256d87f145 <call_init+165 at dl-init.c:80>:  pop    r12
   0x7f256d87f147 <call_init+167 at dl-init.c:80>:  pop    r13
   0x7f256d87f149 <call_init+169 at dl-init.c:80>:  pop    r14
   0x7f256d87f14b <call_init+171 at dl-init.c:80>:  ret

Instrumented Code

   0x41d3106d:  lea    rsp,[rsp-0x88]
   0x41d31075:  mov    rbx,QWORD PTR [rsp+0x88]
   0x41d3107d:  mov    rbp,QWORD PTR [rsp+0x90]
   0x41d31085:  mov    r12,QWORD PTR [rsp+0x98]
   0x41d3108d:  mov    r13,QWORD PTR [rsp+0xa0]
   0x41d31095:  mov    r14,QWORD PTR [rsp+0xa8]
   0x41d3109d:  mov    QWORD PTR [rsp],r15
   0x41d310a1:  mov    r15,QWORD PTR [rsp+0xb0]
   0x41d310a9:  push   rcx
   0x41d310aa:  push   rdi
   0x41d310ab:  movabs rdi,0x40c49008   <-- IndirectEdge *
   0x41d310b5:  jmp    QWORD PTR [rdi]  <-- initially goes to next instruction.
   0x41d310b7:  mov    rcx,r15
   0x41d310ba:  jmp    0x44531030
--- start of template ---
   0x41d310bf:  ud2    
  <not emitted: move negated target address into rcx>
   0x41d310c1:  lea    rcx,[rcx+r15*1]
   0x41d310c5:  jrcxz  0x41d310cc
   0x41d310c7:  jmp    0x41d310b7
   0x41d310cc:  pop    rdi
   0x41d310cd:  pop    rcx
   0x41d310ce:  mov    r15,QWORD PTR [rsp]
   0x41d310d2:  lea    rsp,[rsp+0xb8]
  <at instantiation: fall-through exit_to_block>
   0x41d310da:  ud2 
--- end of template ---
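
To make the compare step concrete, here is a rough C++ model of what an instantiated template appears to do: it adds the negated expected target to the actual target (held in `r15`) and treats a zero result as a match (`jrcxz`). The names `EdgeTemplate`, `Instantiate`, and `ResolveIndirectEdge` are hypothetical and not Granary's API, and checking several templates in sequence is an assumption; the disassembly above only shows a single, not yet instantiated template.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one instantiated template: the target it was
// specialised for (stored negated) and the block it exits to on a match.
struct EdgeTemplate {
  uint64_t negated_target;  // stands in for "move negated target into rcx"
  uint64_t exit_to_block;   // stands in for the fall-through exit_to_block
};

// Builds a template for `target`. Address 0 is reserved as the miss marker.
inline EdgeTemplate Instantiate(uint64_t target, uint64_t block) {
  assert(block != 0);
  EdgeTemplate t;
  t.negated_target = static_cast<uint64_t>(0) - target;
  t.exit_to_block = block;
  return t;
}

// Returns the code cache address to continue at, or 0 on a miss, where the
// real code would jump back to go_to_granary.
inline uint64_t ResolveIndirectEdge(const std::vector<EdgeTemplate> &chain,
                                    uint64_t target /* value in r15 */) {
  for (const EdgeTemplate &t : chain) {
    // lea rcx, [rcx + r15]: the sum wraps to zero only when target matches.
    uint64_t rcx = t.negated_target + target;
    if (rcx == 0) {  // jrcxz
      return t.exit_to_block;
    }
  }
  return 0;
}
```

Using `lea` for the addition leaves the flags untouched, and `jrcxz` tests `rcx` directly, which is presumably why the template can compare without saving and restoring flags.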

pgoodman commented 9 years ago

Eventually, the "edge cache" shouldn't exist. There should simply be a hierarchy of increasingly "cold" caches.

pgoodman commented 9 years ago

I think when direct edges are eventually patched, we should also patch indirect edges to potentially change to hash tables or something else.
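
As a rough illustration of the hash table option, here is a minimal sketch of a fixed-size, open-addressed table mapping a native target address to its code cache address. Everything here (`IndirectEdgeTable`, the table size, the hash function) is hypothetical and chosen only for the example; it is not how Granary implements or plans to implement this.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical table: native target address -> translated cache address.
class IndirectEdgeTable {
 public:
  // Inserts or updates a mapping. Returns false if the table is full.
  bool Insert(uint64_t native_pc, uint64_t cache_pc) {
    assert(native_pc != 0);  // 0 marks an empty slot.
    for (std::size_t probe = 0; probe < kSize; ++probe) {
      Entry &e = entries_[(Hash(native_pc) + probe) % kSize];
      if (e.native_pc == 0 || e.native_pc == native_pc) {
        e.native_pc = native_pc;
        e.cache_pc = cache_pc;
        return true;
      }
    }
    return false;
  }

  // Returns the cache address, or 0 on a miss (fall back to the translator).
  uint64_t Lookup(uint64_t native_pc) const {
    for (std::size_t probe = 0; probe < kSize; ++probe) {
      const Entry &e = entries_[(Hash(native_pc) + probe) % kSize];
      if (e.native_pc == native_pc) return e.cache_pc;
      if (e.native_pc == 0) return 0;  // Empty slot ends the probe sequence.
    }
    return 0;
  }

 private:
  struct Entry {
    uint64_t native_pc = 0;
    uint64_t cache_pc = 0;
  };
  static constexpr std::size_t kSize = 64;  // Arbitrary size for the sketch.
  static std::size_t Hash(uint64_t pc) {
    return static_cast<std::size_t>((pc >> 2) ^ (pc >> 11));
  }
  std::array<Entry, kSize> entries_{};
};
```

Compared with a chain of compare templates, a table like this keeps lookup cost roughly constant as the number of observed targets grows, at the price of a memory load per lookup.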

pgoodman commented 9 years ago

Another thing related to other code regions is that I think "slower" (colder) regions should be at a higher address than hotter regions. That way, jumps to colder code are always forward jumps (meaning the branch predictor follows the fall-through first, I think; double-check the software optimization manual on this).
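
A minimal sketch of that layout idea, assuming a single contiguous reservation split into equally sized regions ordered from hottest to coldest. `Tier`, `CacheLayout`, and `IsForwardJump` are hypothetical names for illustration only; whether this ordering actually helps branch prediction is still the open question above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical temperature tiers, ordered hottest to coldest.
enum class Tier : unsigned { kHot = 0, kWarm = 1, kCold = 2 };

// Hypothetical layout: colder tiers start at strictly higher addresses.
struct CacheLayout {
  uint64_t base;         // Start of the whole code cache reservation.
  uint64_t region_size;  // Size of each tier's region, in bytes.

  uint64_t RegionBegin(Tier t) const {
    assert(region_size != 0);
    return base + static_cast<uint64_t>(t) * region_size;
  }
  uint64_t RegionEnd(Tier t) const { return RegionBegin(t) + region_size; }
};

// True if a jump from `from_pc` to `to_pc` goes to a higher address.
inline bool IsForwardJump(uint64_t from_pc, uint64_t to_pc) {
  return to_pc > from_pc;
}
```

With this ordering, any jump from an address inside a hotter region to an address inside a colder region is forward by construction, because every hotter region ends at or before the point where the next colder region begins.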

pgoodman commented 9 years ago

The MangleIndirectCall function in and the corresponding need for a return address label should be eliminated with this.

pgoodman commented 9 years ago

Will be done in the next commit to the master branch.