Granary / granary2

Dynamic binary translation framework for instrumenting x86-64 user space Linux programs
MIT License

Make os::AnnotateAppInstruction handle `REP MOVS`. #19

Closed pgoodman closed 10 years ago

pgoodman commented 10 years ago

This comes up in the copy_user_enhanced_fast_string function. A key challenge is that, for an arbitrary REP MOVS-like instruction, it's not clear whether the faultable operand is the source or the destination memory operand.

A potentially reasonable solution is to test the source and destination operands independently; however, Granary does not currently split REP MOVS-type instructions into a loop form, and doing so would itself be challenging. There are a few approaches here:

  1. Introduce new basic blocks to handle the looping instructions. The challenge is that it's not clear how to make the deferred decoder smart enough to recognize this pattern.
  2. Introduce a bunch of branch instructions to handle the looping instructions. The challenge here is that copy_user_enhanced_fast_string doesn't allocate a stack frame but, upon faulting, tail-calls another function. We would need to figure out an instruction sequence such that the virtual register system maintains the correct stack pointer through the tail-call / recovery code.
  3. Replace jumps/calls to copy_user_enhanced_fast_string with ones to copy_user_handle_tail.
  4. Implement a strategy similar to the one in granary1. Specifically, assume that only one block contains a faulting address (or force the issue); if that block faults, perform an exception table search. This would require quite a few new data structures.
  5. Go native at the offending instruction (current approach).
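
To illustrate why a loop form makes the source/destination question answerable, here is a minimal C++ sketch (not Granary code). It models a byte-wise `rep movs` as an explicit loop whose load and store are separate steps, so a fault can be attributed to one operand. `FaultSide`, `CopyResult`, `FaultsAt`, and `CopyBytesSplit` are hypothetical names, and the fault predicate stands in for whatever mechanism would really detect a faulting access.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Which memory operand (if any) faulted during the copy.
enum class FaultSide { kNone, kSource, kDestination };

struct CopyResult {
  FaultSide side;         // Operand that faulted, or kNone.
  std::size_t remaining;  // Bytes not yet copied, like the counter at the fault.
};

// Hypothetical predicate: returns true if touching `addr` would fault.
using FaultsAt = std::function<bool(std::uintptr_t)>;

// Models a byte-wise `rep movs` as a loop with a separate load and store,
// so each memory operand is checked independently.
CopyResult CopyBytesSplit(std::uint8_t *dst, const std::uint8_t *src,
                          std::size_t count, const FaultsAt &faults) {
  while (count != 0) {
    // Load step: only the source operand can fault here.
    if (faults(reinterpret_cast<std::uintptr_t>(src))) {
      return {FaultSide::kSource, count};
    }
    const std::uint8_t byte = *src;

    // Store step: only the destination operand can fault here.
    if (faults(reinterpret_cast<std::uintptr_t>(dst))) {
      return {FaultSide::kDestination, count};
    }
    *dst = byte;

    ++src;
    ++dst;
    --count;
  }
  return {FaultSide::kNone, 0};
}
```

The `remaining` field plays the role of the loop counter that the recovery code passes on as the third argument of copy_user_handle_tail.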

Relevant assembly:

Dump of assembler code for function copy_user_enhanced_fast_string:
   0xffffffff8136d2a0 <+0>: data32 xchg ax,ax
   0xffffffff8136d2a3 <+3>: and    edx,edx
   0xffffffff8136d2a5 <+5>: je     0xffffffff8136d2ab <copy_user_enhanced_fast_string+11>
   0xffffffff8136d2a7 <+7>: mov    ecx,edx
# Can fault here.
   0xffffffff8136d2a9 <+9>: rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
   0xffffffff8136d2ab <+11>:    xor    eax,eax
   0xffffffff8136d2ad <+13>:    data32 xchg ax,ax
   0xffffffff8136d2b0 <+16>:    ret 

# When faulted, control is re-directed to here. This is the equivalent of a tail-call
# to copy_user_handle_tail, where the current counter of the loop instruction is
# set to be the third argument of copy_user_handle_tail.
   0xffffffff817307a6 <bad_to_user+48>: mov    edx,ecx
   0xffffffff817307a8 <bad_to_user+50>: jmp    0xffffffff8136ec50 <copy_user_handle_tail>

(gdb) x/10i copy_user_handle_tail
   0xffffffff8136ec50 <copy_user_handle_tail>:  nop    DWORD PTR [rax+rax*1+0x0]
   0xffffffff8136ec55 <copy_user_handle_tail+5>:    push   rbp
   0xffffffff8136ec56 <copy_user_handle_tail+6>:    test   edx,edx
   0xffffffff8136ec58 <copy_user_handle_tail+8>:    mov    rbp,rsp
   0xffffffff8136ec5b <copy_user_handle_tail+11>:   je     0xffffffff8136ecd0 <copy_user_handle_tail+128>
   0xffffffff8136ec5d <copy_user_handle_tail+13>:   xor    r9d,r9d
   0xffffffff8136ec60 <copy_user_handle_tail+16>:   jmp    0xffffffff8136ec81 <copy_user_handle_tail+49>
   0xffffffff8136ec62 <copy_user_handle_tail+18>:   nop    WORD PTR [rax+rax*1+0x0]
   0xffffffff8136ec68 <copy_user_handle_tail+24>:   data32 xchg ax,ax
   0xffffffff8136ec6b <copy_user_handle_tail+27>:   mov    BYTE PTR [rdi],sil
  ...

pgoodman commented 10 years ago

Another alternative is to dynamically generate exception table entries, and make sure that the kernel can see them. It might be necessary in this case for a struct module to be allocated for the code cache itself, so that the kernel's search routine finds the dynamic entries.

One tricky part here would be keeping the dynamic entries sorted.
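
As a rough illustration of the sorting concern, here is a C++ sketch of a table that stays sorted on insertion so that lookups can use binary search. `ExceptionEntry` and `DynamicExceptionTable` are hypothetical, simplified stand-ins: they are not the kernel's actual exception table layout, and nothing here makes the entries visible to the kernel.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified entry: absolute faulting PC and its fixup PC.
struct ExceptionEntry {
  std::uint64_t fault_pc;
  std::uint64_t fixup_pc;
};

class DynamicExceptionTable {
 public:
  // Inserts at the sorted position; an existing entry for the same
  // faulting PC has its fixup overwritten.
  void Insert(ExceptionEntry entry) {
    auto pos = LowerBound(entry.fault_pc);
    if (pos != entries_.end() && pos->fault_pc == entry.fault_pc) {
      pos->fixup_pc = entry.fixup_pc;
    } else {
      entries_.insert(pos, entry);
    }
  }

  // Binary search for `pc`; on success stores the fixup and returns true.
  bool FindFixup(std::uint64_t pc, std::uint64_t *fixup) const {
    auto pos = std::lower_bound(
        entries_.begin(), entries_.end(), pc,
        [](const ExceptionEntry &e, std::uint64_t p) { return e.fault_pc < p; });
    if (pos == entries_.end() || pos->fault_pc != pc) return false;
    *fixup = pos->fixup_pc;
    return true;
  }

  // Checks the invariant that binary search relies on.
  bool IsSorted() const {
    return std::is_sorted(entries_.begin(), entries_.end(),
                          [](const ExceptionEntry &a, const ExceptionEntry &b) {
                            return a.fault_pc < b.fault_pc;
                          });
  }

 private:
  std::vector<ExceptionEntry>::iterator LowerBound(std::uint64_t pc) {
    return std::lower_bound(
        entries_.begin(), entries_.end(), pc,
        [](const ExceptionEntry &e, std::uint64_t p) { return e.fault_pc < p; });
  }

  std::vector<ExceptionEntry> entries_;
};
```

Insertion into the middle of a vector is linear in the table size, so this trades slower insertion for cheap lookups. Whether that trade is acceptable depends on how often new code cache entries would be generated.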

pgoodman commented 10 years ago

I think a nice way to handle this might be to turn a REP MOVS-type instruction into an expanded form like the one shown below. This could be done within arch/x86-64/early_mangle.cc.

label:                <LabelInstruction>
  mov ...             <app NativeInstruction>
  mov ...             <app NativeInstruction>
  loop* label         <inst BranchInstruction>
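
The semantics of the expanded form above can be modeled in C++ as follows. This is only a sketch under stated assumptions: byte-sized elements, forward copying (direction flag clear), and a hypothetical function name `ExpandedRepMovsb`. The zero-count guard is also an assumption added here: a REP prefix with a zero count executes no iterations, whereas a bottom-tested `loop` runs its body once before checking the counter, so the expansion needs some guard, or an existing check such as the `and edx,edx` / `je` pair in copy_user_enhanced_fast_string.

```cpp
#include <cstddef>
#include <cstdint>

// Models the expanded loop with registers passed by reference.
// Returns the number of iterations executed.
std::size_t ExpandedRepMovsb(std::uint8_t *&rdi, const std::uint8_t *&rsi,
                             std::size_t &rcx) {
  std::size_t iterations = 0;

  // Assumed guard: skip the loop entirely when the count is zero,
  // matching REP semantics.
  if (rcx == 0) return iterations;

  do {                                // label:
    const std::uint8_t tmp = *rsi++;  //   mov tmp, [rsi]  (load, advance rsi)
    *rdi++ = tmp;                     //   mov [rdi], tmp  (store, advance rdi)
    ++iterations;
  } while (--rcx != 0);               //   loop label      (decrement, branch)

  return iterations;
}
```

After a full copy, `rcx` is zero and both pointers have advanced by the original count, which matches the register state a native `rep movs` would leave behind.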

pgoodman commented 10 years ago

This was fixed by the recent overhaul of all kernel exception table-related code.