DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

for 32-bit app on 64-bit kernel, switch to 64-bit mode and use extra registers in DR and tool #751

Status: Open. Opened by derekbruening 9 years ago.

derekbruening commented 9 years ago

From bruen...@google.com on April 24, 2012 11:22:34

I've long wanted to work on various projects involving switching modes, whether for the app's benefit or the tool's, but so far have not had time to work on any of them.

The idea here is that, when running a 32-bit application on a 64-bit kernel under a DynamoRIO tool, we can switch to 64-bit mode and use the extra registers as scratch space to reduce spills and improve performance.

We should consider using the registers in core DR for the ibl (indirect branch lookup), and also making them available to the tool.

We'd have to mangle instructions that are not legal in x64 mode. Some just need a re-encoding (e.g., 1-byte inc/dec), while others will be more complex (pusha, BCD instructions, lds, etc.), and for some of those it may be simpler to swap back to x86 mode than to try to emulate them.
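As a rough illustration of that triage, the sketch below classifies a 32-bit primary opcode byte, assuming prefixes have already been stripped. The category and function names are hypothetical (not DR API) and the opcode list is not exhaustive. Bytes 0x40-0x4F are one-byte inc/dec in 32-bit mode but REX prefixes in 64-bit mode, so they can be re-encoded with the two-byte FF /0 and FF /1 forms; the opcodes in the NEEDS_X86 list are invalid or reassigned in 64-bit mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical categories for a 32-bit primary opcode byte (prefixes
 * already stripped). Not DynamoRIO API; the tables are not exhaustive. */
typedef enum {
    XLATE_OK,          /* same encoding and semantics in 64-bit mode */
    XLATE_REENCODE,    /* legal operation, but needs a different encoding */
    XLATE_STACK_SIZE,  /* push/pop: default operand size becomes 64 bits */
    XLATE_NEEDS_X86    /* invalid or reassigned in 64-bit mode */
} xlate_kind_t;

xlate_kind_t classify_opcode32(uint8_t op)
{
    /* 0x40-0x4F: 1-byte inc/dec r32 here, but REX prefixes in 64-bit mode. */
    if (op >= 0x40 && op <= 0x4F)
        return XLATE_REENCODE;
    /* 0x50-0x5F push/pop r32, 0x68/0x6A push imm. */
    if ((op >= 0x50 && op <= 0x5F) || op == 0x68 || op == 0x6A)
        return XLATE_STACK_SIZE;
    switch (op) {
    case 0x06: case 0x07: case 0x0E: case 0x16: /* push/pop es, push cs/ss */
    case 0x17: case 0x1E: case 0x1F:            /* pop ss, push/pop ds */
    case 0x27: case 0x2F: case 0x37: case 0x3F: /* daa, das, aaa, aas */
    case 0x60: case 0x61:                       /* pusha, popa */
    case 0x62: case 0x63:                       /* bound, arpl */
    case 0x9A: case 0xEA:                       /* direct far call/jmp */
    case 0xC4: case 0xC5:                       /* les, lds */
    case 0xCE: case 0xD4: case 0xD5: case 0xD6: /* into, aam, aad, salc */
        return XLATE_NEEDS_X86;
    default:
        return XLATE_OK;
    }
}

/* Re-encode a 1-byte inc/dec r32 (0x40-0x4F) as FF /0 or FF /1 with a
 * register-direct ModRM byte. Writes 2 bytes to out and returns 2, or
 * returns 0 if op is not in range. */
size_t reencode_incdec(uint8_t op, uint8_t out[2])
{
    if (op < 0x40 || op > 0x4F)
        return 0;
    uint8_t reg = op & 0x7;               /* eax=0 ... edi=7 */
    uint8_t digit = (op & 0x8) ? 1 : 0;   /* 0x48-0x4F are dec (/1) */
    out[0] = 0xFF;
    out[1] = (uint8_t)(0xC0 | (digit << 3) | reg); /* mod=11: register */
    return 2;
}
```

The two-byte forms are one byte longer, so any cache-offset bookkeeping has to account for the growth.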

We also have to be careful with instructions whose default operand size changes based on mode. The most common and problematic will be push/pop, which will likely all have to be converted to stores/loads (multiple instructions if there is a memory argument).
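A sketch of the push/pop conversion as it might be emitted, shown as Intel-syntax text for illustration only. The helper names and the scratch register passed in are hypothetical. Memory operands use 32-bit addressing, which in 64-bit mode needs an address-size prefix but is legal. Pop to a memory operand is not handled here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers that emit Intel-syntax text for the store/load
 * replacement of 32-bit push/pop when running the code in 64-bit mode.
 * Each returns the number of characters written, as snprintf does. */

/* push r32: store below esp first, then adjust. Storing first means
 * "push esp" still stores the old esp, and a faulting store leaves esp
 * unchanged, as a faulting push would. lea (not sub) keeps the flags. */
int rewrite_push_reg(const char *reg, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "mov dword ptr [esp-4], %s\n"
                    "lea esp, [esp-4]\n", reg);
}

/* push m32: there is no memory-to-memory mov, so the value goes through a
 * scratch register (a hypothetical choice, e.g. one of r8d-r15d). The load
 * comes first, so an esp-based address sees the pre-push esp. */
int rewrite_push_mem(const char *mem, const char *scratch,
                     char *buf, size_t len)
{
    return snprintf(buf, len,
                    "mov %s, dword ptr [%s]\n"
                    "mov dword ptr [esp-4], %s\n"
                    "lea esp, [esp-4]\n", scratch, mem, scratch);
}

/* pop r32: load, then release the slot. "pop esp" is special: the result
 * is the loaded value, so no adjustment follows. */
int rewrite_pop_reg(const char *reg, char *buf, size_t len)
{
    if (strcmp(reg, "esp") == 0)
        return snprintf(buf, len, "mov esp, dword ptr [esp]\n");
    return snprintf(buf, len,
                    "mov %s, dword ptr [esp]\n"
                    "lea esp, [esp+4]\n", reg);
}
```

For example, `push [esp+8]` with scratch `r10d` becomes a load into r10d, a store to `[esp-4]`, and the esp adjustment.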

We need to ensure that fault handling is done properly regardless of the current mode.
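One conventional way to keep fault handling mode-independent is to record, per translated fragment, which code-cache offsets correspond to which app instructions. A fault at any cache pc, whichever mode the fragment runs in, can then be mapped back to the 32-bit app pc before the app's handler sees it. A hypothetical sketch of that lookup, not DR's actual translation data structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-fragment translation entry: the cache offset where the
 * code for one app instruction starts, and that instruction's app pc. */
typedef struct {
    uint32_t cache_offs;
    uint32_t app_pc;   /* 32-bit app address */
} xlate_entry_t;

typedef struct {
    bool is_x64;               /* mode the fragment body executes in; tells
                                * the handler how to read register state */
    const xlate_entry_t *map;  /* sorted by cache_offs */
    size_t num;
} fragment_xlate_t;

/* Return the app pc for a fault at cache offset offs: the entry with the
 * largest cache_offs <= offs, found by binary search. Returns 0 if offs
 * precedes the first entry. */
uint32_t translate_fault_pc(const fragment_xlate_t *f, uint32_t offs)
{
    size_t lo = 0, hi = f->num;   /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (f->map[mid].cache_offs <= offs)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries with cache_offs <= offs. */
    return lo == 0 ? 0 : f->map[lo - 1].app_pc;
}
```

A fault anywhere inside the multi-instruction sequence generated for one app instruction maps to that instruction's app pc.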

xref issue #49: simultaneous 32-bit and 64-bit app code support

xref my prior proposals about translating 32-bit app code to 64-bit, for supporting 32-bit legacy plugins in 64-bit apps

xref "Dynamic Register Promotion of Stack Variables" (CGO 2011), which uses 32-to-64 translation to optimize the app by promoting app stack references into registers

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=751

derekbruening commented 9 years ago

From bruen...@google.com on June 12, 2012 10:36:48

Pasting in some notes, which are mostly me talking to myself but are hopefully readable:

* TODO impl notes

** TODO 64-bit DR or 64-bit capabilities in 32-bit DR?

Simplest and cleanest: use 64-bit DR for decode/encode.

** TODO Linux or Windows? Windows is easier

Much easier to have a 64-bit lib loaded on Windows than to implement our own Linux loader (ld.so won't load our 64-bit lib into a 32-bit app).

** TODO transitions between modes

On Windows, we can use the already-set-up 64-bit code segment.
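Under WOW64 that segment's selector is 0x33 (32-bit code uses 0x23), so 32-bit code can enter 64-bit mode with a direct far jump. A minimal encoder sketch for `jmp far 0x33:target` as executed in 32-bit mode (opcode EA, then the 32-bit offset and the 16-bit selector, little-endian); the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define WOW64_CS64 0x33  /* 64-bit code selector under WOW64 */

/* Encode a 32-bit-mode direct far jmp (EA ptr16:32) to target in the
 * 64-bit code segment. Writes 7 bytes and returns the length. The target
 * must lie below 4GB since the offset is only 32 bits. */
size_t encode_jmp_far_to_x64(uint32_t target, uint8_t out[7])
{
    out[0] = 0xEA;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(target >> (8 * i));   /* offset, LE */
    out[5] = (uint8_t)(WOW64_CS64 & 0xFF);           /* selector, LE */
    out[6] = (uint8_t)(WOW64_CS64 >> 8);
    return 7;
}
```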

The kernel exposes GDT and LDT slots, so we should be able to create a descriptor on Linux, but that is more work.

** TODO existing mixed-mode support

xref already-existing support for mixing 32-bit and 64-bit:

For this we don't need to support apps with mixed code (that's issue #49).

** TODO x64-incompatible segment operations

Push/pop of the cs/ds/es/ss segments: segments in general are all flat, so most apps don't care. For the ones that do: could we just bail on this transformation? If we hit something unsupported, leave the existing fragments as x64 and bail on the rest? Or leave some fragments as x86 and try again later with others that don't have segment ops in them?
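The last option above amounts to a per-fragment policy: keep a fragment in x86 mode if it pushes or pops cs/ds/es/ss, and translate the rest to x64. A hypothetical sketch, assuming each instruction's primary opcode byte (prefixes stripped) is available. Push/pop of fs/gs use two-byte 0F opcodes that remain valid in 64-bit mode, so they are not flagged here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FRAG_MODE_X86, FRAG_MODE_X64 } frag_mode_t;

/* One-byte push/pop of es/cs/ss/ds: all invalid in 64-bit mode. */
static bool is_legacy_seg_pushpop(uint8_t op)
{
    switch (op) {
    case 0x06: case 0x07:   /* push es, pop es */
    case 0x0E:              /* push cs (there is no pop cs) */
    case 0x16: case 0x17:   /* push ss, pop ss */
    case 0x1E: case 0x1F:   /* push ds, pop ds */
        return true;
    default:
        return false;
    }
}

/* Hypothetical policy: leave a fragment as x86 if any of its instructions
 * is a legacy segment push/pop; otherwise translate it to x64. */
frag_mode_t choose_fragment_mode(const uint8_t *opcodes, size_t num)
{
    for (size_t i = 0; i < num; i++) {
        if (is_legacy_seg_pushpop(opcodes[i]))
            return FRAG_MODE_X86;
    }
    return FRAG_MODE_X64;
}
```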

** TODO injection

Don't start from 32-bit DR in the process: just have 64-bit DR and add support for the 32-bit app (xref issue #49).

Just use the x64 drinject and give it a 32-bit process? For following children: treat it like 64-to-64, and have DR in the child do a delayed hook on the 32-bit ntdll.

Or use the 32-bit drinject and solve cross-arch injection (=> issue #803), though following children will then be 64-to-64.

** TODO once 64-bit DR is loaded

ntdll64 is already in the process, so DR will initialize normally.

** TODO mcontext

Use the 64-bit data struct, but put 32-bit values in it.
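A sketch of what that could look like: the app's 32-bit registers are zero-extended into the 64-bit slots on the way in and truncated on the way out, while r8-r15 are never part of app state. The struct layouts and helper names are hypothetical, not DR's `dr_mcontext_t`.

```c
#include <stdint.h>

/* Hypothetical 32-bit app register state (general-purpose regs only). */
typedef struct {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi, eflags, eip;
} app_regs32_t;

/* Hypothetical 64-bit machine context holding the 32-bit values. */
typedef struct {
    uint64_t xax, xcx, xdx, xbx, xsp, xbp, xsi, xdi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15; /* DR/tool only */
    uint64_t xflags, pc;
} mcontext64_t;

/* Zero-extend the app's 32-bit state into the 64-bit slots. */
void mc_from_app32(mcontext64_t *mc, const app_regs32_t *a)
{
    mc->xax = a->eax; mc->xcx = a->ecx; mc->xdx = a->edx; mc->xbx = a->ebx;
    mc->xsp = a->esp; mc->xbp = a->ebp; mc->xsi = a->esi; mc->xdi = a->edi;
    mc->xflags = a->eflags;
    mc->pc = a->eip;
}

/* Truncate back to 32 bits; r8-r15 are not copied into app state. */
void mc_to_app32(const mcontext64_t *mc, app_regs32_t *a)
{
    a->eax = (uint32_t)mc->xax; a->ecx = (uint32_t)mc->xcx;
    a->edx = (uint32_t)mc->xdx; a->ebx = (uint32_t)mc->xbx;
    a->esp = (uint32_t)mc->xsp; a->ebp = (uint32_t)mc->xbp;
    a->esi = (uint32_t)mc->xsi; a->edi = (uint32_t)mc->xdi;
    a->eflags = (uint32_t)mc->xflags;
    a->eip = (uint32_t)mc->pc;
}
```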

** TODO WOW64 layer

Don't try to solve issue #49: leave the WOW64 layer native.

** TODO syscalls: just skip the WOW64 far call, so it may actually be faster!

** TODO Ki: need to hook 32-bit ntdll Ki

I'm assuming the kernel always talks to the 64-bit WOW64 layer, regardless of the current processor mode. One question is: on a fault while in the 64-bit mode imposed by DR, will the WOW64 layer still go to the 32-bit ntdll Ki? And if so, will DR hit any situations where it can't recover, or can't handle its own deliberate faults, because it doesn't have the 64-bit state?

** TODO gencode

Generate a 32-bit ibl and context switch. Add a far jmp to the context switch to support 32-bit mode, both for incremental work and for when we want to bail on translating certain app instructions.
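The direct far jmp (EA) is invalid in 64-bit mode, so the switch back to 32-bit code needs another form. One option, sketched here assuming the WOW64 32-bit code selector 0x23, is an indirect far jmp through a RIP-relative m16:32 operand placed right after the instruction (`FF 2D 00 00 00 00`, then the 4-byte offset and the 2-byte selector). The helper name is hypothetical; a retf-based sequence would be an alternative.

```c
#include <stddef.h>
#include <stdint.h>

#define WOW64_CS32 0x23  /* 32-bit code selector under WOW64 */

/* Emit, for 64-bit mode, "jmp far [rip+0]" followed by its m16:32 operand,
 * transferring to target in the 32-bit code segment. Writes 12 bytes. */
size_t encode_jmp_far_to_x86(uint32_t target, uint8_t out[12])
{
    out[0] = 0xFF;   /* opcode group 5 */
    out[1] = 0x2D;   /* ModRM: mod=00, reg=/5 (far jmp), rm=101: RIP+disp32 */
    for (int i = 2; i < 6; i++)
        out[i] = 0;  /* disp32 = 0: operand starts right after this instr */
    for (int i = 0; i < 4; i++)
        out[6 + i] = (uint8_t)(target >> (8 * i));  /* 32-bit offset, LE */
    out[10] = (uint8_t)(WOW64_CS32 & 0xFF);         /* 16-bit selector */
    out[11] = (uint8_t)(WOW64_CS32 >> 8);
    return 12;
}
```

Keeping the pointer inline next to the jump avoids needing a scratch register, at the cost of embedding data in generated code.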

derekbruening commented 9 years ago

From bruen...@google.com on June 12, 2012 10:37:13

Owner: ya...@google.com

derekbruening commented 9 years ago

From bruen...@google.com on June 22, 2012 13:50:31

Pasting from issue #828 comment 1:

If we get the 32-to-64 translation working well, here's a strawman proposal for how things could work. When doing mixed-mode instrumentation:

This way, if we can translate everything, we can stay in x64 mode the entire time. The client can insert whatever code it likes, x64 or x86, we'll translate and preserve the semantics.

derekbruening commented 9 years ago

From bruen...@google.com on June 22, 2012 15:39:02

One concern with (or maybe just an addition to) the proposal in comment 3 is that there may be instructions we never support translating and instead leave as 32-bit (e.g., BCD). We may then need to tell the client that such a fragment will remain entirely 32-bit and can't accept any 64-bit instrumentation.
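One way the core could communicate this, sketched with hypothetical names (not DR API): report each fragment's mode to the client, and check that instrumentation touching r8-r15 only goes into fragments that run as x64.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { FRAG_MODE_X86, FRAG_MODE_X64 } frag_mode_t;

/* Bit i set = instrumentation uses register i
 * (0-7: eax..edi, 8-15: r8-r15). */
typedef uint16_t reg_mask_t;
#define REGS_X64_ONLY ((reg_mask_t)0xFF00)

/* Hypothetical check a client (or DR on its behalf) could run before
 * inserting instrumentation into a fragment. */
bool instru_fits_fragment(frag_mode_t mode, reg_mask_t used)
{
    if (mode == FRAG_MODE_X64)
        return true;
    /* A fragment that stays entirely 32-bit (e.g., it contains BCD
     * instructions) cannot accept instrumentation touching r8-r15. */
    return (used & REGS_X64_ONLY) == 0;
}
```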

So this proposal is to simply let the client use r8-r15 as it sees fit, rather than adding some spill-slot API extension (or a behind-the-scenes implementation) that maps existing spill slots to registers. That alternative is possible but much less flexible for the client.

derekbruening commented 9 years ago

From bruen...@google.com on June 28, 2012 08:23:05

* TODO speed up ibl, in-trace cmp, exit stubs by DR using 64-bit registers

Simplest to statically partition the registers among the DR components and the client:

- translator: uses r8. Any overlap between the ibl and the translator? No: the translator's uses of r8 are all local, and an indirect branch fault should happen before the ibl.
- mangler: uses r9 and r10 (s/xcx/r9x/). Not worth taking another register just for selfmod, though it could use r11.
- ibl: uses r8-r10. rcx is in r9; xchg r8 with rax for flags; use r10 for other scratch.
- in-trace cmp: ecx + flags => r9 and r10.
- exit stubs: may as well use r10 (the convention coming out is r9 == xcx, r10 == xax).
- client: can then have r11-r15.
- a rip-rel far ind call would use 3 regs, but there is no rip-rel in 32-bit.

** TODO have DR's use of x64 regs be optional

Probably better for drmem performance to have drmem use r9-r10: it is a bigger win to keep shadow regs or other key data in real registers than to improve DR's ibl. So put DR's use under a runtime option.
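The static partition in this comment, adjusted for the later update in this thread that has in-trace cmp use only r9, might be written down as a table so that conflicts are checked in one place. This is a hypothetical sketch: the option name `dr_uses_x64_regs` is invented, and treating all of r8-r15 as client-available when the option is off is an interpretation of the runtime-option idea.

```c
#include <stdbool.h>

/* Numbering for the extra x64 registers only. */
typedef enum { R8 = 8, R9, R10, R11, R12, R13, R14, R15 } xreg_t;

/* Bit flags for the DR components named in the notes. */
enum {
    USER_TRANSLATOR = 1 << 0,
    USER_MANGLER    = 1 << 1,
    USER_IBL        = 1 << 2,
    USER_TRACE_CMP  = 1 << 3,
    USER_EXIT_STUB  = 1 << 4,
};

/* Which DR components use each of r8-r15 (index = reg - R8). Each
 * component's uses are local, so sharing a register among them is fine. */
static const unsigned dr_users[8] = {
    /* r8  */ USER_TRANSLATOR | USER_IBL,  /* ibl: xchg r8 with rax for flags */
    /* r9  */ USER_MANGLER | USER_IBL | USER_TRACE_CMP,  /* r9 == xcx */
    /* r10 */ USER_MANGLER | USER_IBL | USER_EXIT_STUB,  /* r10 == xax */
    /* r11-r15: reserved for the client */
    0, 0, 0, 0, 0,
};

/* May the client use reg? With the hypothetical dr_uses_x64_regs option
 * off, DR keeps its existing spill-slot paths and leaves r8-r15 free. */
bool client_may_use(xreg_t reg, bool dr_uses_x64_regs)
{
    if (reg < R8 || reg > R15)
        return false;
    if (!dr_uses_x64_regs)
        return true;
    return dr_users[reg - R8] == 0;
}
```

A tool like drmem would run with the option off and claim r9-r10 for its own key data.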

derekbruening commented 9 years ago

From bruen...@google.com on June 28, 2012 08:25:45

Update: in-trace cmp: ecx => r9 (flags only for x64 cmp)

derekbruening commented 9 years ago

From bruen...@google.com on March 10, 2014 08:56:02

Owner: ---

derekbruening commented 9 years ago

From bruen...@google.com on April 22, 2014 10:10:09

xref WOW64 complications pointed out in issue #979: "WOW64 layer assumes r12 - r15 are untouched in between syscalls"
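This collides with the client's r11-r15 in the partition discussed earlier: whenever a syscall goes through the WOW64 layer, r12-r15 must hold WOW64's values. A minimal bitmask sketch, with hypothetical names, of computing which registers would have to be swapped around each syscall:

```c
#include <stdint.h>

/* Bit n represents register r<n> for n in 8..15. */
typedef uint16_t xreg_mask_t;
#define XREG_BIT(n) ((xreg_mask_t)(1u << (n)))
#define XREG_RANGE(lo, hi) \
    ((xreg_mask_t)(((1u << ((hi) + 1)) - 1) & ~((1u << (lo)) - 1)))

/* Registers the WOW64 layer expects untouched between syscalls (r12-r15,
 * per issue #979). */
#define WOW64_PRESERVED XREG_RANGE(12, 15)

/* Registers whose WOW64 values must be restored before a syscall, with the
 * client's values saved first and reloaded afterward (one possible
 * scheme). */
xreg_mask_t regs_to_swap_at_syscall(xreg_mask_t client_regs)
{
    return (xreg_mask_t)(client_regs & WOW64_PRESERVED);
}
```

With the client owning r11-r15, everything except r11 would need swapping, which suggests steering client allocations toward registers outside r12-r15 where possible.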