Open derekbruening opened 9 years ago
From bruen...@google.com on July 10, 2011 13:53:03
Summary: drreg: provide permanently-stolen reg support?
From bruen...@google.com on March 09, 2012 06:51:05
xref other extensions from the master extension discussion: drutil ( issue #295 ), drwrap ( issue #296 ), drmgr ( issue #402 ), drsyscall ( https://code.google.com/p/drmemory/issues/detail?id=822 ), drcallstack ( https://code.google.com/p/drmemory/issues/detail?id=823 ), drmalloc ( https://code.google.com/p/drmemory/issues/detail?id=824 ), umbra ( https://code.google.com/p/drmemory/issues/detail?id=825 )
From bruen...@google.com on May 11, 2012 11:44:53
*\ INFO drmgr and drreg discussion :meeting_DrM_2012_1_25:
xref issue #164/PR 494720: add a post-instru pass that checks for errors in register preservation
drreg and drmgr discussion highlights:
*\ TODO drreg framework :meeting_DrM_2012_5_11:
how does the client tell us the lifetime of values in scratch registers, so drreg knows whether to let an app instr clobber a scratch reg or whether it has to preserve the scratch reg value? proposal: keep it simple for now: either a whole-bb reg is preserved across the whole bb, or its lifetime is just between app instrs. for locally-requested spills, the lifetime ends at the next app instr.
API to request whole-bb reg during analysis phase:
API to request app value to be restored (for OP_lea for shadow map, e.g.)
but want more than just pick best regs and keep app values in spill slots for proper restoring: also want:
have -conservative option for whether to use dead regs (can turn on if worried about fault or debugger examining)
for using Extensions:
Summary: drreg: register spilling mediation framework
Labels: -Priority-Low Priority-Medium
From bruen...@google.com on June 14, 2012 10:31:15
xref issue #52 , issue #53
From bruen...@google.com on July 15, 2013 13:52:20
pasting more notes:
\ TODO later thoughts
virtual regs would be by far the easiest to use. however, they require a reg allocation pass that may need to build CFG => extra overhead that not every tool will want to live with. so should there be 2 different interfaces? one that uses virtual regs and one that doesn't?
drmem is not linear: for check_ignore_unaddr mem2mem, jmps down below and then can jump back up to check_ignore_resume, so reg allocator would have to build CFG and not just do linear scan. but, that should all go away w/ -replace_malloc, if want to assume that will be long-term soln. update: actually it's not clear it can go away.
\ TODO revised interface: no up-front whole-bb, just simple reserve+unreserve interface
Originally I had these in drreg_options_t:
/**
* The number of scratch registers that need to be reserved across
* more than one application instruction. drreg turns these into
* "whole-basic-block" scratch registers.
*/
uint num_whole_bb;
/**
* Whether to spill the arithmetic flags across each basic block,
* to minimize per-instruction spills and restores.
*/
bool aflags_whole_bb;
The only reason to hardcode the number of whole-bb regs up front, and to spill them at the very top of the bb and restore at the very bottom, is to simplify state xl8:
/* Our state restoration model: Only whole-bb scratch regs and aflags
* need to be restored (i.e., all local scratch regs are restored
* before any app instr). For each such reg or aflags, we guarantee
* that either the app value is in TLS at each app instr (where fault
* might happen) or the app value is dead and it's ok to have garbage
* in TLS b/c the app will write it before reading (this is all modulo
* the app's own fault handler going off on a different path (xref
* DRi#400): so we're slightly risky here).
*/
From a pure user interface point of view, drreg would be much simpler if it had no concept of global/whole-bb vs local, and instead for whichever regs are still reserved in crossing an app instr (e.g., drmem's shadow xl8 reg), drreg restores or updates if the app reads or writes.
Xref discussion above about the original interface of drreg just picking whole-bb regs and the client having to parcel out who is using which when: this simpler interface here is much nicer to use.
If too many regs are reserved across an app instr, we may run out of places to store the values.
To handle fault xl8, we can decode the in-cache bb and walk forward looking at spills + restores like DR xl8 does. We'll add dr_is_tls_access() query to DR (looks for both raw and DR TLS).
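The forward walk over spills and restores can be sketched on a toy instruction model. Everything here (the `toy_` types, the register and slot indices, and the assumption that a restore always frees the slot) is hypothetical and much simpler than real decoding of the in-cache bb; it only illustrates how walking up to the faulting instruction tells us which slot holds each register's app value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds for this sketch only. */
typedef enum { TOY_APP, TOY_SPILL, TOY_RESTORE } toy_kind_t;

typedef struct {
    toy_kind_t kind;
    int reg;  /* register index for TOY_SPILL / TOY_RESTORE, else unused */
    int slot; /* TLS slot index for TOY_SPILL / TOY_RESTORE, else unused */
} toy_instr_t;

#define TOY_NUM_REGS 8
#define TOY_NOT_SPILLED (-1)

/* Walks instrs[0..fault_idx) and records, for each register, the slot that
 * holds its app value at the faulting instruction, or TOY_NOT_SPILLED if the
 * register itself still holds the app value.
 */
static void
toy_walk_spills(const toy_instr_t *instrs, size_t fault_idx,
                int slot_of_reg[TOY_NUM_REGS])
{
    for (int r = 0; r < TOY_NUM_REGS; r++)
        slot_of_reg[r] = TOY_NOT_SPILLED;
    for (size_t i = 0; i < fault_idx; i++) {
        const toy_instr_t *in = &instrs[i];
        if (in->kind == TOY_APP)
            continue; /* app instrs do not change spill state in this model */
        assert(in->reg >= 0 && in->reg < TOY_NUM_REGS);
        if (in->kind == TOY_SPILL)
            slot_of_reg[in->reg] = in->slot;
        else
            slot_of_reg[in->reg] = TOY_NOT_SPILLED;
    }
}
```

A fault handler built on this idea would, for each register whose entry is not `TOY_NOT_SPILLED`, copy the value from that slot into the machine context before handing the fault to the app.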
Might still want "local" hints so drreg saves less-used-in-bb regs for non-local requests? Plus, if > raw slots and into DR slots, not supposed to use across app instrs, so complains if non-"local" for those?
Would any unreserved regs warrant keeping spilled across app instrs? What is perf diff between lazy restore at next app use vs keep spilled and update spill location at next app use? Identical for app read I guess; app write is restore + real unreserve vs re-spill to tls slot. So seems the answer is no: fixed # of whole-bb regs is a perf cost.
Xref #1771
\ TODO add lazy restore of aflags
consecutive drx_insert_counter_update() (after #1771 adds drxmgr test using drreg in the counter routine):
pre:
> bin32/drrun -loglevel 4 -c suite/tests/bin/libclient.drxmgr-test.dll.so -- suite/tests/bin/common.eflags
> head -2000 `ls -1td logs/*0|head -1`/l* | grep -A 20 ', tag'
Fragment 1, tag 0xf7774a70, flags 0x1000030, shared, size 70:
0x49765004 9f lahf -> %ah
0x49765005 0f 90 c0 seto -> %al
0x49765008 64 a3 4c 00 00 00 mov %eax -> %fs:0x4c[4byte]
0x4976500e 81 05 08 20 77 f7 01 add $0x00000001 0xf7772008[4byte] -> 0xf7772008[4byte]
00 00 00
0x49765018 64 a1 4c 00 00 00 mov %fs:0x4c[4byte] -> %eax
0x4976501e 04 7f add $0x7f %al -> %al
0x49765020 9e sahf %ah
0x49765021 9f lahf -> %ah
0x49765022 0f 90 c0 seto -> %al
0x49765025 64 a3 4c 00 00 00 mov %eax -> %fs:0x4c[4byte]
0x4976502b 81 05 0c 20 77 f7 03 add $0x00000003 0xf777200c[4byte] -> 0xf777200c[4byte]
00 00 00
0x49765035 64 a1 4c 00 00 00 mov %fs:0x4c[4byte] -> %eax
0x4976503b 04 7f add $0x7f %al -> %al
0x4976503d 9e sahf %ah
0x4976503e 89 e0 mov %esp -> %eax
0x49765040 68 77 4a 77 f7 push $0xf7774a77 %esp -> %esp 0xfffffffc(%esp)[4byte]
0x49765045 e9 d2 ff 01 00 jmp $0x4978501c
post:
> head -2000 `ls -1td logs/*0|head -1`/l* | grep -A 20 ', tag'
Fragment 1, tag 0xf7786a70, flags 0x1000030, shared, size 63:
0xef7c1005 9f lahf -> %ah
0xef7c1006 0f 90 c0 seto -> %al
0xef7c1009 64 a3 4c 00 00 00 mov %eax -> %fs:0x4c[4byte]
0xef7c100f 81 05 08 40 78 f7 01 add $0x00000001 0xf7784008[4byte] -> 0xf7784008[4byte]
00 00 00
0xef7c1019 81 05 0c 40 78 f7 03 add $0x00000003 0xf778400c[4byte] -> 0xf778400c[4byte]
00 00 00
0xef7c1023 89 e0 mov %esp -> %eax
0xef7c1025 64 a3 50 00 00 00 mov %eax -> %fs:0x50[4byte]
0xef7c102b 64 a1 4c 00 00 00 mov %fs:0x4c[4byte] -> %eax
0xef7c1031 04 7f add $0x7f %al -> %al
0xef7c1033 9e sahf %ah
0xef7c1034 64 a1 50 00 00 00 mov %fs:0x50[4byte] -> %eax
0xef7c103a 68 77 6a 78 f7 push $0xf7786a77 %esp -> %esp 0xfffffffc(%esp)[4byte]
0xef7c103f e9 d8 ff 01 00 jmp $0xef7e101c
Adding to the drreg test:
interp: start_pc = 0x08048f0b
0x08048f0b ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
0x08048f10 ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
0x08048f15 0f 95 c0 setnz -> %al
reads flag before writing it!
0x08048f18 39 e2 cmp %edx %esp
wrote overflow flag before reading it!
0x08048f1a eb 00 jmp $0x08048f1c
interp: direct jump at 0x08048f1a
end_pc = 0x08048f1c
instrument_basic_block ******************
before instrumentation:
TAG 0x08048f0b
+0 L3 ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
+5 L3 ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
+10 L3 0f 95 c0 setnz -> %al
+13 L3 39 e2 cmp %edx %esp
+15 L3 eb 00 jmp $0x08048f1c
END 0x08048f0b
drreg test #4
drreg test #4
drreg test #4
drreg_reserve_aflags @0x00000000: spilling aflags
drreg test #4
drreg_event_bb_insert_late @0x08048f15 aflags=0x8: lazily restoring aflags
drreg test #4
drreg_event_bb_insert_late @0x08048f18: re-spilling aflags after app write
drreg test #4
drreg_event_bb_insert_late @0x08048f1a aflags=0x11f: lazily restoring aflags
after instrumentation:
TAG 0x08048f0b
+0 L3 ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
+5 L3 ba f4 f1 00 00 mov $0x0000f1f4 -> %edx
+10 m4 @0xef85d31c 64 a3 50 00 00 00 mov %eax -> %fs:0x00000050[4byte]
+16 m4 @0xef7ca0a4 9f lahf -> %ah
+17 m4 @0xef85ed1c 64 a3 4c 00 00 00 mov %eax -> %fs:0x0000004c[4byte]
+23 m4 @0xef7c9604 64 a1 50 00 00 00 mov %fs:0x00000050[4byte] -> %eax
+29 m4 @0xef85df04 3d 00 00 00 00 cmp %eax $0x00000000
+34 m4 @0xef7cb9e4 <label>
+34 m4 @0xef85d4a4 64 a3 50 00 00 00 mov %eax -> %fs:0x00000050[4byte]
+40 m4 @0xef7cddf4 64 a1 4c 00 00 00 mov %fs:0x0000004c[4byte] -> %eax
+46 m4 @0xef85d0bc 9e sahf %ah
+47 m4 @0xef7c9538 64 a1 50 00 00 00 mov %fs:0x00000050[4byte] -> %eax
+53 L3 0f 95 c0 setnz -> %al
+56 m4 @0xef7ceb4c 64 a3 50 00 00 00 mov %eax -> %fs:0x00000050[4byte]
+62 m4 @0xef7c9fd8 9f lahf -> %ah
+63 m4 @0xef7cb37c 64 a3 4c 00 00 00 mov %eax -> %fs:0x0000004c[4byte]
+69 m4 @0xef7cd05c 64 a1 50 00 00 00 mov %fs:0x00000050[4byte] -> %eax
+75 L3 39 e2 cmp %edx %esp
+77 m4 @0xef7cc8f0 64 a3 50 00 00 00 mov %eax -> %fs:0x00000050[4byte]
+83 m4 @0xef7cce48 64 a1 4c 00 00 00 mov %fs:0x0000004c[4byte] -> %eax
+89 m4 @0xef85d5d4 04 7f add $0x7f %al -> %al
+91 m4 @0xef7ce710 9e sahf %ah
+92 m4 @0xef85fd04 64 a1 50 00 00 00 mov %fs:0x00000050[4byte] -> %eax
+98 L3 eb 00 jmp $0x08048f1c
END 0x08048f0b
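The "reads flag before writing it" and "wrote overflow flag before reading it" lines in the log come from aflags liveness analysis. A minimal sketch of such a backward scan follows; the `toy_` names and the two-boolean instruction summary are assumptions for illustration, not drreg's actual data structures, and the sketch treats the flags as a single unit rather than tracking each flag separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction summary used only in this sketch. */
typedef struct {
    bool reads_aflags;      /* reads any arithmetic flag */
    bool writes_all_aflags; /* overwrites every arithmetic flag */
} toy_flags_use_t;

/* Fills live_before[i] with whether the app's aflags are live just before
 * instruction i, scanning backward from the end of the block.
 * The flags are assumed live at block exit.
 */
static void
toy_aflags_liveness(const toy_flags_use_t *bb, size_t n, bool *live_before)
{
    assert(bb != NULL && live_before != NULL);
    bool live = true; /* conservative assumption past the end of the block */
    for (size_t i = n; i-- > 0;) {
        if (bb[i].reads_aflags)
            live = true; /* a read makes the incoming value live */
        else if (bb[i].writes_all_aflags)
            live = false; /* a full write without a read kills it */
        live_before[i] = live;
    }
}
```

Applied to the block above (mov, mov, setnz, cmp, jmp), the scan marks the flags dead only just before the cmp, which is the one point where instrumentation could clobber them without saving the app value.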
\ TODO support use in shared gencode outside of bb event, and in instru2instru phase
DrMem's pattern mode wants to save aflags w/ liveness analysis in the instru2instru phase (in pattern_instrument_repstr()).
Maybe for this pattern case we can add flags saving in insert phase and in instru2instru just move flags code around? But that could mess up the liveness assumptions.
Xref the non-drmgr API discussion about a parallel set of routines: drregi_*. But can we avoid whole separate routines? We can tell whether in drmgr insert phase w/o extra arg that breaks compat -- but where is liveness info stored?
We could keep the same API signatures if we store liveness in pt, just like we're doing w/ drmgr. We'd just add something like drreg_analyze_instrlist()? But what about the current index? Store instr ptrs with live info and search for instr? OTOH, in instru2instru should we really assume that the instrlist stays static after the analysis?
Or, we add nothing extra in the API, and liveness is computed on the spot in each routine, if called not during the drmgr insert phase? We could at least start w/ that, and add explicit storage via new API routine later as an optimization w/o breaking compatibility.
Gencode support seems a subset: the user probably wants to spill while appending, and thus there are no further instrs, and thus everything is live.
drreg is complete enough that we should be able to close this soon. With 00d39c7 in place, it now matches the efficiency of the hand-coded spilling in the samples, and as part of #1273 I was able to convert all the remaining samples to use drreg. I'm going to wait until the Dr. Memory port to drreg is complete, as a few issues remain there that might require additions or possibly changes to the interface.
d749f93 i#511 drreg: initial framework, liveness analysis, and aflags implementation
97393af i#511 drreg: register reservation
2883a1f i#511 drreg: register reservation: implement drreg_get_app_value
942fd5c i#511 drreg: support reserved regs across app instrs
4b16998 i#511 drreg: lazy restore of GPR regs
58e8582 i#511 drreg: export instr_is_reg_spill_or_restore()
bf99438 i#511 drreg: add fault handling
7e39d3d i#511 drreg: handle labels added by other components
998070a i#511 drreg: mark as experimental
ee3ad18 i#511 drreg: add vector convenience routines
513bcdb i#511 drreg: lazily restore aflags
1f4378c i#511 drreg: restore lazy aflags on a fault
1367094 i#511 drreg: add drreg_reservation_info()
83833a9 i#511 drreg: advance priorities outside of Dr. Memory ranges
7dd5b31 i#511 drreg: add error handling callback
71bbc5b i#511 drreg: add support for aflags preservation outside insert phase
8ee8d81 i#511 drreg: add support for register preservation outside insert phase
23f2abe i#511 drreg: add drreg_is_register_dead()
2469668 i#511 drreg: add drreg_reserve_dead_register()
ac8d95e i#1273 drmgr, i#511 drreg: convert countcalls to use drmgr + drreg
b2c9aea i#511 drreg: convert drcachesim to use drreg
00d39c7 i#511 drreg: keep aflags in eax where possible
Adding a note here to remember to remove the "work in progress...interface may be in flux" from drreg.dox when closing this.
Another note: the drreg barrier needed when invoking things like dr_insert_mbr_instrumentation() needs to be better documented.
One annoying thing I hit when converting the samples was that a sample not using drreg itself, but using drx_insert_counter_update() and drmgr, was forced to init drreg on its own. Perhaps we can solve this by having drx_init() call drreg_init(), but in a "weak" way so that any pre-existing or later call overrides it?
359f8ee i#511 drreg: document lazy restore "barriers"
583b2fc i#511 drreg: simplify usage by combining multiple inits
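As a rough illustration of what "combining multiple inits" could look like, here is a reference-counted init that merges each caller's request. The `toy_` names, the two option fields, and the merge policy (summing slot counts, OR-ing the conservative flag) are assumptions made for this sketch, not a statement of what the commit does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options struct for this sketch; not drreg_options_t. */
typedef struct {
    unsigned int num_spill_slots;
    bool conservative;
} toy_options_t;

static toy_options_t toy_ops;
static int toy_init_count;

/* Each component (library or client) calls this with its own needs.
 * The first call stores the request; later calls are merged into it.
 */
static void
toy_init(const toy_options_t *req)
{
    assert(req != NULL);
    if (toy_init_count++ == 0) {
        toy_ops = *req;
        return;
    }
    toy_ops.num_spill_slots += req->num_spill_slots;
    toy_ops.conservative = toy_ops.conservative || req->conservative;
}

/* Tears down only when the last initializer exits.
 * Returns true if this call performed the teardown.
 */
static bool
toy_exit(void)
{
    assert(toy_init_count > 0);
    if (--toy_init_count > 0)
        return false;
    toy_ops.num_spill_slots = 0;
    toy_ops.conservative = false;
    return true;
}
```

With this shape, a library such as drx can call the init routine for its own single slot without caring whether the client also calls it, which addresses the sample-conversion annoyance described above.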
Xref #1963
Are there any more available notes on the support of virtual registers by any chance? In particular, I am wondering whether the allocation of physical registers would be done in the final phase related to Instrumentation-to-instrumentation transformations? Moreover, how would DynamoRIO reason over virtual registers in previous stages? I assume there needs to be some new class of special registers that the IR may handle?
A virtual register feature was never implemented and never got far enough to have a detailed design. If you were interested in adding such a feature I would suggest filing a separate issue and writing some kind of design proposal.
From bruen...@google.com on July 10, 2011 16:52:53
we plan to build a "drreg" extension that provides register liveness analysis and stealing. one thing we didn't plan to do, but may want to, is to add a feature to permanently steal a register. when a tool only needs access to a few fields, directly-addressable TLS should be more performant; but when it has a lot of fields, a stolen register could be more efficient.
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=511