Open derekbruening opened 8 years ago
\ TODO on amd64 today tools rely on DR mangling pc-rel instrs, which could fail
If we have mangle_rel_addr() only operate on app instrs, these all fail:
debug-internal-64: 219 tests passed, **** 5 tests failed: ****
code_api|client.count-ctis
code_api|client.count-ctis-noopt
code_api|sample.modxfer
code_api|sample.modxfer_app2lib
code_api|sample.instrcalls
<Application /work/dr/build_suite/build_debug-internal-64/suite/tests/bin/client.count-ctis (5413) DynamoRIO usage error : encode error: rip-relative reference out of 32-bit reach>
For count-ctis-noopt:
interp: start_pc = 0x00007f4b1f4c8aa0
0x00007f4b1f4c8aa0 ff 25 22 05 22 00 jmp <rel> 0x00007f4b1f6e8fc8[8byte]
bb ilist before mangling:
TAG 0x00007f4b1f4c8aa0
+0 m4 @0x000000004e78b120 65 48 89 0c 25 18 00 mov %rcx -> %gs:0x18[8byte]
00 00
+9 m4 @0x000000004e78b1a0 48 8b 0d c1 e5 fc d0 mov <rel> 0x00007f4b1f6e8fc8[8byte] -> %rcx
It relies on DR re-relativizing the app's PC-rel mem ref to pass as an arg to a clean call.
This could fail if the cache is not 32-bit-reachable from the app code (which is not guaranteed).
For now I'll put back the mangling of meta instrs for x86 to maintain the status quo (on ARM the reach is too small and we'll assert on almost anything) but this remains an issue.
This kind of utility would also simplify handling app memrefs involving the stolen register on ARM, which is up to the client to properly handle and is kind of a pain.
Note that using drutil_insert_get_mem_addr() and loading from that address does solve the corner cases for a client. drutil_insert_get_mem_value() might just call drutil_insert_get_mem_addr() to handle the far and pc-rel corner cases, and then load from the address into a passed-in scratch reg.
While the utility routine will be a convenience, this issue is about more than just a utility routine, as A) clients won't know to use it: it's very natural to just take the memref opnd; B) it's awkard to use in places like a clean call arg: it takes multiple steps and explicit scratch regs. We're hoping for a solution to clients using memref opnds directly. Can we make that simpler, or at the least can we detect the app memref and raise a warning plus augment the clean call docs to warn about such things. There may not be a simple solution and it may need a fancier feature like #510 that allows chaining the call to get the mem value directly into the clean call.
** TODO too hard to make library routine for post-app-inst instru like this!
Inserting instru after the current app instr messes up drreg.
Having a callback where the library routine waits until the next instr and then calls the tool: hard to combine w/ tool's other instru.
We discussed this offline a little bit. It seems relatively straightforward for a client to implement instrumentation to get the stored value:
1) In the insert phase for the target store, get the store address into a register. Set some user_data state to tell the next step to do some work.
2) In the insert phase for after the target store, since the user_data state is set, go and load from the register to get the store value. Use it in the desired manner.
A call has to be specially handled, as does a non-branch bb-final store (if bb is truncated, e.g.).
Where it gets hard is making a library routine to do this, as the whole framework is based on inserting instrumentation and then acting on it immediately. Maybe it's just not worth making a library routine, and we make a sample instead showing how to do it?
If we really want a library routine, one approach not mentioned previously is to duplicate the store operation. Emulating would be a pain, but re-executing a copy of the instruction could work. Any non-copy operation must have a register as the other operand, and a copy is easy to emulate. Get the address, load the value there, and then execute a copy of the store.
E.g.:
L3 add [eax,esi,4], ecx
=>
m4 <spill edx if not dead>
m4 lea edx, [eax,esi,4] // result of drutil_get_mem_addr
m4 mov edx, [edx]
m4 add edx, ecx // "emulate" using the processor
m4 <to-be-stored value is now in edx: act on it>
m4 <restore edx if not dead>
L3 add [eax,esi,4], ecx
It can be optimized of course but that's the idea.
72be5de seems to have broken client.drx_buf-test on ARM and AArch64.
@egrimley Hmm that's odd, I tried on my raspberry pi (so ARM, not AArch64) and everything seemed to work fine. I'll take a closer look to make sure it really does work on ARM. However, I don't have any AArtch64 hardware, would you be able to take a look at that?
@egrimley could you point at the CDash failure to make it easier to see the details of how it failed?
I'll ask what's happening about CDash.
With client.drx_buf-test, on ARM I see an infinite loop, while on AArch64 I see a CHECK failure in event_exit: num_faults is 202 instead of 204. But perhaps that's expected: there's some "NYI on AArch64" in 72be5de.
The test passes on Debian armhf jessie, but fails on Debian armhf stretch, where it looks as though test_asm_123 (as instrumented) does something that makes pthread_join wait for ever.
I'll do more research on the ARM failure (I actually didn't modify test_asm_123
in my commit so I'm wondering what's up with that). As for the AArch64 failure, I think the following patch should fix it:
diff --git a/suite/tests/client-interface/drx_buf-test.dll.c b/suite/tests/client-interface/drx_buf-test.dll.c
index 89782e9..e1d928a 100644
--- a/suite/tests/client-interface/drx_buf-test.dll.c
+++ b/suite/tests/client-interface/drx_buf-test.dll.c
@@ -436,7 +436,7 @@ event_exit(void)
* because the callback is called on thread_exit(). Finally, two more for
* drx_buf_insert_buf_memcpy().
*/
- CHECK(num_faults == NUM_ITER * 2 + 2 + 2,
+ CHECK(num_faults == NUM_ITER * 2 + 2 + IF_AARCH64_ELSE(0, 2),
"the number of faults don't match up");
if (!drmgr_unregister_bb_insertion_event(event_app_instruction))
CHECK(false, "exit failed");
Of course I can't test this but would you be able to verify that this fixes the AArch64 test?
Yes, that patch fixes the AArch64 test. Which Raspberry Pi do you have?
I think I have the Raspberry Pi 2 model, which is running Debian Jessie.
Then perhaps you could try Debian Stretch, in a chroot or as an alternative boot.
Interesting: I chrooted using debootstrap to pull debian stretch, and I was able to replicate the failure in debian stretch -- natively. The drx_buf-test.c
program itself fails natively even without running it under dynamorio.
Even more puzzling is that when I copy that program out of the chroot and back into debian jessie, it doesn't crash! What is going on here?
Perhaps it's the different glibc. What does GDB say?
Looks like the problem was that on Debain Stretch, $r4
was assumed not clobbered, but in Debian Jessie it was restored correctly. The pthread handling code immediately dereferenced $r4
after test_thread_asm
quit. Saving them for each of the assembly tests fixed the crash.
So close: it was a difference in libpthread rather than in glibc.
Submitted a PR: GH-2193
You can't "push" a register on AArch64, and X4 doesn't need to be saved anyway. Pushing a single register on ARM does not conform to the calling convention, though it should be safe in a leaf function. Perhaps a simpler solution would be to use r12 instead of r4 on ARM. See: https://github.com/DynamoRIO/dynamorio/pull/2195
If you do find you need to save and restore registers on ARM/AArch64, take a look at SAVE_PRESERVED_REGS, added in f5adcfe.
I think this should be re-opened. The get-mem-value part is resolved as we provided a sample, after deciding a library routine had too many complexities. But we didn't use the sample's approach to update our other samples and tests that have problems with blindly copying far and pc-rel operands. We still have the hack that mangles meta instrs on x86 but not ARM; and in fact because of the IF_ARM
used we are now mangling meta rel-addr instrs on AArch64.
The tool copying an app memref opnd that is pc-relative, has an x86 segment, or on aarchxx uses the stolen register, and using it in a meta instr (e.g., an arg to a clean call for an indirect branch target).
This will normally only be if the tool wants the value stored at that memref address, because the tool should use drutil_insert_get_mem_addr() to get the address and it handles all these corner cases. (We proposed a drutil_insert_get_mem_value() but concluded it was quite difficult to create as a library routine and we made the memval sample instead.)
pc-relative: on encoding it may not reach (pretty common for code cache to not reach app for 64-bit) and it's too late there to insert extra instrs, so encoding fails. On x86 there is a nice assert explaining the exact problem; other arches may have vague "can't encode" asserts.
segment and stolen reg: these will encode but will silently produce the wrong value. (They might crash if the offset goes beyond DR's mmap.)
DR could identify app-main-segment uses in tool code and warn about them, but cannot easily distinguish app from tool uses of the stolen reg or the DR segment.
Mangling might add several additional instructions (e.g., a "movz;movk;movk" sequence to materialize an address into a register on aarch64) inside a tool sequence. Some tools might assume they have tight control over the exact instruction ordering, so this could confuse restore_state handlers or other code that walks the code cache sequence.
DR's mangling may use scratch registers and slots. DR's slot 0 is not exposed to tools, so if DR uses that, there should be no conflict.
For pc-relative mangling:
For the stolen register:
For segments:
Slots beyond slot 0 can overlap with slots used by tools or drreg. The DR use is local and self-contained. On a fault in the copied opnd: first DR would restore from the spilled slot; then the tool or drreg would likely also do a restore from the same slot, if it does a walk like drreg does and thinks DR's spill is its own. So it might "just work". If the DR restore went last that might be safer.
We need to ensure that DR will actually restore the value, though, as normally it ignores tool code. Instructions added by DR will be marked "our_mangling": so spills and restores and things like the "movz;movk;movk". But the instr with the app opnd will not be. We'll have to make sure the right thing happens; right now the translate code might complain.
pc-relative opnds are easy to identify and mangle, as are opnds using the app's segment.
But for the stolen reg or a DR-segment-using opnd, DR cannot easily distinguish its use in a tool instr from a tool accessing DR's scratch space.
We could add a flag to opnd_t saying "app opnd"?
Both Abhinav and I leaning toward DR mangling these tool opnds: b/c of compatibility issues and complexity if we get rid of today's mangling.
If it's too complex to fully cover the corner cases above, for a short-term solution for #5316 we could have drbbdup look for pc-rel and call instrlist_insert_mov_immed_ptrsz (or, just remove the IF_ARM(xxxx) and mangle as-is on arm).
"on ARM the reach is too small and we'll assert on almost anything"
Doesn't that mean we'd want the mangling enabled?
Also note that on ARM DR seems to decode into a base-disp opnd with DR_REG_PC as the base instead of a rel-addr: see the test I added in PR #5306.
Hello, excuse me, is there any way to obtain the access traces of PC, addr, size, rw, and memval, which is a combination of memtracex_x86.c and memval_simple. c.
Hello, excuse me, is there any way to obtain the access traces of PC, addr, size, rw, and memval, which is a combination of memtracex_x86.c and memval_simple. c.
Please use the users list https://groups.google.com/forum/#!forum/DynamoRIO-Users for questions.
Split from #1823
This seems like a more general problem than just an issue with dr_insert_mbr_instrumentation(). While often inserted code that de-references app memory is app code, this is not always the case, and we want to support tool code doing so. Similarly to the stolen register on ARM, we should at least document that the burden falls on clients to detect and handle far refs on Linux rather than blindly copying opnds. We should also add convenience routines if it is too awkward to work around this using regular API routines.
A similar problem exists with PC-relative memory references: if a tool blindly copies the operand, is DR supposed to try and mangle it, potentially spilling a register and maybe messing up the tool's instrumentation assumptions?
One proposal would be a drutil_insert_get_mem_value() routine in an extension that handles the corner cases.