DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

support re-attach after full detach with the same DR library instance #2157

Open derekbruening opened 7 years ago

derekbruening commented 7 years ago

Split from #95

For something like a ptrace-based external attach with an injected DR library, the solution would be to remove the library completely on detach, leaving no extra work for a re-attach. This issue instead covers re-attach for a DR library that we cannot remove, because it is either statically linked with the app or was loaded up front by the system loader rather than by us as part of the attach.

Xref discussion on needing to re-attach after a full detach for start/stop when stop always does a full detach: https://github.com/DynamoRIO/dynamorio/issues/95#issuecomment-64834854

It's worth repeating the main paragraph there:

Supporting re-takeover when stopping is tied to full cleanup is problematic as it requires that DR fully zero all static and global variables. There are many cases of static variables scattered around, such as inside DO_ONCE, in the initializers for Extensions (drmgr, etc.), in memoized functions, etc. We'd have to expose all of those non-global-scope static vars to get access to them, or try to zero out the whole .data and .bss (which by itself is not enough as there's a lot of non-zero-init stuff in .data). This has performance implications for chains of short-lived processes. We also have to deal with subtle things like #1271, where we threw out the .1config file under the assumption that we wouldn't re-read the options later. Plus, even if we make DR work in this model, third-party Extensions are unlikely to follow this: we would have to noisily demand a different programming model than is usually assumed.
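
To make the hidden-static problem concrete, here is a minimal sketch in plain C. It does not use DR's real DO_ONCE macro: `DO_ONCE_SKETCH`, `sketch_init`, `sketch_detach`, and `sketch_attach_cycle` are all hypothetical names invented for illustration. The point is only that a guard declared inside a function body cannot be reached by cleanup code, so it survives a "full" detach.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a DO_ONCE-style macro (not DR's definition).
 * The guard is a static local, so nothing outside this scope can reset it. */
#define DO_ONCE_SKETCH(stmt)          \
    do {                              \
        static bool done_ = false;    \
        if (!done_) {                 \
            done_ = true;             \
            stmt;                     \
        }                             \
    } while (0)

/* File-scope state: detach code can see and zero this. */
static int init_count;

/* Runs on every attach, but the guarded statement fires only once
 * per process lifetime. */
static void
sketch_init(void)
{
    DO_ONCE_SKETCH(init_count++);
}

/* "Full" cleanup: resets what it can reach. The done_ guard inside
 * sketch_init() is out of scope here and stays true. */
static void
sketch_detach(void)
{
    init_count = 0;
}

/* One attach/detach cycle; returns init_count as observed while attached. */
int
sketch_attach_cycle(void)
{
    sketch_init();
    int seen = init_count;
    sketch_detach();
    return seen;
}
```

Under these assumptions the first call to `sketch_attach_cycle()` observes 1 and the second observes 0: the re-attach silently skips the one-time initialization because the guard was never cleared.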

Despite all of those problems, we have gotten such a re-attach to work for simple cases in the past. Even if the solution is fragile and "hacky" and does not cover all corner cases, best-effort support may still be worthwhile, as it removes a severe limitation on useful scenarios such as bursty traces.

derekbruening commented 7 years ago

I put in initial best-effort support in 2dd9659

However, it ends up failing on Travis in tests that pass locally:

https://travis-ci.org/DynamoRIO/dynamorio/builds/201376528
debug-internal-32: 259 tests passed, **** 3 tests failed: ****
    code_api|tool.drcacheoff.burst_static =>    (16821).  Internal Error: DynamoRIO debug check failure: 
    code_api|tool.drcacheoff.burst_client =>    (16840).  Internal Error: DynamoRIO debug check failure: 
    code_api|api.static_detach =>  Application /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach (16921).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/os.c:8907 vsyscall_page_start == NULL 

I put a diagnostic into a pull request, and it shows the following:

https://travis-ci.org/DynamoRIO/dynamorio/jobs/201391836
254: Test command: /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/bin32/runstats "-s" "90" "-killpg" "-silent" "-env" "LD_LIBRARY_PATH" "/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/lib32/debug:/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/ext/lib32/debug:" "-env" "DYNAMORIO_OPTIONS" "-stderr_mask 0xC -dumpcore_mask 0 -code_api" "/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach"
254: Test timeout computed to be: 600
252: pre-DR stop
252: all done
254: pre-DR init
254: vsyscall_page_start is 0x00000000
254: in dr_client_main
254: pre-DR start
254: pre-DR detach
254: Saw some bb events
254: clearing vsyscall_page_start
254: re-attach attempt
254: vsyscall_page_start is 0x00000000
254: vsyscall_page_start is 0xf77bb000
254: <Application /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach (17053).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/os.c:8909 vsyscall_page_start == NULL
254: (Error occurred @457 frags)
254: version 6.2.17211, custom build
254: -stderr_mask 12 -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct 
254: 0xffae96ec 0x08136695
254: 0xffae991c 0x082ea30c
254: 0xffae9a20 0x081d9d31
254: 0xffae9aac 0x080b2f1c
254: 0xffaea2e8 0x080b65ee
254: 0xffaea300 0x080b6875
254: 0xffaea310 0x08051c86
254: 0xffaea328 0xf75c7ad3>
254/262 Test #254: code_api|api.static_detach .......................................***Failed  Required regular expression not found.Regex=[^pre-DR init

So either vdso is in the maps file twice, or find_executable_vm_areas is called twice. Both are odd. I'm disabling the assert temporarily while I try to reproduce this or investigate further using pull requests.

derekbruening commented 7 years ago

I can repro in a 14.04.5 VM (but not in 15.04 or on Fedora). The vdso pages are split into two entries, presumably by something DR did to them (vsyscall hook I suppose):

f7740000-f7741000 r-xp 00000000 00:00 0                                  [vdso]
f7741000-f7742000 r-xp 00000000 00:00 0                                  [vdso]
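
For illustration, here is a rough sketch of how a maps walk could tolerate a split like this by folding contiguous [vdso] entries into one range instead of asserting that only one entry exists. This is not the fix that went into DR: `vdso_range_t` and `vdso_range_add_line` are hypothetical names, and the sketch only assumes the usual `start-end perms ...` maps line layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulator for the vdso range seen so far. */
typedef struct {
    unsigned long start;
    unsigned long end;
    int seen; /* nonzero once at least one [vdso] entry was recorded */
} vdso_range_t;

/* Folds one maps-file line into *r.
 * Returns 0 if the line is not a [vdso] entry,
 *         1 if it is the first entry or directly extends the current range,
 *        -1 if it is malformed or not contiguous with the current range. */
int
vdso_range_add_line(vdso_range_t *r, const char *line)
{
    unsigned long start, end;
    assert(r != NULL && line != NULL);
    if (strstr(line, "[vdso]") == NULL)
        return 0;
    /* Maps addresses are hex without a 0x prefix, which %lx accepts. */
    if (sscanf(line, "%lx-%lx", &start, &end) != 2 || start >= end)
        return -1;
    if (!r->seen) {
        r->start = start;
        r->end = end;
        r->seen = 1;
        return 1;
    }
    if (start == r->end) {
        /* A split entry: extend the range rather than treating it as a
         * second, unexpected vdso. */
        r->end = end;
        return 1;
    }
    return -1;
}
```

Feeding the two lines above into a zero-initialized `vdso_range_t` yields the single range f7740000-f7742000, while a second [vdso] entry at an unrelated address would still be flagged with -1.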

derekbruening commented 6 years ago

Xref #3065