DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

add application callstack walking extension #2414

illera88 opened 7 years ago

illera88 commented 7 years ago

Hi, there is no way (that I know of) to get the stack trace when an exception is raised. This would be very helpful for collecting information about an exception or crash.

Can this be done from within DynamoRIO? I can help implement it. Could I be pointed to where it should be implemented?

Cheers

derekbruening commented 7 years ago

Please clarify:

Xref https://github.com/DynamoRIO/drmemory/issues/823

illera88 commented 7 years ago
derekbruening commented 7 years ago

Getting a callstack in arbitrary code can be quite challenging: just look at the many issues in the Dr. Memory tracker about problems and challenges getting callstacks on Windows. Linux is generally simpler, and the more code whose compilation you control, the easier it is, so it is easier for client code than for general app code (though clients can easily call off into Windows private libraries).

On a crash in DR we do not know what state anything is in and can't reliably run much code at all. A client crash is a little safer, but the state needed to do things like matching what a debugger does (which on Windows would involve calling into dbghelp.dll) could still be corrupted.

The current simple DR frame pointer walk could be post-processed: Xref #973 Xref #1470

Clients could be given a control point and could from there use an extension or something similar: Xref #50, Xref DynamoRIO/drmemory#823
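To make the frame-pointer-walk idea above concrete, here is a minimal sketch of what a client-side walk over the app's machine context could look like, using dr_safe_read. This is a hypothetical helper, not the actual DR walker, and it assumes the app's frames keep frame pointers:

```c
#include "dr_api.h"

/* Hypothetical helper (not part of DR): walk the saved-frame-pointer chain
 * starting from an app machine context.  Only works for frames compiled
 * with frame pointers, which is what the "simple frame pointer walk"
 * above refers to. */
static void
print_fp_callstack(dr_mcontext_t *mc, int max_frames)
{
    app_pc pc = mc->pc;
    byte *fp = (byte *) mc->xbp;
    for (int i = 0; i < max_frames && pc != NULL; i++) {
        dr_fprintf(STDERR, "frame %d: " PFX "\n", i, pc);
        byte *new_fp = NULL;
        app_pc ret_addr = NULL;
        /* A typical frame stores the caller's fp at [fp] and the return
         * address at [fp + sizeof(void*)]; use safe reads so a corrupt or
         * frame-pointer-omitting frame stops the walk instead of faulting. */
        if (!dr_safe_read(fp, sizeof(new_fp), &new_fp, NULL) ||
            !dr_safe_read(fp + sizeof(void *), sizeof(ret_addr), &ret_addr, NULL))
            break;
        if (new_fp <= fp) /* Sanity check: each older frame should be higher on the stack. */
            break;
        pc = ret_addr;
        fp = new_fp;
    }
}
```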

illera88 commented 7 years ago

First of all, thank you for your time. Second, I didn't understand your first question regarding where the crash happens; my bad. I want to get the stack trace of a crash in the instrumented application. I said "in the client" but I meant in the instrumented binary.

In the client I'm using drmgr_register_exception_event or drmgr_register_signal_event (on Windows and Linux, respectively), and once the exception callback is called I want to get the stack trace from there (for the target binary, not for the DR client).

Sorry for the confusion. Cheers
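For concreteness, a minimal sketch of the flow just described for the Linux case, assuming drmgr; the actual app-callstack walk (the missing piece this issue asks for) is left as a comment:

```c
#include <signal.h>
#include "dr_api.h"
#include "drmgr.h"

/* Called by DR before the signal is delivered to the app; siginfo->mcontext
 * holds the *app* machine context at the faulting instruction. */
static dr_signal_action_t
on_signal(void *drcontext, dr_siginfo_t *siginfo)
{
    if (siginfo->sig == SIGSEGV) {
        dr_fprintf(STDERR, "app SIGSEGV at pc=" PFX " sp=" PFX "\n",
                   siginfo->mcontext->pc, siginfo->mcontext->xsp);
        /* A callstack walker (e.g., the frame-pointer sketch above, or an
         * unwind-info-based extension) would be invoked here with
         * siginfo->mcontext, which describes the app, not the client. */
    }
    return DR_SIGNAL_DELIVER; /* Let the app's own handling/crash proceed. */
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    drmgr_init();
    drmgr_register_signal_event(on_signal);
}
```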

derekbruening commented 7 years ago

Your earlier users list emails were also talking about a report on a DR or tool crash, so this is rather confusing. If you're looking for an app callstack, I don't see that it has much relationship to an exception or fault. This just sounds like a request for a general app callstack feature and is unrelated to whether an app exception just happened.

derekbruening commented 7 years ago

This is a duplicate of https://github.com/DynamoRIO/drmemory/issues/823 unless we want to try harder for a BSD licensed extension. We can leave it in this tracker so it will show up in searches here.

illera88 commented 7 years ago

It is related, since I get the exception right before the app is going to crash. I would like to log the call stack from DR.

derekbruening commented 7 years ago

I don't see any relation to there being an exception. A general app callstack feature will acquire a callstack from whatever app state you point it at. It does not matter whether there was an app exception or not, unless you're going to wait until the app is inside its own signal handler or something, in which case you're just talking about the callstack walker figuring out how to get to the prior frame from there.

illera88 commented 7 years ago

When I say an exception I mean a SIGSEGV or EXCEPTION_ACCESS_VIOLATION. I don't care about the stack trace at an arbitrary point in the code; I'm interested in the stack trace when the app crashes, to know where the bug is coming from.

I don't see how this can surprise you. I don't know whether that's difficult to implement; that's actually what I'm asking.

derekbruening commented 3 years ago

Xref DynamoRIO/drmemory#2399 and DynamoRIO/drmemory#1222

derekbruening commented 3 years ago

Pasting from https://groups.google.com/g/dynamorio-users/c/iEgvOm2aGyk:

For Unix, if the private copy of libunwind will operate nicely in an isolated way, maybe the simplest thing is to add support for constructing the machine context of the app in the format libunwind wants. That assumes all that's needed is the callstack walk itself, and not callstack storage, persistence, or compression (those could be added later, either from scratch to stay BSD or by leveraging Dr. Memory's code).

But xref the concerns about performance, though those should be re-measured.
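A rough sketch of what "construct the machine context of the app in the format libunwind wants" could look like on Linux x86-64, where stock libunwind's unw_context_t is a ucontext_t. This is only an illustration of the idea, not the actual extension code; a real implementation would likely need to fill more registers and deal with the library-list issue discussed below:

```c
#define _GNU_SOURCE
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <string.h>
#include <ucontext.h>
#include "dr_api.h"

/* Sketch: seed a ucontext_t (which stock libunwind on Linux x86-64 uses as
 * its unw_context_t) from DR's view of the app registers, then walk it.
 * Inside DR this only works if libunwind's dl_iterate_phdr lookups see the
 * app's libraries rather than DR's private copies (see the next comment). */
static void
walk_app_callstack(dr_mcontext_t *mc)
{
    unw_context_t uc;
    memset(&uc, 0, sizeof(uc));
    /* Only the registers needed to start unwinding; more callee-saved
     * registers may be required for some unwind rules. */
    uc.uc_mcontext.gregs[REG_RIP] = (greg_t) (ptr_uint_t) mc->pc;
    uc.uc_mcontext.gregs[REG_RSP] = (greg_t) mc->xsp;
    uc.uc_mcontext.gregs[REG_RBP] = (greg_t) mc->xbp;

    unw_cursor_t cursor;
    if (unw_init_local(&cursor, &uc) != 0)
        return;
    do {
        unw_word_t ip = 0;
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        dr_fprintf(STDERR, "  frame pc=" PFX "\n", (app_pc) ip);
    } while (unw_step(&cursor) > 0);
}
```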

derekbruening commented 3 years ago

libunwind uses dl_iterate_phdr to walk the libraries. DR's private loader replaces that with a walk of private libraries, while for an app callstack we want app libraries. By nature of its use this should handle interrupting arbitrary app code, so we could conceivably route the libunwind import to the app version, but it seems safer to have DR code doing the walk using DR's list of app libraries. It's not clear how to tell DR to not isolate that particular import, however.
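For reference, dl_iterate_phdr works along these lines: libunwind's callback records the module containing the target PC and the location of its PT_GNU_EH_FRAME segment, so whichever library list the call reports (DR's private libraries vs. the app's) determines whose unwind info is found. A standalone sketch (a plain process, not a DR client):

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Callback invoked once per loaded module; libunwind's version records the
 * module whose address range contains the target PC and where its
 * PT_GNU_EH_FRAME unwind data lives. */
static int
phdr_callback(struct dl_phdr_info *info, size_t size, void *data)
{
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
            printf("%s: base=%p eh_frame_hdr=%p\n",
                   info->dlpi_name[0] != '\0' ? info->dlpi_name : "[exe]",
                   (void *) info->dlpi_addr,
                   (void *) (info->dlpi_addr + info->dlpi_phdr[i].p_vaddr));
        }
    }
    return 0; /* Keep iterating over all modules. */
}

int
main(void)
{
    dl_iterate_phdr(phdr_callback, NULL);
    return 0;
}
```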

derekbruening commented 3 years ago

The current plan is to rely on libunwind.so being installed on all user machines. The dl_iterate_phdr redirection to app libraries relies on a shared libunwind.so (the provided libunwind.a is not PIC in any case).

However, we're hitting issues with there being no libunwind packages available for cross-compiling. After discussion, we're sticking with using the system libunwind.so, and will just build or specially obtain packages for cross-compiling. That said, there are advantages to building our own static PIC libunwind libraries (either as a git submodule, or by building for all arches and checking in the binaries like we do for drsyms' libelftc).

derekbruening commented 3 years ago

Action item: the callstack sample triggers a (vague) libunwind error on a malloc callstack in ld.so for 32-bit x86:

216: malloc called from:
216:   ld-linux.so.2!<unknown>
216: libunwind raw error -1
216: res=2
216: ASSERT FAILURE: /home/bruening/dr/git/src/api/samples/callstack.cpp:107: res == DRCALLSTACK_NO_MORE_FRAMES ()

It may be worth investigating why libunwind fails here and whether it has anything to do with our setup, or whether it's an internal libunwind problem or an ld.so unwind-info quality problem.

derekbruening commented 3 years ago

Action item: the callstack sample triggers a (vague) libunwind error on a malloc callstack in ld.so for 32-bit x86

Using a debug build of libunwind we have:

bruening@ubuntu:~/dr/git/build_x86_dbg_tests$ UNW_DEBUG_LEVEL=25 LD_LIBRARY_PATH=/home/bruening/libunwind-debug bin32/drrun -c api/bin/libcallstack.so -- suite/tests/bin/common.eflags

malloc called from:
 >unw_init_local_common: (cursor=0x4b725588)
                >access_mem: mem[4b725468] -> f7b18240
                >access_mem: mem[4b72544c] -> fff7fcec
  ld-linux.so.2!<unknown>
 >_ULx86_step: (cursor=0x4b725588, ip=0xf7b18240)
                >get_rs_cache: acquiring lock
              >_ULx86_dwarf_find_proc_info: looking for IP=0xf7b1823f
               >_ULx86_dwarf_callback: checking /home/bruening/dr/git/build_x86_dbg_tests/suite/tests/bin/common.eflags, base=0xf3b24000)
               >_ULx86_dwarf_callback: checking /home/bruening/dr/git/build_x86_dbg_tests/api/bin/libcallstack.so, base=0xf7af3000)
               >_ULx86_dwarf_callback: checking /usr/lib/i386-linux-gnu/ld-2.31.so, base=0xf7afd000)
               >_ULx86_dwarf_callback: found table `/usr/lib/i386-linux-gnu/ld-2.31.so': segbase=0xf7b20f5c, len=502, gp=0xf7b29000, table_data=0xf7b20f68
      >_ULx86_dwarf_search_unwind_table: lookup IP 0xffff72e3
               >lookup: e->start_ip_offset = ffff3e24
               >lookup: e->start_ip_offset = ffff8a64
               >lookup: e->start_ip_offset = ffff73e4
               >lookup: e->start_ip_offset = ffff4c4d
               >lookup: e->start_ip_offset = ffff6114
               >lookup: e->start_ip_offset = ffff7004
               >lookup: e->start_ip_offset = ffff7254
               >lookup: e->start_ip_offset = ffff72e4
               >_ULx86_dwarf_search_unwind_table: ip=0xf7b1823f, load_offset=0x0, start_ip=0xffff7254
 >_ULx86_dwarf_search_unwind_table: e->fde_offset = 41b8, segbase = f7b20f5c, debug_frame_base = 0, fde_addr = f7b25114
            >_ULx86_dwarf_extract_proc_info_from_fde: FDE @ 0xf7b25114
               >_ULx86_dwarf_extract_proc_info_from_fde: looking for CIE at address f7b21740
               >parse_cie: CIE parsed OK, augmentation = "zR", handler=0x0
               >_ULx86_dwarf_extract_proc_info_from_fde: FDE covers IP 0xf7b181b0-0xf7b1823f, LSDA=0x0
                >put_rs_cache: unmasking signals/interrupts and releasing lock
             >_ULx86_step: dwarf_step() failed (ret=-10), trying frame-chain
                >access_mem: mem[f7b18240] -> fb1e0ff3
                >access_mem: mem[f7b18244] -> 8b535657
                >_ULx86_is_signal_frame: returning 0
                >access_mem: mem[4b725448] -> 10
             >_ULx86_step: [EBP=0x4b725448] = 0x10
             >_ULx86_step: dwarf_get([EIP=0x14]) failed
  >_ULx86_step: returning -1

So it's:

             >_ULx86_step: dwarf_step() failed (ret=-10), trying frame-chain

-10 is indeed UNW_ENOINFO. So unless we got the PC wrong somehow, it's just the compiler not emitting unwind info for this spot??