DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

Problems with function wrapping when multiple addresses for the same name exist #7004

Open ShangzhiXu opened 1 month ago

ShangzhiXu commented 1 month ago

Describe the bug

Hi there! First, thanks for your great work. I recently tried to use DynamoRIO to wrap functions like this:

static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded) {
    dr_printf("Module loaded: %s\n", mod->full_path);
    dr_printf("total functions: %d\n", num_functions);
    for (int i = 0; i < num_functions; i++) {
        app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, function_names[i]);
        if (towrap != NULL) {
            dr_printf("Wrapping function: %s at address %p\n", function_names[i], towrap);
            drwrap_wrap_ex(towrap, generic_wrap_pre, generic_wrap_post, (void *)function_names[i], 0);
        } else {
            dr_printf("Failed to locate function: %s in module %s\n", function_names[i], mod->full_path);
        }
    }
}

The output is:

Module loaded: /usr/lib/x86_64-linux-gnu/libc.so.6
total functions: 3
Wrapping function: memcpy at address 0x00007f0dfca0ecb0
Wrapping function: printf at address 0x00007f0dfc9be5b0
Wrapping function: strcpy at address 0x00007f0dfcac1b70

The output shows that DynamoRIO can wrap functions such as memcpy, strcpy, and printf. The problem is that with this pre-call hook:

static void generic_wrap_pre(void *wrapcxt, void **user_data) {
    const char *func_name = (const char *)*user_data;
    dr_printf("Function %s is called\n", func_name);  // Added logging
    call_stack.push(func_name);
}

only printf is traced when memcpy, strcpy, and printf are called in the target binary; the other two are not.

To Reproduce

Steps to reproduce the behavior: as above.

My command is: ../DynamoRIO-Linux-10.0.19811/bin64/drrun -c client_printGraph.so -- test/buffer_overflow

Versions

derekbruening commented 1 month ago

Please clarify "can't be traced". Do you observe the application's code reaching the memcpy libc entry address? Are you sure it's not just that all cases of memcpy in the application's own code are inlined and control never reaches libc memcpy? Debug build DR logs can be used to see all addresses encountered: https://dynamorio.org/page_logging.html

ShangzhiXu commented 1 month ago

Thank you so much for your response! I believe I've identified the issue. First, "can't be traced" means that memcpy was called in my target program but DynamoRIO failed to record the call.

The cause is this: my glibc actually contains two memcpy symbols with different versions:

# readelf -s /usr/lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
   497: 00000000000b1270     9 FUNC    WEAK   DEFAULT   16 wmemcpy@@GLIBC_2.2.5
  2724: 00000000000a2cb0    40 FUNC    GLOBAL DEFAULT   16 memcpy@GLIBC_2.2.5
  2726: 000000000009bdb0   265 IFUNC   GLOBAL DEFAULT   16 memcpy@@GLIBC_2.14
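
For readers unfamiliar with the suffixes in this listing: in ELF symbol versioning, `@@` marks the default version that unversioned references bind to at link time, while a single `@` marks a non-default version kept for binaries linked against an older glibc. The sketch below splits such a string into its parts. The type `versioned_symbol_t` and the function `parse_versioned_symbol` are hypothetical helpers written for this illustration; they are not part of DynamoRIO or glibc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type for illustration only. */
typedef struct {
    char name[64];
    char version[64];
    bool is_default; /* true for "name@@VERSION", false for "name@VERSION" */
} versioned_symbol_t;

/* Splits a symbol string as printed by `readelf -s`.
 * Returns false if there is no version suffix or a field does not fit. */
static bool parse_versioned_symbol(const char *text, versioned_symbol_t *out)
{
    assert(text != NULL && out != NULL);

    const char *at = strchr(text, '@');
    if (at == NULL)
        return false;

    size_t name_len = (size_t)(at - text);
    if (name_len == 0 || name_len >= sizeof(out->name))
        return false;

    out->is_default = (at[1] == '@');
    const char *version = at + (out->is_default ? 2 : 1);
    size_t version_len = strlen(version);
    if (version_len == 0 || version_len >= sizeof(out->version))
        return false;

    memcpy(out->name, text, name_len);
    out->name[name_len] = '\0';
    memcpy(out->version, version, version_len + 1); /* includes the NUL */
    return true;
}
```

Applied to the listing above, `memcpy@@GLIBC_2.14` parses as the default version and `memcpy@GLIBC_2.2.5` as a non-default one, so a lookup by the bare name `memcpy` is ambiguous unless the version is also considered.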

In my target program, memcpy@@GLIBC_2.14 was linked by default. The PLT looks like this:

0000000000001040 <memcpy@plt>:
    1040:       ff 25 c2 2f 00 00       jmp    *0x2fc2(%rip)        # 4008 <memcpy@GLIBC_2.14>
    1046:       68 01 00 00 00          push   $0x1
    104b:       e9 d0 ff ff ff          jmp    1020 <_init+0x20>
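
The `# 4008` annotation in this disassembly is the GOT slot that the stub jumps through, not the address of memcpy itself; the dynamic loader fills that slot at run time. As a rough sketch of how the annotation is derived, the helper below decodes the `ff 25` form of `jmp *disp32(%rip)`. The function name `plt_jmp_got_slot` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes the memory operand address of an x86-64 "jmp *disp32(%rip)"
 * (opcode bytes ff 25). RIP-relative operands are measured from the next
 * instruction, which is insn_addr + 6 for this 6-byte encoding.
 * Returns 0 if the bytes are not this instruction form. */
static uint64_t plt_jmp_got_slot(uint64_t insn_addr, const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL);
    if (len < 6 || bytes[0] != 0xff || bytes[1] != 0x25)
        return 0;

    /* disp32 is stored little-endian and is sign-extended to 64 bits. */
    uint32_t raw = (uint32_t)bytes[2] | ((uint32_t)bytes[3] << 8) |
                   ((uint32_t)bytes[4] << 16) | ((uint32_t)bytes[5] << 24);
    int32_t disp = (int32_t)raw;
    return insn_addr + 6 + (uint64_t)(int64_t)disp;
}
```

For the stub above, the bytes `ff 25 c2 2f 00 00` at 0x1040 give 0x1040 + 6 + 0x2fc2 = 0x4008, matching the disassembler's comment.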

But by default, if I use drwrap like this:

        app_pc towrap = (app_pc)dr_get_proc_address(mod->handle, function_names[i]);
        if (towrap != NULL) {
            drwrap_wrap_ex(towrap, generic_wrap_pre, generic_wrap_post, (void *)function_names[i], 0);
        }

it will wrap memcpy@GLIBC_2.2.5.
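
One way to sanity-check this claim from the numbers already posted: the first run printed memcpy at 0x00007f0dfca0ecb0, and the readelf listing gives offsets 0xa2cb0 (GLIBC_2.2.5) and 0x9bdb0 (GLIBC_2.14). The sketch below tests which offset yields a page-aligned load base. It assumes 4 KiB pages and that a symbol's value is its offset from the module's load base; both are assumptions of this example, and a match is only a consistency check, not proof. The function name `matches_symbol_offset` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 0x1000u /* assumption: 4 KiB pages */

/* Returns true if runtime_addr is consistent with symbol_offset inside a
 * module whose load base is page-aligned (a necessary condition only). */
static bool matches_symbol_offset(uint64_t runtime_addr, uint64_t symbol_offset)
{
    assert(runtime_addr >= symbol_offset);
    uint64_t candidate_base = runtime_addr - symbol_offset;
    return (candidate_base % PAGE_SIZE_ASSUMED) == 0;
}
```

With the posted values, 0x00007f0dfca0ecb0 - 0xa2cb0 = 0x00007f0dfc96c000 is page-aligned, whereas subtracting 0x9bdb0 leaves 0x00007f0dfc972f00, which is not. That is consistent with the wrapped address being memcpy@GLIBC_2.2.5.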

To resolve this, I created a custom shared library (override_memcpy.c) to force memcpy@GLIBC_2.2.5 using LD_PRELOAD. After doing this, DynamoRIO successfully reported the memcpy calls.

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>
#include <dlfcn.h>

void *memcpy(void *dest, const void *src, size_t n) {
    // Use `dlvsym` to find `memcpy` with the specific version `GLIBC_2.2.5`
    static void *(*original_memcpy)(void *, const void *, size_t) = NULL;

    if (!original_memcpy) {
        // Look up the `memcpy` symbol with version `GLIBC_2.2.5`
        original_memcpy = dlvsym(RTLD_NEXT, "memcpy", "GLIBC_2.2.5");
        if (!original_memcpy) {
            fprintf(stderr, "Failed to find memcpy@GLIBC_2.2.5\n");
            return NULL;
        }
    }
    return original_memcpy(dest, src, n);
}

I then used LD_PRELOAD=./override_memcpy.so to force my target program to load memcpy@GLIBC_2.2.5. After that, DynamoRIO was able to trace memcpy correctly.

derekbruening commented 1 month ago

It sounds like you want to use drsym_enumerate_symbols_ex() to walk all symbols and find all memcpy copies; or possibly have drsym_lookup_symbol() or dr_get_proc_address() support iteration instead of returning just one.

If you try drsym_enumerate_symbols_ex() and it works, could you submit a PR to improve the drwrap and drsym_lookup_symbol()/dr_get_proc_address() docs so that others will be aware of the possibility of multiple symbols?
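
To illustrate the difference between a single-result lookup and enumeration, here is a small model in plain C that does not use DynamoRIO at all. The table `mock_libc`, the types, and every function name are hypothetical; the offsets and versions are copied from the readelf output earlier in this thread. It makes no claim about the order in which DynamoRIO's own lookup visits symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a module's symbol table; not a DynamoRIO type. */
typedef struct {
    const char *name;
    const char *version;
    size_t offset; /* offset from the module base */
} mock_symbol_t;

static const mock_symbol_t mock_libc[] = {
    { "wmemcpy", "GLIBC_2.2.5", 0xb1270 },
    { "memcpy",  "GLIBC_2.2.5", 0xa2cb0 },
    { "memcpy",  "GLIBC_2.14",  0x9bdb0 },
};
static const size_t mock_libc_count = sizeof(mock_libc) / sizeof(mock_libc[0]);

/* Enumeration callback: return false to stop early, true to continue. */
typedef bool (*mock_enum_cb_t)(const mock_symbol_t *sym, void *data);

static void mock_enumerate(const mock_symbol_t *table, size_t count,
                           mock_enum_cb_t cb, void *data)
{
    assert(table != NULL && cb != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!cb(&table[i], data))
            break;
    }
}

/* Single-result lookup: stops at the first name hit, so other versions
 * of the same name are never reported. */
static const mock_symbol_t *mock_lookup_first(const mock_symbol_t *table,
                                              size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

typedef struct {
    const char *wanted;
    size_t offsets[8];
    size_t found;
} collect_t;

/* Records every exact-name match; a real client would wrap each one here. */
static bool collect_matches(const mock_symbol_t *sym, void *data)
{
    collect_t *c = (collect_t *)data;
    if (strcmp(sym->name, c->wanted) == 0 && c->found < 8)
        c->offsets[c->found++] = sym->offset;
    return true; /* keep going so every copy is visited */
}
```

In this model the single-result lookup reports only offset 0xa2cb0, while enumeration with `collect_matches` reports both 0xa2cb0 and 0x9bdb0. Note that the exact-name comparison skips `wmemcpy`, which a substring match would wrongly include.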

ShangzhiXu commented 1 month ago

Thanks! I think I got it working with drsym_enumerate_symbols_ex(). In my target program, the PLT still looks like this:

0000000000001060 <memcpy@plt>:
    1060:       ff 25 b2 2f 00 00       jmp    *0x2fb2(%rip)        # 4018 <memcpy@GLIBC_2.14>
    1066:       68 03 00 00 00          push   $0x3
    106b:       e9 b0 ff ff ff          jmp    1020 <_init+0x20>

And I tried to use drsym_enumerate_symbols_ex() like this:

static bool symbol_filter(drsym_info_t *info, drsym_error_t status, void *data) {
    if (strcmp(info->name, "memcpy") == 0) {
        app_pc start = (app_pc)data; // Assuming data is the start address of the module
        app_pc func_pc = start + info->start_offs; // Correct pointer arithmetic
        // Wrap
        drwrap_wrap_ex(func_pc, generic_wrap_pre, generic_wrap_post, (void *)"memcpy", 0);
        dr_printf("Wrapped function: %s at address: %p\n", info->name, func_pc);
    }
    return true;
}

/* Event called when module is loaded */
static void module_load_event(void *drcontext, const module_data_t *mod, bool loaded) {
    if (loaded) {
        drsym_error_t sym_result;
        sym_result = drsym_enumerate_symbols_ex(mod->full_path, symbol_filter, sizeof(drsym_info_t), (void*)mod->start, DRSYM_DEMANGLE);
        if (sym_result != DRSYM_SUCCESS) {
            dr_printf("Failed to enumerate symbols for module %s\n", mod->full_path);
        }
    }
}

In the output, I found that three different memcpy symbols were wrapped:

Wrapped function: memcpy at address: 0x00007f5cbc5ec560
Wrapped function: memcpy at address: 0x00007f5cbbe0bdb0

In the DynamoRIO debug log, I did find 0x00007f84f2865399 e8 c2 b1 01 00 call $0x00007f84f2880560 %rsp, which means that the memcpy being called is wrapped successfully.
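
The call target in that log line can be re-derived from the instruction bytes, which is a handy cross-check when reading debug logs. The sketch below decodes a near call with a 32-bit relative displacement (opcode `e8`); the function name `call_rel32_target` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes the target of an x86-64 near call with a 32-bit relative
 * displacement (opcode e8). The displacement is measured from the next
 * instruction, which is insn_addr + 5. Returns 0 for other encodings. */
static uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL);
    if (len < 5 || bytes[0] != 0xe8)
        return 0;

    /* rel32 is stored little-endian and is sign-extended to 64 bits. */
    uint32_t raw = (uint32_t)bytes[1] | ((uint32_t)bytes[2] << 8) |
                   ((uint32_t)bytes[3] << 16) | ((uint32_t)bytes[4] << 24);
    int32_t rel = (int32_t)raw;
    return insn_addr + 5 + (uint64_t)(int64_t)rel;
}
```

For the logged bytes, 0x00007f84f2865399 + 5 + 0x1b1c2 = 0x00007f84f2880560, which agrees with the target printed in the log. The upper address bits differ from the "Wrapped function" output above, presumably because the log comes from a different run, but the low 12 bits (0x560) match the first wrapped address.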

I'll try my best to submit a PR to enhance the drwrap functionality.